PineconeIndexDemo Hybrid

Pinecone Vector Store - Hybrid Search

If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [ ]:

%pip install llama-index-vector-stores-pinecone

In [ ]:

# Quote the version specifiers so the shell does not treat ">=" as a redirect
!pip install "llama-index>=0.9.31" "pinecone-client>=3.0.0" "transformers[torch]"

Creating a Pinecone Index

In [ ]:

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [ ]:

from pinecone import Pinecone, ServerlessSpec

In [ ]:

import os

# Replace the placeholders with your own keys
os.environ[
    "PINECONE_API_KEY"
] = "<Your Pinecone API key, from app.pinecone.io>"
os.environ["OPENAI_API_KEY"] = "sk-..."

api_key = os.environ["PINECONE_API_KEY"]

pc = Pinecone(api_key=api_key)

In [ ]:

# delete if needed
# pc.delete_index("quickstart")

In [ ]:

# dimensions are for text-embedding-ada-002
# NOTE: hybrid search requires the dotproduct metric

pc.create_index(
    name="quickstart",
    dimension=1536,
    metric="dotproduct",
    spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)

# If you need a pod-based Pinecone index instead, you could do this:
#
# from pinecone import Pinecone, PodSpec
#
# pc = Pinecone(api_key='xxx')
#
# pc.create_index(
#     name='my-index',
#     dimension=1536,
#     metric='cosine',
#     spec=PodSpec(
#         environment='us-east1-gcp',
#         pod_type='p1.x1',
#         pods=1
#     )
# )
#
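One way to see why the note above asks for the dot-product metric: hybrid search weights the dense and sparse parts of a query by scaling their values, and only a metric that is linear in that scaling can reflect the weights. The following is a small, self-contained sketch in plain Python (no Pinecone calls); the vectors and the weight `alpha` are made-up illustration values, not anything produced by the notebook.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: the dot product after normalising both vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up illustration values (assumptions, not real embeddings).
doc = [0.2, 0.5, 0.1]
query = [0.4, 0.1, 0.3]
alpha = 0.25  # hypothetical weight applied to the query vector

scaled_query = [alpha * v for v in query]

dot_full = dot(doc, query)
dot_scaled = dot(doc, scaled_query)
cos_full = cosine(doc, query)
cos_scaled = cosine(doc, scaled_query)

# The dot product shrinks by exactly alpha, so a weight survives in the score.
print(dot_full, dot_scaled)
# Cosine similarity ignores the scaling, so the weight would be lost.
print(cos_full, cos_scaled)
```

Because cosine similarity normalises away vector length, any weighting applied by scaling would have no effect under that metric.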

In [ ]:

pinecone_index = pc.Index("quickstart")

Download Data

In [ ]:

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load documents, build the PineconeVectorStore

In [ ]:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.pinecone import PineconeVectorStore
from IPython.display import Markdown, display

In [ ]:

# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

In [ ]:

# set add_sparse_vector=True to compute sparse vectors during upsert
from llama_index.core import StorageContext

if "OPENAI_API_KEY" not in os.environ:
    raise EnvironmentError("Environment variable OPENAI_API_KEY is not set")

vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index,
    add_sparse_vector=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

Upserted vectors:   0%|          | 0/22 [00:00<?, ?it/s]
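To make the idea of a sparse vector more concrete, here is a rough sketch of a term-count encoder that emits the `indices`/`values` layout Pinecone uses for sparse values. It is an illustration only: `toy_sparse_vector`, the whitespace tokenisation, the CRC32 hashing, and the bucket count are all assumptions made for this example, and it is not the encoder that `PineconeVectorStore` applies when `add_sparse_vector=True`.

```python
import zlib
from collections import Counter


def toy_sparse_vector(text, num_buckets=30000):
    """Hypothetical sparse encoder: hash each token to a bucket and count it.

    Returns a dict with parallel "indices" and "values" lists.
    """
    # Whitespace tokenisation and CRC32 hashing are stand-ins (assumptions)
    # for a real tokenizer; CRC32 keeps the result deterministic across runs.
    token_ids = [
        zlib.crc32(token.encode("utf-8")) % num_buckets
        for token in text.lower().split()
    ]
    counts = Counter(token_ids)
    indices = sorted(counts)
    return {
        "indices": indices,
        "values": [float(counts[i]) for i in indices],
    }


sparse = toy_sparse_vector("viaweb lisp viaweb")
print(sparse)
```

Only the buckets that actually occur are stored, which is what keeps the representation sparse; a repeated term such as "viaweb" simply gets a larger value.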

Query Index

You may need to wait a minute or two for the index to be ready.
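Rather than waiting a fixed amount of time, one option is to poll until a readiness check passes. The helper below is a generic sketch: `wait_until` and `fake_ready` are hypothetical names introduced here, and the Pinecone call shown in the trailing comment is an assumption about the v3 client that you should verify against your installed version.

```python
import time


def wait_until(is_ready, timeout_s=120.0, interval_s=2.0):
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the check passed, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in readiness check that succeeds on the third call.
calls = {"n": 0}


def fake_ready():
    calls["n"] += 1
    return calls["n"] >= 3


result = wait_until(fake_ready, timeout_s=5.0, interval_s=0.01)
print(result, calls["n"])

# Possible use with the notebook's client (assumed API, not verified here):
# wait_until(lambda: pc.describe_index("quickstart").status["ready"])
```

A bounded timeout keeps the notebook from hanging indefinitely if the index never becomes available.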

In [ ]:

# set logging to DEBUG for more detailed output
query_engine = index.as_query_engine(vector_store_query_mode="hybrid")
response = query_engine.query("What happened at Viaweb?")

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
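The query above blends dense and sparse matching. As a rough mental model, a hybrid score can be thought of as a weighted sum of a dense dot product and a sparse dot product. The sketch below assumes the common convention that a weight `alpha` of 1.0 means purely dense and 0.0 means purely sparse; the function names, documents, and vectors are all hypothetical, and the code does not call Pinecone or LlamaIndex.

```python
def dense_dot(a, b):
    """Dot product of two dense vectors (lists of floats)."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    return sum(value * b.get(index, 0.0) for index, value in a.items())


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha):
    """Weighted sum of dense and sparse scores (assumed convention)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    dense_part = alpha * dense_dot(q_dense, d_dense)
    sparse_part = (1.0 - alpha) * sparse_dot(q_sparse, d_sparse)
    return dense_part + sparse_part


# Two made-up documents: one close in embedding space, one sharing a keyword.
docs = {
    "semantic_match": ([0.9, 0.1], {}),
    "keyword_match": ([0.1, 0.2], {7: 2.0}),
}
q_dense = [1.0, 0.0]
q_sparse = {7: 1.0}


def rank(alpha):
    """Return document names ordered from best to worst hybrid score."""
    return sorted(
        docs,
        key=lambda name: hybrid_score(q_dense, q_sparse, *docs[name], alpha),
        reverse=True,
    )


print(rank(1.0))  # dense only
print(rank(0.0))  # sparse only
```

Moving `alpha` between the two extremes shifts the ranking from favouring semantic similarity to favouring exact term overlap.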

In [ ]:

display(Markdown(f"<b>{response}</b>"))

At Viaweb, Lisp was used as a programming language. The speaker gave a talk at a Lisp conference about how Lisp was used at Viaweb, and afterward, the talk gained a lot of attention when it was posted online. This led to a realization that publishing essays online could reach a wider audience than traditional print media. The speaker also wrote a collection of essays, which was later published as a book called "Hackers & Painters."