Qdrant Vector Store¶
Creating a Qdrant client¶
In [ ]:
%pip install llama-index-vector-stores-qdrant llama-index-readers-file llama-index-embeddings-fastembed llama-index-llms-openai
In [ ]:
import logging
import sys
import os
import qdrant_client
from IPython.display import Markdown, display
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.core import Settings
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")
If this is your first run, install the dependencies with:
!pip install -U qdrant_client fastembed
Set your OpenAI key for authenticating the LLM
Follow these steps to set your OpenAI API key as the OPENAI_API_KEY environment variable -
- Using the terminal
In [ ]:
export OPENAI_API_KEY=your_api_key_here
- Using IPython magic commands in a Jupyter Notebook
In [ ]:
%env OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
- Using a Python script
In [ ]:
import os
os.environ["OPENAI_API_KEY"] = "your_api_key_here"
Note: It is generally recommended to set sensitive information such as API keys as environment variables rather than hardcoding them into scripts.
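If you prefer to keep the key out of the notebook entirely, one common pattern is to load it from a .env file. A minimal sketch, assuming the optional python-dotenv package is installed (pip install python-dotenv) and a .env file containing OPENAI_API_KEY=... exists in the working directory:

from dotenv import load_dotenv

# read the .env file (if present) and populate os.environ,
# so the key never has to be hardcoded in the notebook
load_dotenv()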
In [ ]:
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
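The query cells below mention raising the log level to DEBUG for more detailed output. A minimal sketch of how to do that with the standard logging module (DEBUG output from the HTTP and Qdrant clients can be quite verbose):

# switch the root logger from INFO to DEBUG when more detail is wanted
logging.getLogger().setLevel(logging.DEBUG)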
Download Data
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load the documents¶
In [ ]:
# load the documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
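As an optional sanity check (not part of the original walkthrough), you can confirm what SimpleDirectoryReader picked up before indexing:

# the essay should load as a single Document object
print(f"Loaded {len(documents)} document(s)")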
Build the VectorStoreIndex¶
In [ ]:
client = qdrant_client.QdrantClient(
    # you can use :memory: mode for fast and lightweight experiments,
    # it does not require Qdrant to be deployed anywhere
    # but requires qdrant-client >= 1.1.1
    # location=":memory:"
    # otherwise set the Qdrant instance address with:
    # url="http://<host>:<port>"
    # or set the Qdrant instance with host and port:
    host="localhost",
    port=6333,
    # set the API KEY for Qdrant Cloud
    # api_key="<qdrant-api-key>",
)
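For a managed Qdrant Cloud cluster, the same client class can be used with a URL and API key instead of host and port. A minimal sketch with placeholder credentials (the URL and key below are not real values):

# placeholder values -- replace with your own cluster URL and API key
cloud_client = qdrant_client.QdrantClient(
    url="https://YOUR-CLUSTER-ID.cloud.qdrant.io",
    api_key="YOUR_QDRANT_API_KEY",
)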
In [ ]:
vector_store = QdrantVectorStore(client=client, collection_name="paul_graham")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
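Once the documents have been ingested into the "paul_graham" collection, the index can be reopened later from the existing collection without re-embedding anything. A minimal sketch using VectorStoreIndex.from_vector_store, assuming the collection above has already been populated:

# wrap the existing Qdrant collection; no documents are re-embedded here
vector_store = QdrantVectorStore(client=client, collection_name="paul_graham")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)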
Query Index¶
In [ ]:
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
In [ ]:
display(Markdown(f"<b>{response}</b>"))
The author worked on writing and programming before college.
In [ ]:
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
response = query_engine.query(
    "What did the author do after his time at Viaweb?"
)
In [ ]:
display(Markdown(f"<b>{response}</b>"))
The author arranged to do freelance work for a group that did projects for customers after his time at Viaweb.
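Besides the synthesized answer, the response object also exposes the chunks retrieved from Qdrant, which is useful for checking what the answer was grounded on. A small optional sketch:

# inspect the retrieved chunks and their similarity scores
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:100])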
Build the VectorStoreIndex asynchronously¶
In [ ]:
# connect to the same event loop,
# allowing async events to run in the notebook
import nest_asyncio

nest_asyncio.apply()
In [ ]:
aclient = qdrant_client.AsyncQdrantClient(
    # you can use :memory: mode for fast and lightweight experiments,
    # it does not require Qdrant to be deployed anywhere
    # but requires qdrant-client >= 1.1.1
    location=":memory:"
    # otherwise set the Qdrant instance address with:
    # url="http://<host>:<port>"
    # set the API KEY for Qdrant Cloud
    # api_key="<qdrant-api-key>",
)
In [ ]:
vector_store = QdrantVectorStore(
    collection_name="paul_graham",
    client=client,
    aclient=aclient,
    prefer_grpc=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    use_async=True,
)
Async Query Index¶
In [ ]:
query_engine = index.as_query_engine(use_async=True)
response = await query_engine.aquery("What did the author do growing up?")
In [ ]:
display(Markdown(f"<b>{response}</b>"))
The author worked on writing short stories and programming, particularly on an IBM 1401 computer in 9th grade using an early version of Fortran. Later, the author transitioned to working on microcomputers, starting with a TRS-80 in about 1980, where they wrote simple games, programs, and a word processor.
In [ ]:
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine(use_async=True)
response = await query_engine.aquery(
    "What did the author do after his time at Viaweb?"
)
In [ ]:
display(Markdown(f"<b>{response}</b>"))
The author went on to co-found Y Combinator after his time at Viaweb.
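The number of chunks retrieved from Qdrant can be tuned with similarity_top_k (the default is 2). A small sketch, applicable to both the sync and async engines:

# retrieve more context per query; larger values trade speed for recall
query_engine = index.as_query_engine(use_async=True, similarity_top_k=5)
response = await query_engine.aquery(
    "What did the author do after his time at Viaweb?"
)
display(Markdown(f"<b>{response}</b>"))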