Milvus Vector Store¶
In this notebook we will show a quick demo of how to use the MilvusVectorStore.
If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-vector-stores-milvus
In [ ]:
%pip install llama-index
In [ ]:
import logging
import sys

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document
from llama_index.vector_stores.milvus import MilvusVectorStore
import textwrap
Setup OpenAI¶
Let's first begin by adding the OpenAI API key. This will allow us to access OpenAI for embeddings and to use ChatGPT.
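Hardcoding a key in a notebook is easy to leak. A safer pattern, sketched below under the assumption that you have exported the key as the conventional OPENAI_API_KEY environment variable beforehand, is to read it from the environment:

```python
import os

# Read the key from the environment instead of pasting it into the notebook.
# Assumption: you exported it in your shell first, e.g. `export OPENAI_API_KEY=sk-...`
api_key = os.environ.get("OPENAI_API_KEY", "")

# Then assign it exactly as in the cell below:
# openai.api_key = api_key
```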
In [ ]:
import openai
openai.api_key = "sk-***********"
Download Data
In [ ]:
! mkdir -p 'data/paul_graham/'
! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Generate our data¶
With our LLM set, let's start using the Milvus index. As a first example, let's generate a document from the file found in the data/paul_graham/ folder. In this folder there is a single essay from Paul Graham titled What I Worked On. To generate the documents we will use the SimpleDirectoryReader.
In [ ]:
# Load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print("Document ID:", documents[0].doc_id)
Document ID: 11c3a6fe-799e-4e40-8122-2339936c2722
Create an index across the data¶
Now that we have a document, we can create an index and insert the document. For the index we will use a MilvusVectorStore. MilvusVectorStore takes a few arguments:
uri (str, optional): The URI to connect to, in the form "https://address:port" if using a Milvus or Zilliz Cloud service, or "path/to/local/milvus.db" for a local lite Milvus. Defaults to "./milvus_llamaindex.db".
token (str, optional): The token for logging in. Empty if not using RBAC; if using RBAC it will most likely be "username:password". Defaults to "".
collection_name (str, optional): The name of the collection where the data will be stored. Defaults to "llamalection".
dim (int, optional): The dimension of the embeddings. If not provided, the collection will be created on the first insert. Defaults to None.
embedding_field (str, optional): The name of the collection's embedding field. Defaults to DEFAULT_EMBEDDING_KEY.
doc_id_field (str, optional): The name of the collection's doc_id field. Defaults to DEFAULT_DOC_ID_KEY.
similarity_metric (str, optional): The similarity metric to use; currently supports IP and L2. Defaults to "IP".
consistency_level (str, optional): The consistency level to use for a newly created collection. Defaults to "Strong".
overwrite (bool, optional): Whether to overwrite an existing collection with the same name. Defaults to False.
text_key (str, optional): The key under which text is stored in the passed collection. Used when bringing your own collection. Defaults to None.
index_config (dict, optional): The configuration used for building the Milvus index. Defaults to None.
search_config (dict, optional): The configuration used for searching the Milvus index. Note that this must be compatible with the index type specified by index_config. Defaults to None.
Please note that Milvus Lite requires pymilvus>=2.4.2.
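For intuition about the similarity_metric options, here is a plain-Python sketch of the two supported metrics, IP and L2 (illustrative only; Milvus computes these internally over the stored embeddings):

```python
import math

def inner_product(a, b):
    """IP (inner product): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """L2 (Euclidean distance): smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 2-dimensional "embeddings"
query = [1.0, 0.0]
doc_a = [0.9, 0.1]  # nearly aligned with the query
doc_b = [0.1, 0.9]  # nearly orthogonal to the query

print(inner_product(query, doc_a), inner_product(query, doc_b))  # 0.9 vs 0.1
print(l2_distance(query, doc_a), l2_distance(query, doc_b))
```

Under both metrics doc_a ranks closer to the query than doc_b; just remember that IP ranks descending while L2 ranks ascending.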
In [ ]:
# Create an index over the documents
from llama_index.core import StorageContext

# Create a MilvusVectorStore object
vector_store = MilvusVectorStore(
    uri="./milvus_demo.db", dim=1536, overwrite=True
)

# Create a StorageContext object using the defaults
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create a VectorStoreIndex from the documents
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
Query the data¶
Now that we have our document stored in the index, we can ask questions against the index. The index will use the data stored in itself as the knowledge base for ChatGPT.
In [ ]:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
The author learned about programming on early computers like the IBM 1401 using Fortran, the limitations of early computing technology, the transition to microcomputers, and the excitement of having a personal computer like the TRS-80. Additionally, the author explored different academic paths, initially planning to study philosophy but eventually switching to AI due to a lack of interest in philosophy courses. Later on, the author pursued art education, attending RISD and the Accademia di Belli Arti in Florence, where they encountered a different approach to teaching art.
In [ ]:
response = query_engine.query("What was a hard moment for the author?")
print(textwrap.fill(str(response), 100))
Dealing with the stress and challenges related to managing Hacker News was a difficult moment for the author.
This test shows that overwriting removes the previous data.
In [ ]:
vector_store = MilvusVectorStore(
uri="./milvus_demo.db", dim=1536, overwrite=True
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
[Document(text="The number that is being searched for is ten.")],
storage_context,
)
query_engine = index.as_query_engine()
res = query_engine.query("Who is the author?")
print("Res:", res)
Res: The author is the individual who created the content or work in question.
The next test shows adding additional data to an already existing index.
In [ ]:
del index, vector_store, storage_context, query_engine
vector_store = MilvusVectorStore(uri="./milvus_demo.db", overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context
)
query_engine = index.as_query_engine()
res = query_engine.query("What is the number?")
print("Res:", res)
Res: The number is ten.
In [ ]:
Copied!
res = query_engine.query("Who is the author?")
print("Res:", res)
res = query_engine.query("Who is the author?")
print("Res:", res)
Res: Paul Graham