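Azure CosmosDB MongoDB Vector Store
In this notebook, we show how to use Azure CosmosDB MongoDB vCore to perform vector search in LlamaIndex. We will create the embeddings with Azure OpenAI.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.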
In [ ]:
%pip install llama-index-embeddings-openai
%pip install llama-index-vector-stores-azurecosmosmongo
%pip install llama-index-llms-azure-openai
In [ ]:
!pip install llama-index
In [ ]:
import os
import json
import openai
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
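Setup Azure OpenAI
The first step is to configure the models. They will be used to create embeddings for the documents loaded into the database and for LLM completions.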
In [ ]:
import os

# Set up the AzureOpenAI instance
llm = AzureOpenAI(
    model_name=os.getenv("OPENAI_MODEL_COMPLETION"),
    deployment_name=os.getenv("OPENAI_MODEL_COMPLETION"),
    api_base=os.getenv("OPENAI_API_BASE"),
    api_key=os.getenv("OPENAI_API_KEY"),
    api_type=os.getenv("OPENAI_API_TYPE"),
    api_version=os.getenv("OPENAI_API_VERSION"),
    temperature=0,
)

# Set up the OpenAIEmbedding instance
embed_model = OpenAIEmbedding(
    model=os.getenv("OPENAI_MODEL_EMBEDDING"),
    deployment_name=os.getenv("OPENAI_DEPLOYMENT_EMBEDDING"),
    api_base=os.getenv("OPENAI_API_BASE"),
    api_key=os.getenv("OPENAI_API_KEY"),
    api_type=os.getenv("OPENAI_API_TYPE"),
    api_version=os.getenv("OPENAI_API_VERSION"),
)
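The configuration cell reads every setting through os.getenv, which returns None without complaint when a variable is unset, so a missing value may only surface later as an authentication or deployment error. A minimal sketch of a pre-flight check follows, assuming the variable names used in that cell; the helper missing_env_vars is hypothetical and not part of LlamaIndex or the Azure SDK.

```python
import os

# Variable names taken from the configuration cell; adjust if yours differ.
REQUIRED_VARS = [
    "OPENAI_MODEL_COMPLETION",
    "OPENAI_MODEL_EMBEDDING",
    "OPENAI_DEPLOYMENT_EMBEDDING",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    # Report every missing name at once instead of failing on the first.
    print("Missing environment variables:", ", ".join(missing))
```

Running this before constructing the models gives one readable message listing all missing settings, rather than a failure deep inside the first API call.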
In [ ]:
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
Download Data
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Loading documents
Load the documents stored in data/paul_graham/ using SimpleDirectoryReader.
In [ ]:
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print("Document ID:", documents[0].doc_id)
Document ID: c432ff1c-61ea-4c91-bd89-62be29078e79
Create the index
Here we establish the connection to an Azure CosmosDB MongoDB vCore cluster and create a vector search index.
In [ ]:
import pymongo
from llama_index.vector_stores.azurecosmosmongo import (
    AzureCosmosDBMongoDBVectorSearch,
)
from llama_index.core import VectorStoreIndex
from llama_index.core import StorageContext
from llama_index.core import SimpleDirectoryReader

# Connect to the cluster using the URI stored in the environment
connection_string = os.environ.get("AZURE_COSMOSDB_MONGODB_URI")
mongodb_client = pymongo.MongoClient(connection_string)

# Vector store backed by the given database and collection
store = AzureCosmosDBMongoDBVectorSearch(
    mongodb_client=mongodb_client,
    db_name="demo_vectordb",
    collection_name="paul_graham_essay",
)

# Embed the documents and write them to the vector store
storage_context = StorageContext.from_defaults(vector_store=store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
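As a rough illustration of what a vector search index computes, the sketch below ranks a handful of toy vectors by cosine similarity to a query vector. This is an exact brute-force search in plain Python, assuming cosine similarity as the metric; it is not how Azure CosmosDB MongoDB vCore implements its index. The function names and the three-dimensional toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for document chunks (real ones have many more dimensions).
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(results)
```

In the notebook, the query text is embedded with the configured embedding model, and the chunks whose stored vectors score highest are passed to the LLM as context.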
Query the index
We can now ask questions using our index.
In [ ]:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author love working on?")
In [ ]:
import textwrap
print(textwrap.fill(str(response), 100))
The author loved working on multiple projects that were not their thesis while in grad school, including Lisp hacking and writing On Lisp. They eventually wrote a dissertation on applications of continuations in just 5 weeks to graduate. Afterward, they applied to art schools and were accepted into the BFA program at RISD.
In [ ]:
response = query_engine.query("What did he/she do in summer of 2016?")
In [ ]:
print(textwrap.fill(str(response), 100))
The person moved to England with their family in the summer of 2016.