MyScale Vector Store¶
In this notebook we give a quick demo of using the MyScale vector store.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-vector-stores-myscale
!pip install llama-index
Creating a MyScale Client¶
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from os import environ

import clickhouse_connect

environ["OPENAI_API_KEY"] = "sk-*"

# Initialize the client
client = clickhouse_connect.get_client(
    host="YOUR_CLUSTER_HOST",
    port=8443,
    username="YOUR_USERNAME",
    password="YOUR_CLUSTER_PASSWORD",
)
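Hard-coding credentials in a notebook makes them easy to leak. Below is a minimal sketch that builds the keyword arguments for `clickhouse_connect.get_client` from environment variables instead. It assumes the variables are named `MYSCALE_HOST`, `MYSCALE_PORT`, `MYSCALE_USERNAME` and `MYSCALE_PASSWORD`. These names are hypothetical and our own choice: neither MyScale nor LlamaIndex reads them automatically.

```python
import os


def myscale_client_kwargs(env=None):
    """Build keyword arguments for clickhouse_connect.get_client from environment variables.

    The MYSCALE_* variable names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    # Treat unset or empty variables as missing
    missing = [
        name
        for name in ("MYSCALE_HOST", "MYSCALE_USERNAME", "MYSCALE_PASSWORD")
        if not env.get(name)
    ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "host": env["MYSCALE_HOST"],
        # Default to 8443, the port used in the cell above
        "port": int(env.get("MYSCALE_PORT", "8443")),
        "username": env["MYSCALE_USERNAME"],
        "password": env["MYSCALE_PASSWORD"],
    }
```

With the variables exported, the client cell becomes `client = clickhouse_connect.get_client(**myscale_client_kwargs())`.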
Load Documents, Build and Store a VectorStoreIndex with MyScaleVectorStore¶
Here we use a set of Paul Graham essays as the text to turn into embeddings, store them in a MyScaleVectorStore, and query it to find context for our LLM QnA loop.
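Conceptually, this loop has three steps. Each chunk of text is turned into a vector and stored, and at query time the chunks most similar to the question are returned as context for the LLM. The library-free sketch below illustrates that flow. `embed` (bag-of-words term counts) and `ToyVectorStore` are hypothetical stand-ins for a real embedding model and for MyScale, which performs the storage and similarity search server-side.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': bag-of-words term counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory list of (text, vector) rows; a hypothetical stand-in for MyScaleVectorStore."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        # "Store": embed the chunk once and keep the vector next to its text
        self.rows.append((text, embed(text)))

    def query(self, question, top_k=2):
        # "Query": embed the question and rank stored chunks by cosine similarity
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


store = ToyVectorStore()
for chunk in [
    "the author learned to program on an IBM 1401",
    "the author wrote short stories",
    "viaweb was a startup",
]:
    store.add(chunk)
context = store.query("what did the author program on", top_k=1)
print(context)
```

A real embedding model captures meaning rather than exact word overlap, so "learn" and "learned" would land close together. The toy version only matches identical tokens.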
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.myscale import MyScaleVectorStore
from IPython.display import Markdown, display
# Load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)
print("Number of Documents:", len(documents))
Document ID: a5f2737c-ed18-4e5d-ab9a-75955edb816d
Number of Documents: 1
Download Data¶
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
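If `wget` is not available (for example on Windows), you can do the same download from Python using only the standard library. Below is a minimal sketch. The `download` helper is our own and is not part of LlamaIndex.

```python
import urllib.request
from pathlib import Path

ESSAY_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
)


def download(url, dest):
    """Fetch `url` and write its bytes to `dest`, creating parent dirs (like mkdir -p + wget -O)."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())
    return dest
```

Calling `download(ESSAY_URL, "data/paul_graham/paul_graham_essay.txt")` reproduces the two shell commands above.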
You can process your files individually using SimpleDirectoryReader:
loader = SimpleDirectoryReader("./data/paul_graham/")
documents = loader.load_data()
for file in loader.input_files:
    print(file)
    # This is where you can do any preprocessing
../data/paul_graham/paul_graham_essay.txt
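The loop above only prints each path in `loader.input_files`. Below is a standard-library sketch of what a per-file preprocessing hook might look like. The `preprocess` function, a whitespace cleanup, is a hypothetical example and not something the notebook or SimpleDirectoryReader performs.

```python
import re
from pathlib import Path


def preprocess(text):
    """Hypothetical cleanup: collapse runs of spaces/tabs and trim leading/trailing whitespace."""
    return re.sub(r"[ \t]+", " ", text).strip()


def load_and_preprocess(directory):
    """Read every .txt file in `directory` (sorted for a stable order) and return {path: cleaned text}."""
    results = {}
    for path in sorted(Path(directory).glob("*.txt")):
        results[str(path)] = preprocess(path.read_text(encoding="utf-8"))
    return results
```

You could then wrap each cleaned string in a LlamaIndex `Document(text=...)` before building the index, instead of using the reader's output directly.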
# Attach metadata (used for filtering later), then build and store the index
from llama_index.core import StorageContext

for document in documents:
    document.metadata = {"user_id": "123", "favorite_color": "blue"}
vector_store = MyScaleVectorStore(myscale_client=client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
import textwrap

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Set logging to DEBUG for more detailed output
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(key="user_id", value="123"),
        ]
    ),
    similarity_top_k=2,
    vector_store_query_mode="hybrid",
)
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
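`ExactMatchFilter(key="user_id", value="123")` restricts retrieval to nodes whose metadata has `user_id == "123"`. Multiple filters in a `MetadataFilters` are combined with AND by default, and `similarity_top_k=2` keeps the two best-scoring matches. The plain-Python sketch below models only that filter-then-rank logic, using made-up scores. It does not model `vector_store_query_mode="hybrid"`, whose score fusion is handled inside the MyScale integration. Exactly how and where the filter is applied is also up to the store.

```python
def filtered_top_k(rows, scores, filters, top_k=2):
    """Return the ids of the top_k highest-scoring rows whose metadata matches every filter.

    rows:    list of (node_id, metadata) pairs
    scores:  {node_id: similarity score}; made up here, a real store computes these
    filters: {key: value}; all must match exactly (AND), like ExactMatchFilter entries
    """
    matching = [
        node_id
        for node_id, metadata in rows
        if all(metadata.get(key) == value for key, value in filters.items())
    ]
    # Highest similarity first, then keep only the top_k results
    matching.sort(key=lambda node_id: scores[node_id], reverse=True)
    return matching[:top_k]


rows = [
    ("a", {"user_id": "123", "favorite_color": "blue"}),
    ("b", {"user_id": "456", "favorite_color": "blue"}),
    ("c", {"user_id": "123", "favorite_color": "red"}),
    ("d", {"user_id": "123", "favorite_color": "blue"}),
]
scores = {"a": 0.2, "b": 0.99, "c": 0.9, "d": 0.5}
print(filtered_top_k(rows, scores, {"user_id": "123"}))
```

Node `b` has the highest score but is excluded because its `user_id` does not match. Filtering changes *which* nodes compete, not how they are ranked.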
Clear All Indexes¶
for document in documents:
    index.delete_ref_doc(document.doc_id)
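`delete_ref_doc` takes the ID of an original (reference) document, not of an individual chunk. One document is usually split into several nodes, and all of them need to be removed. The toy model below shows that bookkeeping. `ToyIndex` is a hypothetical illustration, not LlamaIndex's actual implementation.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory index that tracks which nodes (chunks) came from which document."""

    def __init__(self):
        self.nodes = {}  # node_id -> chunk text
        self.ref_doc_to_nodes = defaultdict(list)  # ref_doc_id -> [node_id, ...]

    def insert(self, ref_doc_id, chunks):
        # Each chunk becomes a node that remembers its source document
        for i, chunk in enumerate(chunks):
            node_id = f"{ref_doc_id}-{i}"
            self.nodes[node_id] = chunk
            self.ref_doc_to_nodes[ref_doc_id].append(node_id)

    def delete_ref_doc(self, ref_doc_id):
        """Remove every node derived from `ref_doc_id`; unknown ids are ignored."""
        for node_id in self.ref_doc_to_nodes.pop(ref_doc_id, []):
            self.nodes.pop(node_id, None)
```

Iterating over `documents` and passing each `doc_id` therefore clears every chunk the index created from them.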