
ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale.

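ScaNN includes search space pruning and quantization for Maximum Inner Product Search, and it also supports other distance functions such as Euclidean distance. The implementation is optimized for x86 processors with AVX2 support. See its Google Research GitHub for more details.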
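To make the difference between inner-product search and Euclidean search concrete, here is a minimal brute-force sketch in plain Python. It is not how ScaNN works internally (ScaNN avoids the exhaustive scan through pruning and quantization); it only shows that the two distance functions can rank the same vectors differently. The function names and the toy vectors are illustrative assumptions.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_inner_product(query, vectors, k=1):
    """Exact maximum inner product search: indices of the k largest dot products."""
    order = sorted(
        range(len(vectors)),
        key=lambda i: inner_product(query, vectors[i]),
        reverse=True,
    )
    return order[:k]


def top_k_euclidean(query, vectors, k=1):
    """Exact nearest-neighbor search: indices of the k smallest L2 distances."""
    order = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return order[:k]


# Toy data: the long vector wins on inner product, the identical one on distance.
vectors = [[1.0, 0.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]

mips_result = top_k_inner_product(query, vectors)
l2_result = top_k_euclidean(query, vectors)
print("max inner product:", mips_result)
print("nearest by L2:", l2_result)
```

Because the inner product rewards vector length as well as direction, the choice of distance function should match how the embeddings were trained.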

To use this integration, install langchain-community with pip install -qU langchain-community.

Installation

Install ScaNN through pip. Alternatively, you can follow the instructions on the ScaNN website to install it from source.

%pip install --upgrade --quiet  scann
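After installing, it can help to confirm that the packages are visible to the current interpreter before running the demos. The following is a small sketch using only the standard library; the helper name is_installed is an assumption made for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two packages this page relies on.
for name in ("scann", "langchain_community"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```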

Retrieval Demo

Below we show how to use ScaNN in conjunction with Huggingface Embeddings.

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import ScaNN
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)


model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)

db = ScaNN.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)

docs[0]
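The demo above splits the document into chunks of at most 1000 characters with no overlap before embedding them. As a rough illustration of that step, the sketch below greedily merges separator-delimited pieces into chunks. It is a simplified stand-in, not CharacterTextSplitter's actual implementation: it ignores overlap, and the function name split_text is an assumption.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # Adding this piece would overflow the chunk, so start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
sample_chunks = split_text(sample, chunk_size=10)
print(sample_chunks)
```

Smaller chunks give more precise retrieval hits but less context per hit, so chunk_size is worth tuning for the document at hand.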

RetrievalQA Demo

Next, we demonstrate using ScaNN in conjunction with the Google PaLM API.

You can obtain an API key from https://developers.generativeai.google/tutorials/setup

from langchain.chains import RetrievalQA
from langchain_community.chat_models.google_palm import ChatGooglePalm

palm_client = ChatGooglePalm(google_api_key="YOUR_GOOGLE_PALM_API_KEY")

qa = RetrievalQA.from_chain_type(
    llm=palm_client,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 10}),
)
print(qa.run("What did the president say about Ketanji Brown Jackson?"))
The president said that Ketanji Brown Jackson is one of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.
print(qa.run("What did the president say about Michael Phelps?"))
The president did not mention Michael Phelps in his speech.
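The chain above uses chain_type="stuff" with k set to 10, meaning the ten retrieved chunks are all placed into a single prompt for the model. The sketch below illustrates that idea in plain Python. The prompt wording and the function name build_stuff_prompt are hypothetical and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, passages):
    """Concatenate all retrieved passages into one context block ahead of the question."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages standing in for the retriever's results.
passages = ["First retrieved chunk.", "Second retrieved chunk."]
prompt = build_stuff_prompt("What did the president say?", passages)
print(prompt)
```

Since every passage goes into one prompt, the combined length of the k chunks has to fit within the model's context window.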

Save and load a local retrieval index

db.save_local("/tmp/db", "state_of_union")
restored_db = ScaNN.load_local("/tmp/db", embeddings, index_name="state_of_union")
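To see what was written to disk after saving, a short standard-library sketch like the following can list the files under the target folder. It makes no assumption about the file names the save step produces; the helper name list_saved_files is illustrative, and /tmp/db is the folder used in the example above.

```python
from pathlib import Path


def list_saved_files(folder):
    """Return the relative paths of all files under folder, sorted."""
    root = Path(folder)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Prints an empty list if nothing has been saved to this folder yet.
print(list_saved_files("/tmp/db"))
```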
