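Relative Score Fusion and Distribution-Based Score Fusion¶
In this example, we demonstrate using QueryFusionRetriever with two methods that aim to improve on Reciprocal Rank Fusion:
- Relative Score Fusion (Weaviate)
- Distribution-Based Score Fusion (Mazzecchi: blog post)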
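Reciprocal Rank Fusion (RRF), the baseline both methods try to improve on, scores each document only by its position in each ranked list. The sketch below is minimal and makes three assumptions: the commonly used constant k = 60, hypothetical document IDs, and a hypothetical `reciprocal_rank_fusion` helper. It is not QueryFusionRetriever's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank is 1-based). Only positions matter; raw retriever scores are ignored.
    """
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from a vector retriever and a BM25 retriever
vector_ranking = ["a", "b", "c"]
bm25_ranking = ["b", "d", "a"]
for doc_id, score in reciprocal_rank_fusion([vector_ranking, bm25_ranking]):
    print(f"{doc_id}: {score:.4f}")
```

Document `b` ends up first because it ranks highly in both lists, regardless of how large or small its raw scores were. The relative and distribution-based methods listed above keep that magnitude information instead of discarding it.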
%pip install llama-index-llms-openai
%pip install llama-index-retrievers-bm25
import os
import openai
os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]
Setup¶
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
Download Data¶
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Next, we set up a vector index over the documents.
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(chunk_size=256)
index = VectorStoreIndex.from_documents(
    documents, transformations=[splitter], show_progress=True
)
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 7.55it/s]
Generating embeddings: 100%|██████████| 504/504 [00:03<00:00, 128.32it/s]
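Conceptually, a vector retriever built on this index embeds the query and returns the chunks whose embeddings are most similar to it. The sketch below assumes cosine similarity as the metric (verify this for your vector store) and uses toy 3-dimensional vectors and a hypothetical `cosine_top_k` helper in place of real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; always in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_top_k(query_vec, chunk_vecs, k):
    """Return (chunk_id, score) pairs for the k chunks most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones come from the embedding model and have many more dimensions
chunks = {
    "interleaf": [0.9, 0.1, 0.0],
    "viaweb": [0.7, 0.7, 0.0],
    "painting": [0.0, 0.1, 0.9],
}
print(cosine_top_k([1.0, 0.5, 0.0], chunks, k=2))
```

Cosine scores are bounded in [-1, 1]. The keyword-based BM25 scores used later are not, and that difference in scale is what the fusion modes in this notebook have to reconcile.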
First, we create our retrievers: the vector retriever returns its top 5 most similar nodes, and the BM25 retriever returns its top 10.
from llama_index.retrievers.bm25 import BM25Retriever
vector_retriever = index.as_retriever(similarity_top_k=5)
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=10
)
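Unlike the vector retriever, BM25Retriever scores nodes by keyword overlap. The sketch below is a minimal Okapi BM25 scorer and rests on several assumptions. It uses whitespace tokenization, the commonly used parameters k1 = 1.5 and b = 0.75, and the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)). The `bm25_scores` helper is hypothetical. The real retriever's tokenizer, stemming, and parameters may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            tf = term_freq[term]
            # Saturate term frequency and normalize by document length
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "at interleaf there was a release engineering group",
    "viaweb let users build a store in the browser",
    "i painted still lives in florence",
]
print(bm25_scores("interleaf viaweb", docs))
```

BM25 scores are unbounded and depend on corpus statistics such as document length and term rarity, so they cannot be compared directly with cosine similarities.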
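Next, we create our fusion retriever, which returns the 10 highest-scoring nodes out of the (up to 15) nodes returned by the two retrievers.
Note that the vector and BM25 retrievers may return exactly the same nodes, just in a different order; in that case, the fusion retriever simply acts as a re-ranker.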
from llama_index.core.retrievers import QueryFusionRetriever

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    retriever_weights=[0.6, 0.4],
    similarity_top_k=10,
    num_queries=1,  # set this to 1 to disable query generation
    mode="relative_score",
    use_async=True,
    verbose=True,
)
# apply nested async so the async retriever can run inside a notebook
import nest_asyncio

nest_asyncio.apply()
nodes_with_scores = retriever.retrieve(
    "What happened at Interleafe and Viaweb?"
)

for node in nodes_with_scores:
    print(f"Score: {node.score:.2f} - {node.text[:100]}...\n-----")
Score: 0.60 - You wouldn't need versions, or ports, or any of that crap. At Interleaf there had been a whole group...
-----
Score: 0.59 - The UI was horrible, but it proved you could build a whole store through the browser, without any cl...
-----
Score: 0.40 - We were determined to be the Microsoft Word, not the Interleaf. Which meant being easy to use and in...
-----
Score: 0.36 - In its time, the editor was one of the best general-purpose site builders. I kept the code tight and...
-----
Score: 0.25 - I kept the code tight and didn't have to integrate with any other software except Robert's and Trevo...
-----
Score: 0.25 - If all I'd had to do was work on this software, the next 3 years would have been the easiest of my l...
-----
Score: 0.21 - To find out, we decided to try making a version of our store builder that you could control through ...
-----
Score: 0.11 - But the most important thing I learned, and which I used in both Viaweb and Y Combinator, is that th...
-----
Score: 0.11 - The next year, from the summer of 1998 to the summer of 1999, must have been the least productive of...
-----
Score: 0.07 - The point is that it was really cheap, less than half market price. [8] Most software you can launc...
-----
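The sketch below illustrates the idea behind relative score fusion as described by Weaviate. Each retriever's scores are min-max normalized to [0, 1] and multiplied by that retriever's weight, and the contributions are summed for nodes returned by more than one retriever. The node IDs, raw scores, and helper names are hypothetical, and this is not QueryFusionRetriever's internal code. The weights simply mirror `retriever_weights=[0.6, 0.4]` from the retriever configured earlier.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]: the best becomes 1.0, the worst becomes 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every result as a full match (assumption)
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}


def relative_score_fusion(results, weights, top_k):
    """Weighted sum of min-max-normalized scores across retrievers.

    results: one {node_id: raw_score} dict per retriever.
    weights: one weight per retriever, e.g. [0.6, 0.4].
    """
    fused = {}
    for scores, weight in zip(results, weights):
        for node_id, norm in min_max_normalize(scores).items():
            fused[node_id] = fused.get(node_id, 0.0) + weight * norm
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical raw scores on different scales: cosine similarities vs. BM25 scores
vector_scores = {"a": 0.82, "b": 0.80, "c": 0.71}
bm25_raw = {"b": 7.5, "d": 5.0, "a": 2.5}
print(relative_score_fusion([vector_scores, bm25_raw], [0.6, 0.4], top_k=3))
```

A node that one retriever ranks first and the other ranks last (or omits) receives exactly the first retriever's weight, as node `a` does here with 0.6. Because the weights sum to 1.0, fused scores in this sketch never exceed 1.0.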
from llama_index.core.retrievers import QueryFusionRetriever

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    retriever_weights=[0.6, 0.4],
    similarity_top_k=10,
    num_queries=1,  # set this to 1 to disable query generation
    mode="dist_based_score",
    use_async=True,
    verbose=True,
)

nodes_with_scores = retriever.retrieve(
    "What happened at Interleafe and Viaweb?"
)

for node in nodes_with_scores:
    print(f"Score: {node.score:.2f} - {node.text[:100]}...\n-----")
Score: 0.42 - You wouldn't need versions, or ports, or any of that crap. At Interleaf there had been a whole group...
-----
Score: 0.41 - The UI was horrible, but it proved you could build a whole store through the browser, without any cl...
-----
Score: 0.32 - We were determined to be the Microsoft Word, not the Interleaf. Which meant being easy to use and in...
-----
Score: 0.30 - In its time, the editor was one of the best general-purpose site builders. I kept the code tight and...
-----
Score: 0.27 - To find out, we decided to try making a version of our store builder that you could control through ...
-----
Score: 0.24 - I kept the code tight and didn't have to integrate with any other software except Robert's and Trevo...
-----
Score: 0.24 - If all I'd had to do was work on this software, the next 3 years would have been the easiest of my l...
-----
Score: 0.20 - Now we felt like we were really onto something. I had visions of a whole new generation of software ...
-----
Score: 0.20 - Users wouldn't need anything more than a browser. This kind of software, known as a web app, is com...
-----
Score: 0.18 - But the most important thing I learned, and which I used in both Viaweb and Y Combinator, is that th...
-----
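Distribution-based score fusion changes only the normalization step. Instead of using each retriever's observed minimum and maximum as the bounds, it derives them from the score distribution. This sketch assumes the mean ± 3 standard deviation bounds described in Mazzecchi's post and uses the population standard deviation, which is also an assumption. The helper names and scores are hypothetical, and the code is not QueryFusionRetriever's internal implementation.

```python
import statistics


def dist_based_normalize(scores):
    """Normalize with mean - 3*std and mean + 3*std as the 0 and 1 bounds."""
    values = list(scores.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    lo, hi = mean - 3 * std, mean + 3 * std
    if hi == lo:
        # All scores equal: treat every result as a full match (assumption)
        return {node_id: 1.0 for node_id in scores}
    # Not clamped: with many scores, an extreme outlier could land outside [0, 1]
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}


def dist_based_score_fusion(results, weights, top_k):
    """Weighted sum of distribution-normalized scores across retrievers."""
    fused = {}
    for scores, weight in zip(results, weights):
        for node_id, norm in dist_based_normalize(scores).items():
            fused[node_id] = fused.get(node_id, 0.0) + weight * norm
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Same hypothetical raw scores as a min-max example would use
vector_scores = {"a": 0.82, "b": 0.80, "c": 0.71}
bm25_raw = {"b": 7.5, "d": 5.0, "a": 2.5}
print(dist_based_score_fusion([vector_scores, bm25_raw], [0.6, 0.4], top_k=3))
```

With ±3σ bounds, no retriever contributes a full 1.0 or 0.0, and a score equal to the mean always maps to 0.5. This compression is consistent with the narrower spread of the dist_based_score output above (0.42 down to 0.18), compared with the relative_score output (0.60 down to 0.07).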
Use in a Query Engine!¶
Now we can plug our retriever into a query engine to synthesize natural language responses.
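Conceptually, a retriever-backed query engine retrieves nodes and then passes their text to the LLM along with the question. The sketch below shows only that prompt-assembly step. The `build_qa_prompt` helper, the template wording, and the chunk texts are all hypothetical. The LLM call is omitted, and LlamaIndex's actual response synthesizer, including its templates and its handling of long contexts, differs from this.

```python
def build_qa_prompt(query, retrieved):
    """Assemble a context-stuffed prompt from (text, score) pairs.

    The template wording is illustrative, not LlamaIndex's built-in template.
    """
    # Put the highest-scoring chunks first
    ordered = sorted(retrieved, key=lambda pair: pair[1], reverse=True)
    context = "\n\n".join(text for text, _ in ordered)
    return (
        "Use only the context below to answer the question.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks with fused scores
retrieved = [
    ("Viaweb let users build a store in the browser.", 0.41),
    ("At Interleaf there was a large release engineering group.", 0.42),
]
prompt = build_qa_prompt("What happened at Interleaf and Viaweb?", retrieved)
print(prompt)
```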
from llama_index.core.query_engine import RetrieverQueryEngine
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("What happened at Interleafe and Viaweb?")
from llama_index.core.response.notebook_utils import display_response
display_response(response)
Final Response:
At Interleaf, there was a group called Release Engineering that was as large as the group writing the software. They had to deal with versions, ports, and other complexities. In contrast, at Viaweb, the software could be updated directly on the server, simplifying the process. Viaweb was founded with $10,000 in seed funding, and the software allowed building a whole store through the browser without the need for client software or command line inputs on the server. The company aimed to be easy to use and inexpensive, offering low monthly prices for their services.