mixedbread Rerank Cookbook
mixedbread.ai has released three fully open-source reranker models under the Apache 2.0 license. For more details, see their blog post. The three models are:
mxbai-rerank-xsmall-v1
mxbai-rerank-base-v1
mxbai-rerank-large-v1
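In this notebook, we demonstrate how to use the SentenceTransformerRerank module in LlamaIndex with the mxbai-rerank-base-v1 model. This setup lets you add reranking to your RAG pipeline through SentenceTransformerRerank and switch to any reranker model of your choice at any time.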
Installation
!pip install llama-index
!pip install sentence-transformers
Set API Key
import os
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
)
from llama_index.core.postprocessor import SentenceTransformerRerank
Download Data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-03-01 09:52:09--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’
data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.007s
2024-03-01 09:52:09 (9.86 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
Load Documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Build Index
index = VectorStoreIndex.from_documents(documents=documents)
Define the postprocessor for the mxbai-rerank-base-v1 reranker
from llama_index.core.postprocessor import SentenceTransformerRerank

postprocessor = SentenceTransformerRerank(
    model="mixedbread-ai/mxbai-rerank-base-v1", top_n=2
)
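Since the introduction mentions switching between reranker models, here is a minimal sketch of a helper that picks one of the three mixedbread models by size. The names MXBAI_RERANKERS, mxbai_model_id, and build_reranker are hypothetical and not part of LlamaIndex. The xsmall and large model IDs are assumed to follow the same mixedbread-ai/ prefix as the base model used above.

```python
# Model IDs for the three rerankers listed at the top of this notebook.
# Assumption: xsmall and large use the same "mixedbread-ai/" prefix as base.
MXBAI_RERANKERS = {
    "xsmall": "mixedbread-ai/mxbai-rerank-xsmall-v1",
    "base": "mixedbread-ai/mxbai-rerank-base-v1",
    "large": "mixedbread-ai/mxbai-rerank-large-v1",
}


def mxbai_model_id(size: str) -> str:
    """Return the model ID for a size, or raise ValueError for an unknown size."""
    if size not in MXBAI_RERANKERS:
        raise ValueError(
            f"unknown size {size!r}; expected one of {sorted(MXBAI_RERANKERS)}"
        )
    return MXBAI_RERANKERS[size]


def build_reranker(size: str = "base", top_n: int = 2):
    """Build a SentenceTransformerRerank postprocessor for the chosen size."""
    # Imported lazily so the lookup helpers work without llama-index installed.
    from llama_index.core.postprocessor import SentenceTransformerRerank

    return SentenceTransformerRerank(model=mxbai_model_id(size), top_n=top_n)
```

With this in place, build_reranker("large") could stand in for the postprocessor defined above. Larger models typically trade speed for ranking quality, so it is worth checking latency on your own data.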
Create Query Engine
We first retrieve the 10 most similar nodes, then use the postprocessor defined above to keep the top 2.
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[postprocessor],
)
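To make the two stages explicit, the following is a rough, library-free sketch of the retrieve-then-rerank pattern. Both scoring functions are toy stand-ins chosen for illustration: the real pipeline uses embedding similarity for the first stage and the mxbai cross-encoder for the second. All function names here are hypothetical.

```python
def word_overlap(query: str, text: str) -> float:
    """Toy first-stage score: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def focused_overlap(query: str, text: str) -> float:
    """Toy second-stage score: shared words divided by passage length."""
    q = set(query.lower().split())
    words = text.lower().split()
    return len(q & set(words)) / max(len(words), 1)


def retrieve_then_rerank(
    query, passages, first_stage, second_stage, similarity_top_k=10, top_n=2
):
    """Keep similarity_top_k candidates by first_stage, then top_n by second_stage."""
    candidates = sorted(
        passages, key=lambda p: first_stage(query, p), reverse=True
    )[:similarity_top_k]
    return sorted(
        candidates, key=lambda p: second_stage(query, p), reverse=True
    )[:top_n]


passages = [
    "yc funds startups",
    "why paul started yc and what yc funds",
    "paul paints still lifes",
    "lisp is a language",
]
top = retrieve_then_rerank(
    "why paul started yc",
    passages,
    word_overlap,
    focused_overlap,
    similarity_top_k=3,
    top_n=2,
)
print(top)
```

The cheap first stage narrows the pool so that the more expensive second-stage scorer only runs on a handful of candidates, which is the same reason the query engine retrieves 10 nodes before reranking down to 2.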
Test Queries
response = query_engine.query(
    "Why did Sam Altman decline the offer of becoming president of Y Combinator?",
)
print(response)
Sam Altman initially declined the offer of becoming president of Y Combinator because he wanted to start a startup focused on making nuclear reactors.
response = query_engine.query(
    "Why did Paul Graham start YC?",
)
print(response)
Paul Graham started YC because he and his partners wanted to create an investment firm where they could implement their own ideas and provide the kind of support to startups that they felt was lacking when they were founders themselves. They aimed to not only make seed investments but also assist startups with various aspects of setting up a company, similar to the help they had received from others in the past.
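To check that the reranker really kept only two nodes, it can help to look at the sources behind a response. The sketch below assumes the response object exposes source_nodes, where each item has a score and a node with get_content(), as LlamaIndex query responses do. The helper name summarize_sources is hypothetical.

```python
def summarize_sources(response, max_chars: int = 80):
    """Return (rank, score, snippet) tuples for the nodes behind a response."""
    rows = []
    for rank, node_with_score in enumerate(response.source_nodes, start=1):
        # Collapse whitespace so each snippet fits on one line.
        text = " ".join(node_with_score.node.get_content().split())
        rows.append((rank, node_with_score.score, text[:max_chars]))
    return rows
```

Calling summarize_sources(response) after either query above should return two rows when top_n=2, with scores produced by the reranker rather than the original embedding similarity.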