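Building a Multi-Tenant RAG System with LlamaIndex¶
In this notebook, you will learn how to build a multi-tenant RAG system with LlamaIndex. A single shared index holds every user's documents, and metadata filters restrict each user's queries to their own data.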
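- Setup
- Download Data
- Load Data
- Create an Empty Index
- Create the Ingestion Pipeline
- Update Metadata and Insert Documents
- Define Query Engines for Each User
- Querying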
Setup¶
Make sure llama-index and pypdf are installed.
!pip install llama-index pypdf
Set the OpenAI API Key¶
import os
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from IPython.display import HTML, display
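Hard-coding the key as shown above is fine for a throwaway notebook, but it is easy to commit by accident. A minimal sketch of an alternative, assuming you would rather be prompted when the variable is missing (the helper name `ensure_api_key` is hypothetical and not part of LlamaIndex):

```python
import os
from getpass import getpass


def ensure_api_key(env_var="OPENAI_API_KEY"):
    """Return the key from the environment, prompting only if it is missing.

    Hypothetical helper for illustration; not a LlamaIndex function.
    """
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value so it does not end up in cell output.
        key = getpass(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` once before building the index could replace the hard-coded assignment.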
Download Data¶
We will use two papers for the demonstration: An LLM Compiler for Parallel Function Calling and Dense X Retrieval: What Retrieval Granularity Should We Use?
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.04511.pdf" -O "llm_compiler.pdf"
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.06648.pdf" -O "dense_x_retrieval.pdf"
--2024-01-15 14:29:26-- https://arxiv.org/pdf/2312.04511.pdf
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 755837 (738K) [application/pdf]
Saving to: ‘llm_compiler.pdf’
llm_compiler.pdf 100%[===================>] 738.12K --.-KB/s in 0.004s
2024-01-15 14:29:26 (163 MB/s) - ‘llm_compiler.pdf’ saved [755837/755837]
--2024-01-15 14:29:26-- https://arxiv.org/pdf/2312.06648.pdf
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1103758 (1.1M) [application/pdf]
Saving to: ‘dense_x_retrieval.pdf’
dense_x_retrieval.p 100%[===================>] 1.05M --.-KB/s in 0.005s
2024-01-15 14:29:26 (208 MB/s) - ‘dense_x_retrieval.pdf’ saved [1103758/1103758]
Load Data¶
reader = SimpleDirectoryReader(input_files=["dense_x_retrieval.pdf"])
documents_jerry = reader.load_data()
reader = SimpleDirectoryReader(input_files=["llm_compiler.pdf"])
documents_ravi = reader.load_data()
Create an Empty Index¶
index = VectorStoreIndex.from_documents(documents=[])
Create the Ingestion Pipeline¶
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=20),
    ]
)
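To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of fixed-size chunking with overlap. It is an illustration only: `SentenceSplitter` counts tokens and prefers to break at sentence boundaries, whereas this hypothetical `split_with_overlap` helper slices a plain list and ignores sentence structure.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slice `tokens` into windows of `chunk_size` items, where each window
    repeats the last `chunk_overlap` items of the previous one.

    Simplified illustration; not how SentenceSplitter is implemented.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


example_chunks = split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1)
print(example_chunks)
```

The overlap means the last item of one chunk reappears at the start of the next, so text that straddles a chunk boundary is still seen in context by at least one chunk.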
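Update Metadata and Insert Documents¶
# Tag each of Jerry's documents with its owner
for document in documents_jerry:
    document.metadata["user"] = "Jerry"
# Run the pipeline to split the documents into nodes
nodes = pipeline.run(documents=documents_jerry)
# Insert the nodes into the index
index.insert_nodes(nodes)
# Tag each of Ravi's documents with its owner
for document in documents_ravi:
    document.metadata["user"] = "Ravi"
# Run the pipeline to split the documents into nodes
nodes = pipeline.run(documents=documents_ravi)
# Insert the nodes into the index
index.insert_nodes(nodes)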
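The two blocks above repeat the same three steps for each user: tag, run the pipeline, insert. A minimal sketch of a helper that wraps those steps, assuming only the `pipeline.run(documents=...)` and `index.insert_nodes(...)` calls already used in this notebook (the name `ingest_for_user` and the `metadata_key` parameter are hypothetical):

```python
def ingest_for_user(index, pipeline, documents, user, metadata_key="user"):
    """Tag `documents` with their owner, split them into nodes, and insert
    the nodes into the shared index.

    Hypothetical helper; it only chains calls shown earlier in the notebook.
    """
    for document in documents:
        # The tag has to be set before the pipeline runs so that the
        # resulting nodes can carry it.
        document.metadata[metadata_key] = user
    nodes = pipeline.run(documents=documents)
    index.insert_nodes(nodes)
    return nodes
```

With this in place, onboarding another tenant would be a single call such as `ingest_for_user(index, pipeline, documents_jerry, "Jerry")`.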
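Define Query Engines¶
Define a separate query engine for each user, each with a metadata filter that restricts retrieval to that user's nodes.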
# Query engine for Jerry
jerry_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Jerry",
            )
        ]
    ),
    similarity_top_k=3,
)
# Query engine for Ravi
ravi_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Ravi",
            )
        ]
    ),
    similarity_top_k=3,
)
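Conceptually, each engine above ranks only the nodes whose `user` metadata matches its filter and keeps the best `similarity_top_k` of them. The toy sketch below illustrates that idea in plain Python. It is not LlamaIndex's implementation: `ToyNode`, `filtered_top_k`, and the hard-coded scores are all made up for illustration, and how the filter is actually applied depends on the vector store in use.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Stand-in for an indexed chunk with a precomputed similarity score."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


def filtered_top_k(nodes, key, value, k):
    """Keep nodes whose metadata[key] equals value, then return the k
    highest-scoring ones."""
    allowed = [n for n in nodes if n.metadata.get(key) == value]
    return sorted(allowed, key=lambda n: n.score, reverse=True)[:k]


toy_nodes = [
    ToyNode("propositions as retrieval units", 0.91, {"user": "Jerry"}),
    ToyNode("LLM Planner", 0.95, {"user": "Ravi"}),
    ToyNode("passage retrieval baseline", 0.40, {"user": "Jerry"}),
    ToyNode("Task Fetching Unit", 0.88, {"user": "Ravi"}),
]

jerry_hits = filtered_top_k(toy_nodes, key="user", value="Jerry", k=3)
print([n.text for n in jerry_hits])
```

Note that the highest-scoring node overall belongs to Ravi, yet it never appears in Jerry's results, because the ownership filter excludes it regardless of score.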
Querying¶
# Jerry owns the Dense X Retrieval paper, so this question should be answerable.
response = jerry_query_engine.query(
    "What are the propositions mentioned in the paper?"
)
# Print the response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
The paper mentions propositions as an alternative retrieval unit choice. Propositions are defined as atomic expressions of meanings in text that correspond to distinct pieces of meaning in the text. They are minimal and cannot be further split into separate propositions. Each proposition is contextualized and self-contained, including all the necessary context from the text to interpret its meaning. The paper demonstrates the concept of propositions using an example about the Leaning Tower of Pisa, where the passage is split into three propositions, each corresponding to a distinct factoid about the tower.
# Ravi owns the LLMCompiler paper, so this question should be answerable.
response = ravi_query_engine.query("What steps are involved in LLMCompiler?")
# Print the response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
LLMCompiler consists of three key components: an LLM Planner, a Task Fetching Unit, and an Executor. The LLM Planner identifies the execution flow by defining different function calls and their dependencies based on user inputs. The Task Fetching Unit dispatches the function calls that can be executed in parallel after substituting variables with the actual outputs of preceding tasks. Finally, the Executor executes the dispatched function calling tasks using the associated tools. These components work together to optimize the parallel function calling performance of LLMs.
# This should not be answered, because none of Jerry's documents mention LLMCompiler.
response = jerry_query_engine.query("What steps are involved in LLMCompiler?")
# Print the response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
The steps involved in LLMCompiler are not mentioned in the given context information.
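Reading the answer text is an indirect way to confirm isolation. A more direct check is to inspect which nodes were retrieved. The sketch below assumes the response object exposes `source_nodes`, with each entry carrying the underlying node's `metadata` under `.node`; the helper name `sources_belong_to` is hypothetical.

```python
def sources_belong_to(response, user, key="user"):
    """Return True if every retrieved source node is tagged with `user`.

    Hypothetical helper. It assumes `response.source_nodes` is a list whose
    items expose `.node.metadata`. An empty list also returns True, since no
    foreign data was retrieved.
    """
    return all(
        source.node.metadata.get(key) == user
        for source in response.source_nodes
    )
```

For example, `sources_belong_to(jerry_query_engine.query("What steps are involved in LLMCompiler?"), "Jerry")` would be expected to hold if the filter works as intended, since only Jerry's nodes should ever be retrieved for him.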