Load Data¶
We load Paul Graham's essay as an example.
In [ ]:
%pip install llama-index-llms-openai
In [ ]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O pg_essay.txt
--2024-01-10 12:31:00--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘pg_essay.txt’

pg_essay.txt        100%[===================>]  73.28K  --.-KB/s    in 0.01s

2024-01-10 12:31:00 (6.32 MB/s) - ‘pg_essay.txt’ saved [75042/75042]
In [ ]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["pg_essay.txt"])
documents = reader.load_data()
A query pipeline chains modules (prompts, LLMs, query engines) together, and a router lets the pipeline dispatch each query to the branch best suited to answer it. In this tutorial, we build a query pipeline that routes between a vector index (for specific questions) and a summary index (for summarization questions).
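Before wiring up the real pipeline, the routing idea can be sketched in plain Python: a selector inspects the query and picks which chain handles it. The keyword heuristic and the component functions below are hypothetical stand-ins for the LLM-based selector and the vector/summary chains used later.

```python
from typing import Callable, List

def keyword_selector(query: str, choices: List[str]) -> int:
    """Stand-in for an LLM-based selector: route 'summary'-style
    queries to component 1, everything else to component 0."""
    return 1 if "summary" in query.lower() else 0

def route(
    query: str,
    choices: List[str],
    components: List[Callable[[str], str]],
) -> str:
    """Pick a component via the selector, then run only that component."""
    idx = keyword_selector(query, choices)
    return components[idx](query)

choices = [
    "Answers specific questions about the document",
    "Answers summarization questions about the document",
]
# Hypothetical stand-ins for the vector chain and summary chain.
components = [
    lambda q: f"[vector chain] {q}",
    lambda q: f"[summary chain] {q}",
]

print(route("What did the author do during his time in YC?", choices, components))
# → [vector chain] What did the author do during his time in YC?
print(route("What is a summary of this document?", choices, components))
# → [summary chain] What is a summary of this document?
```

The LLM-based `LLMSingleSelector` below plays the role of `keyword_selector`: it reads the `choices` descriptions and picks exactly one component per query.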
Define Modules¶
We define the LLM, vector index, summary index, and prompt templates.
In [ ]:
from llama_index.core.query_pipeline import QueryPipeline, InputComponent
from typing import Dict, Any, List, Optional
from llama_index.llms.openai import OpenAI
from llama_index.core import Document, VectorStoreIndex
from llama_index.core import SummaryIndex
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.schema import NodeWithScore, TextNode
from llama_index.core import PromptTemplate
from llama_index.core.selectors import LLMSingleSelector

# define HyDE template
hyde_str = """\
Please write a passage to answer the question: {query_str}

Try to include as many key details as possible.

Passage: """
hyde_prompt = PromptTemplate(hyde_str)

# define LLM
llm = OpenAI(model="gpt-3.5-turbo")

# define synthesizer
summarizer = TreeSummarize(llm=llm)

# define vector retriever
vector_index = VectorStoreIndex.from_documents(documents)
vector_query_engine = vector_index.as_query_engine(similarity_top_k=2)

# define summary query prompt + retriever
summary_index = SummaryIndex.from_documents(documents)
summary_qrewrite_str = """\
Here's a question:
{query_str}

You are responsible for feeding the question to an agent that given context will try to answer the question.
The context may or may not be relevant. Rewrite the question to highlight the fact that
only some pieces of context (or none) may be relevant.
"""
summary_qrewrite_prompt = PromptTemplate(summary_qrewrite_str)
summary_query_engine = summary_index.as_query_engine()

# define selector
selector = LLMSingleSelector.from_defaults()
Construct Query Pipeline¶
Define a query pipeline for the vector index and the summary index, and connect them with a router.
In [ ]:
# define summary query pipeline
from llama_index.core.query_pipeline import RouterComponent

vector_chain = QueryPipeline(chain=[vector_query_engine])
summary_chain = QueryPipeline(
    chain=[summary_qrewrite_prompt, llm, summary_query_engine], verbose=True
)

choices = [
    "This tool answers specific questions about the document (not summary questions across the document)",
    "This tool answers summary questions about the document (not specific questions)",
]

router_c = RouterComponent(
    selector=selector,
    choices=choices,
    components=[vector_chain, summary_chain],
    verbose=True,
)
# top-level pipeline
qp = QueryPipeline(chain=[router_c], verbose=True)
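The `chain=[...]` argument composes modules sequentially: each module's output becomes the next module's input, which is how the summary chain feeds the rewrite prompt into the LLM and then into the query engine. A minimal sketch of that composition, with hypothetical stand-in functions for the prompt, LLM, and query engine:

```python
from typing import Callable, List

def run_chain(modules: List[Callable[[str], str]], query: str) -> str:
    """Run modules sequentially: each output is the next module's input."""
    out = query
    for module in modules:
        out = module(out)
    return out

# Hypothetical stand-ins for summary_qrewrite_prompt, llm, summary_query_engine.
rewrite_prompt = lambda q: f"Rewrite this question for a summary agent: {q}"
fake_llm = lambda prompt: prompt.replace(
    "Rewrite this question for a summary agent: ", ""
)
fake_query_engine = lambda q: f"Answer to: {q}"

result = run_chain(
    [rewrite_prompt, fake_llm, fake_query_engine], "What is this doc about?"
)
print(result)  # → Answer to: What is this doc about?
```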
Let's try out a few queries to see the router in action.
In [ ]:
# compare with the synchronous method
response = qp.run("What did the author do during his time in YC?")
print(str(response))
> Running module c0a87442-3165-443d-9709-960e6ddafe7f with input:
query: What did the author do during his time in YC?

Selecting component 0: The author used a tool to answer specific questions about the document, which suggests that he was engaged in analyzing and extracting specific information from the document during his time in YC..

During his time in YC, the author worked on various tasks related to running Y Combinator. This included selecting and helping founders, dealing with disputes between cofounders, figuring out when people were lying, and fighting with people who maltreated the startups. The author also worked on writing essays and internal software for YC.
In [ ]:
response = qp.run("What is a summary of this document?")
print(str(response))
> Running module c0a87442-3165-443d-9709-960e6ddafe7f with input:
query: What is a summary of this document?

Selecting component 1: The summary questions about the document are answered by this tool..

> Running module 0e7e9d49-4c92-45a9-b3bf-0e6ab76b51f9 with input:
query_str: What is a summary of this document?

> Running module b0ece4e3-e6cd-4229-8663-b0cd0638683c with input:
messages: Here's a question: What is a summary of this document? You are responsible for feeding the question to an agent that given context will try to answer the question. The context may or may not be relev...

> Running module f247ae78-a71c-4347-ba49-d9357ee93636 with input:
input: assistant: What is the summary of the document?

The document discusses the development and evolution of Lisp as a programming language. It highlights how Lisp was originally created as a formal model of computation and later transformed into a programming language with the assistance of Steve Russell. The document also emphasizes the unique power and elegance of Lisp in comparison to other languages.