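Guidance for Sub-Question Query Engine
In this notebook, we showcase how to use guidance to improve the robustness of our sub-question query engine.
The sub-question query engine is designed to accept swappable question generators that implement the BaseQuestionGenerator interface. To leverage the power of guidance, we implemented a new GuidanceQuestionGenerator (powered by our GuidancePydanticProgram).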
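To illustrate what "swappable" means here, the following is a minimal sketch of the pattern, not LlamaIndex's actual classes. The names `Tool`, `SubQ`, `QuestionGenerator`, `OnePerToolGenerator`, and `plan` are all hypothetical stand-ins introduced for this example; the only idea taken from the text is that any object exposing a `generate` method can be plugged in.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Tool:
    """Hypothetical stand-in for a tool description (name + what it covers)."""
    name: str
    description: str


@dataclass
class SubQ:
    """Hypothetical stand-in for a generated sub-question."""
    sub_question: str
    tool_name: str


class QuestionGenerator(Protocol):
    """Anything with this method can be plugged into the caller below."""

    def generate(self, tools: List[Tool], query: str) -> List[SubQ]: ...


class OnePerToolGenerator:
    """Toy generator: asks the original query once per tool, no LLM involved."""

    def generate(self, tools: List[Tool], query: str) -> List[SubQ]:
        return [SubQ(sub_question=query, tool_name=t.name) for t in tools]


def plan(generator: QuestionGenerator, tools: List[Tool], query: str) -> List[SubQ]:
    # The caller depends only on the interface, so generators are interchangeable.
    return generator.generate(tools, query)


demo_tools = [
    Tool("lyft_10k", "Lyft 2021 financials"),
    Tool("uber_10k", "Uber 2021 financials"),
]
demo_plan = plan(OnePerToolGenerator(), demo_tools, "What was revenue in 2021?")
print(demo_plan)
```

Swapping in a different generator means passing a different object to `plan`; nothing else changes, which is the property the engine relies on.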
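Guidance Question Generator
Unlike the default LLMQuestionGenerator, guidance guarantees that we get the desired structured output and eliminates output parsing errors.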
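To make the failure mode concrete: a generator that asks the model for free-form JSON has to parse whatever text comes back, and a truncated or malformed reply breaks that step. The sketch below uses only the standard library, and both strings are hand-written illustrations rather than real model outputs. Constrained generation sidesteps the problem by fixing the structure up front, so the model only fills in the values.

```python
import json
from typing import List, Optional

# Illustrative strings only; these are not real model outputs.
well_formed = '[{"sub_question": "What is the revenue of Uber", "tool_name": "uber_10k"}]'
truncated = '[{"sub_question": "What is the revenue of Uber", "tool_name": "uber_'


def try_parse(raw: str) -> Optional[List[dict]]:
    """Return the parsed list, or None if the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


parsed_ok = try_parse(well_formed)
parsed_bad = try_parse(truncated)
print(parsed_ok)
print(parsed_bad)
```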
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-question-gen-guidance
In [ ]:
!pip install llama-index
In [ ]:
from llama_index.question_gen.guidance import GuidanceQuestionGenerator
from guidance.llms import OpenAI as GuidanceOpenAI
In [ ]:
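# Note: OpenAI has retired text-davinci-003; substitute a completion model
# available to your account if this call fails.
question_gen = GuidanceQuestionGenerator.from_defaults(
    guidance_llm=GuidanceOpenAI("text-davinci-003"), verbose=False
)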
Let's test it out!
In [ ]:
from llama_index.core.tools import ToolMetadata
from llama_index.core import QueryBundle
In [ ]:
tools = [
    ToolMetadata(
        name="lyft_10k",
        description="Provides information about Lyft financials for year 2021",
    ),
    ToolMetadata(
        name="uber_10k",
        description="Provides information about Uber financials for year 2021",
    ),
]
In [ ]:
sub_questions = question_gen.generate(
    tools=tools,
    query=QueryBundle("Compare and contrast Uber and Lyft financial in 2021"),
)
In [ ]:
sub_questions
Out[ ]:
[SubQuestion(sub_question='What is the revenue of Uber', tool_name='uber_10k'), SubQuestion(sub_question='What is the EBITDA of Uber', tool_name='uber_10k'), SubQuestion(sub_question='What is the net income of Uber', tool_name='uber_10k'), SubQuestion(sub_question='What is the revenue of Lyft', tool_name='lyft_10k'), SubQuestion(sub_question='What is the EBITDA of Lyft', tool_name='lyft_10k'), SubQuestion(sub_question='What is the net income of Lyft', tool_name='lyft_10k')]
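Each item in the output above pairs a question with the name of the tool that should answer it. As a small sketch of how such a list might be inspected, the snippet below groups questions by tool. `SubQuestionStub` is a local stand-in that mirrors only the two fields visible in the output (`sub_question` and `tool_name`); it is not the library class, and `group_by_tool` is a hypothetical helper. Because the helper relies only on attribute access, it should also work on the real objects, though that is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubQuestionStub:
    """Local stand-in mirroring the two fields shown in the output above."""
    sub_question: str
    tool_name: str


def group_by_tool(items: List[SubQuestionStub]) -> Dict[str, List[str]]:
    """Collect sub-question texts under the tool they were routed to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        grouped[item.tool_name].append(item.sub_question)
    return dict(grouped)


sample = [
    SubQuestionStub("What is the revenue of Uber", "uber_10k"),
    SubQuestionStub("What is the EBITDA of Uber", "uber_10k"),
    SubQuestionStub("What is the revenue of Lyft", "lyft_10k"),
]
by_tool = group_by_tool(sample)
for tool_name, questions in by_tool.items():
    print(tool_name, questions)
```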
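Using Guidance Question Generator with Sub-Question Query Engine
This section shows how to plug the guidance question generator into the sub-question query engine to generate sub-questions and answer them.
Prepare data and base query engines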
In [ ]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_response
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
Download Data
In [ ]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
In [ ]:
lyft_docs = SimpleDirectoryReader(
    input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
uber_docs = SimpleDirectoryReader(
    input_files=["./data/10k/uber_2021.pdf"]
).load_data()
In [ ]:
lyft_index = VectorStoreIndex.from_documents(lyft_docs)
uber_index = VectorStoreIndex.from_documents(uber_docs)
In [ ]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
Construct sub-question query engine and run some queries!
In [ ]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021"
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021"
            ),
        ),
    ),
]
s_engine = SubQuestionQueryEngine.from_defaults(
    question_gen=question_gen,  # use the guidance-based question_gen defined above
    query_engine_tools=query_engine_tools,
)
In [ ]:
response = s_engine.query(
    "Compare and contrast the customer segments and geographies that grew the"
    " fastest"
)
Generated 4 sub questions.
[uber_10k] Q: What customer segments grew the fastest for Uber
[uber_10k] A: in 2021? The customer segments that grew the fastest for Uber in 2021 were its Mobility Drivers, Couriers, Riders, and Eaters. These segments experienced growth due to the continued stay-at-home order demand related to COVID-19, as well as Uber's membership programs, such as Uber One, Uber Pass, Eats Pass, and Rides Pass. Additionally, Uber's marketplace-centric advertising helped to connect merchants and brands with its platform network, further driving growth.
[uber_10k] Q: What geographies grew the fastest for Uber
[uber_10k] A: Based on the context information, it appears that Uber experienced the most growth in large metropolitan areas, such as Chicago, Miami, New York City, Sao Paulo, and London. Additionally, Uber experienced growth in suburban and rural areas, as well as in countries such as Argentina, Germany, Italy, Japan, South Korea, and Spain.
[lyft_10k] Q: What customer segments grew the fastest for Lyft
[lyft_10k] A: The customer segments that grew the fastest for Lyft were ridesharing, light vehicles, and public transit. Ridesharing grew as Lyft was able to predict demand and proactively incentivize drivers to be available for rides in the right place at the right time. Light vehicles grew as users were looking for options that were more active, usually lower-priced, and often more efficient for short trips during heavy traffic. Public transit grew as Lyft integrated third-party public transit data into the Lyft App to offer users a robust view of transportation options around them.
[lyft_10k] Q: What geographies grew the fastest for Lyft
[lyft_10k] A: It is not possible to answer this question with the given context information.
In [ ]:
print(response)
The customer segments that grew the fastest for Uber in 2021 were its Mobility Drivers, Couriers, Riders, and Eaters. These segments experienced growth due to the continued stay-at-home order demand related to COVID-19, as well as Uber's membership programs, such as Uber One, Uber Pass, Eats Pass, and Rides Pass. Additionally, Uber's marketplace-centric advertising helped to connect merchants and brands with its platform network, further driving growth.
Uber experienced the most growth in large metropolitan areas, such as Chicago, Miami, New York City, Sao Paulo, and London. Additionally, Uber experienced growth in suburban and rural areas, as well as in countries such as Argentina, Germany, Italy, Japan, South Korea, and Spain.
The customer segments that grew the fastest for Lyft were ridesharing, light vehicles, and public transit. Ridesharing grew as Lyft was able to predict demand and proactively incentivize drivers to be available for rides in the right place at the right time. Light vehicles grew as users were looking for options that were more active, usually lower-priced, and often more efficient for short trips during heavy traffic. Public transit grew as Lyft integrated third-party public transit data into the Lyft App to offer users a robust view of transportation options around them.
It is not possible to answer the question of which geographies grew the fastest for Lyft with the given context information.
In summary, Uber and Lyft both experienced growth in customer segments related to their respective services, such as Mobility Drivers, Couriers, Riders, and Eaters for Uber, and ridesharing, light vehicles, and public transit for Lyft. Uber experienced the most growth in large metropolitan areas, as well as in suburban and rural areas, and in countries such as Argentina, Germany, Italy, Japan, South Korea, and Spain. It is not possible to answer the question of which geographies grew the fastest for Lyft with the given context information.
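The run above follows a fan-out pattern: the query is split into sub-questions, each one is routed to the tool named alongside it, and the answers are combined into a final response. The snippet below is a simplified sketch of that flow under stated assumptions, not LlamaIndex's implementation. The "engines" are plain stub functions, `answer_sub_questions` and `combine` are hypothetical helpers, and the final step merely joins strings where the real engine synthesizes a response.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: each "engine" is just a function from question to answer.
Engine = Callable[[str], str]


def answer_sub_questions(
    sub_questions: List[Tuple[str, str]],  # (tool_name, question) pairs
    engines: Dict[str, Engine],
) -> List[Tuple[str, str, str]]:
    """Route each sub-question to the engine registered under its tool name."""
    results = []
    for tool_name, question in sub_questions:
        answer = engines[tool_name](question)
        results.append((tool_name, question, answer))
    return results


def combine(results: List[Tuple[str, str, str]]) -> str:
    """Naive synthesis: one line per question/answer pair."""
    return "\n".join(f"[{tool}] Q: {q} A: {a}" for tool, q, a in results)


stub_engines: Dict[str, Engine] = {
    "uber_10k": lambda q: "stub Uber answer",
    "lyft_10k": lambda q: "stub Lyft answer",
}
stub_plan = [
    ("uber_10k", "What customer segments grew the fastest for Uber"),
    ("lyft_10k", "What customer segments grew the fastest for Lyft"),
]
summary = combine(answer_sub_questions(stub_plan, stub_engines))
print(summary)
```

Looking tools up by name is why the generator's `tool_name` values must match the names registered in the tool metadata; a name that is not registered would raise a `KeyError` in this sketch.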