OpenAI代理查询规划¶
在这个演示中,我们探讨向OpenAIAgent
添加QueryPlanTool
。这实际上使代理能够通过单个工具进行高级查询规划!
QueryPlanTool
旨在与OpenAI函数API很好地配合使用。该工具接受一组其他工具作为输入。工具函数签名包含一个QueryPlan Pydantic对象,该对象可以包含定义计算图的QueryNode对象的DAG。当调用工具时,代理负责通过函数签名定义此图。工具本身在任何相应的工具上执行DAG。
在这个设置中,我们使用一个熟悉的例子:2022年3月、6月和9月的Uber 10Q申报。
如果您在colab上打开这个笔记本,您可能需要安装LlamaIndex 🦙。
In [ ]:
Copied!
%pip install llama-index-agent-openai
%pip install llama-index-llms-openai
%pip install llama-index-agent-openai
%pip install llama-index-llms-openai
In [ ]:
Copied!
!pip install llama-index
!pip install llama-index
In [ ]:
Copied!
# 取消注释以打开日志记录# 导入日志模块# import logging# import sys# 将日志输出到标准输出,日志级别为INFO# logging.basicConfig(stream=sys.stdout, level=logging.INFO)# 添加日志处理程序,将日志输出到标准输出# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
# 取消注释以打开日志记录# 导入日志模块# import logging# import sys# 将日志输出到标准输出,日志级别为INFO# logging.basicConfig(stream=sys.stdout, level=logging.INFO)# 添加日志处理程序,将日志输出到标准输出# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
In [ ]:
Copied!
%load_ext autoreload
%autoreload 2
%load_ext autoreload
%autoreload 2
In [ ]:
Copied!
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_response
from llama_index.llms.openai import OpenAI
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_response
from llama_index.llms.openai import OpenAI
In [ ]:
Copied!
llm = OpenAI(temperature=0, model="gpt-4")
llm = OpenAI(temperature=0, model="gpt-4")
下载数据¶
In [ ]:
Copied!
!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'
!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'
--2024-01-26 20:09:25-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8003::154, 2606:50c0:8002::154, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1260185 (1.2M) [application/octet-stream] Saving to: ‘data/10q/uber_10q_march_2022.pdf’ data/10q/uber_10q_m 100%[===================>] 1.20M --.-KB/s in 0.09s 2024-01-26 20:09:25 (13.4 MB/s) - ‘data/10q/uber_10q_march_2022.pdf’ saved [1260185/1260185] --2024-01-26 20:09:26-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8003::154, 2606:50c0:8002::154, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1238483 (1.2M) [application/octet-stream] Saving to: ‘data/10q/uber_10q_june_2022.pdf’ data/10q/uber_10q_j 100%[===================>] 1.18M --.-KB/s in 0.09s 2024-01-26 20:09:26 (12.7 MB/s) - ‘data/10q/uber_10q_june_2022.pdf’ saved [1238483/1238483] --2024-01-26 20:09:26-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_sept_2022.pdf Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8003::154, 2606:50c0:8002::154, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1178622 (1.1M) [application/octet-stream] Saving to: ‘data/10q/uber_10q_sept_2022.pdf’ data/10q/uber_10q_s 100%[===================>] 1.12M --.-KB/s in 0.06s 2024-01-26 20:09:27 (19.9 MB/s) - ‘data/10q/uber_10q_sept_2022.pdf’ saved [1178622/1178622]
加载数据¶
In [ ]:
Copied!
march_2022 = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_march_2022.pdf"]
).load_data()
june_2022 = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_june_2022.pdf"]
).load_data()
sept_2022 = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_sept_2022.pdf"]
).load_data()
march_2022 = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_march_2022.pdf"]
).load_data()
june_2022 = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_june_2022.pdf"]
).load_data()
sept_2022 = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_sept_2022.pdf"]
).load_data()
构建索引¶
我们在每个文档(3月、6月、9月)上构建一个向量索引/查询引擎。
In [ ]:
Copied!
march_index = VectorStoreIndex.from_documents(march_2022)
june_index = VectorStoreIndex.from_documents(june_2022)
sept_index = VectorStoreIndex.from_documents(sept_2022)
march_index = VectorStoreIndex.from_documents(march_2022)
june_index = VectorStoreIndex.from_documents(june_2022)
sept_index = VectorStoreIndex.from_documents(sept_2022)
In [ ]:
Copied!
march_engine = march_index.as_query_engine(similarity_top_k=3, llm=llm)
june_engine = june_index.as_query_engine(similarity_top_k=3, llm=llm)
sept_engine = sept_index.as_query_engine(similarity_top_k=3, llm=llm)
march_engine = march_index.as_query_engine(similarity_top_k=3, llm=llm)
june_engine = june_index.as_query_engine(similarity_top_k=3, llm=llm)
sept_engine = sept_index.as_query_engine(similarity_top_k=3, llm=llm)
使用OpenAI功能代理和查询计划工具¶
使用OpenAIAgent,它是建立在OpenAI工具用户界面之上的。
将其提供给我们的QueryPlanTool,这是一个接受其他工具的工具。并且代理可以生成这些工具上的查询计划DAG。
In [ ]:
Copied!
from llama_index.core.tools import QueryEngineTool
query_tool_sept = QueryEngineTool.from_defaults(
query_engine=sept_engine,
name="sept_2022",
description=(
f"Provides information about Uber quarterly financials ending"
f" September 2022"
),
)
query_tool_june = QueryEngineTool.from_defaults(
query_engine=june_engine,
name="june_2022",
description=(
f"Provides information about Uber quarterly financials ending June"
f" 2022"
),
)
query_tool_march = QueryEngineTool.from_defaults(
query_engine=march_engine,
name="march_2022",
description=(
f"Provides information about Uber quarterly financials ending March"
f" 2022"
),
)
from llama_index.core.tools import QueryEngineTool
query_tool_sept = QueryEngineTool.from_defaults(
query_engine=sept_engine,
name="sept_2022",
description=(
f"Provides information about Uber quarterly financials ending"
f" September 2022"
),
)
query_tool_june = QueryEngineTool.from_defaults(
query_engine=june_engine,
name="june_2022",
description=(
f"Provides information about Uber quarterly financials ending June"
f" 2022"
),
)
query_tool_march = QueryEngineTool.from_defaults(
query_engine=march_engine,
name="march_2022",
description=(
f"Provides information about Uber quarterly financials ending March"
f" 2022"
),
)
In [ ]:
Copied!
# 定义查询计划工具from llama_index.core.tools import QueryPlanToolfrom llama_index.core import get_response_synthesizerresponse_synthesizer = get_response_synthesizer()query_plan_tool = QueryPlanTool.from_defaults( query_engine_tools=[query_tool_sept, query_tool_june, query_tool_march], response_synthesizer=response_synthesizer,)
# 定义查询计划工具from llama_index.core.tools import QueryPlanToolfrom llama_index.core import get_response_synthesizerresponse_synthesizer = get_response_synthesizer()query_plan_tool = QueryPlanTool.from_defaults( query_engine_tools=[query_tool_sept, query_tool_june, query_tool_march], response_synthesizer=response_synthesizer,)
In [ ]:
Copied!
query_plan_tool.metadata.to_openai_tool() # to_openai_function()已弃用
query_plan_tool.metadata.to_openai_tool() # to_openai_function()已弃用
Out[ ]:
{'name': 'query_plan_tool', 'description': ' This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n Tool Name: sept_2022\nTool Description: Provides information about Uber quarterly financials ending September 2022 \n\nTool Name: june_2022\nTool Description: Provides information about Uber quarterly financials ending June 2022 \n\nTool Name: march_2022\nTool Description: Provides information about Uber quarterly financials ending March 2022 \n ', 'parameters': {'title': 'QueryPlan', 'description': "Query plan.\n\nContains a list of QueryNode objects (which is a recursive object).\nOut of the list of QueryNode objects, one of them must be the root node.\nThe root node is the one that isn't a dependency of any other node.", 'type': 'object', 'properties': {'nodes': {'title': 'Nodes', 'description': 'The original question we are asking.', 'type': 'array', 'items': {'$ref': '#/definitions/QueryNode'}}}, 'required': ['nodes'], 'definitions': {'QueryNode': {'title': 'QueryNode', 'description': 'Query node.\n\nA query node represents a query (query_str) that must be answered.\nIt can either be answered by a tool (tool_name), or by a list of child nodes\n(child_nodes).\nThe tool_name and child_nodes fields are mutually exclusive.', 'type': 'object', 'properties': {'id': {'title': 'Id', 'description': 'ID of the query node.', 'type': 'integer'}, 'query_str': {'title': 'Query Str', 'description': 'Question we are asking. This is the query string that will be executed. ', 'type': 'string'}, 'tool_name': {'title': 'Tool Name', 'description': 'Name of the tool to execute the `query_str`.', 'type': 'string'}, 'dependencies': {'title': 'Dependencies', 'description': 'List of sub-questions that need to be answered in order to answer the question given by `query_str`.Should be blank if there are no sub-questions to be specified, in which case `tool_name` is specified.', 'type': 'array', 'items': {'type': 'integer'}}}, 'required': ['id', 'query_str']}}}}
In [ ]:
Copied!
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
agent = OpenAIAgent.from_tools(
[query_plan_tool],
max_function_calls=10,
llm=OpenAI(temperature=0, model="gpt-4-0613"),
verbose=True,
)
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
agent = OpenAIAgent.from_tools(
[query_plan_tool],
max_function_calls=10,
llm=OpenAI(temperature=0, model="gpt-4-0613"),
verbose=True,
)
In [ ]:
Copied!
response = agent.query("What were the risk factors in sept 2022?")
response = agent.query("What were the risk factors in sept 2022?")
In [ ]:
Copied!
from llama_index.core.tools.query_plan import QueryPlan, QueryNode
query_plan = QueryPlan(
nodes=[
QueryNode(
id=1,
query_str="risk factors",
tool_name="sept_2022",
dependencies=[],
)
]
)
from llama_index.core.tools.query_plan import QueryPlan, QueryNode
query_plan = QueryPlan(
nodes=[
QueryNode(
id=1,
query_str="risk factors",
tool_name="sept_2022",
dependencies=[],
)
]
)
In [ ]:
Copied!
QueryPlan.schema()
QueryPlan.schema()
Out[ ]:
{'title': 'QueryPlan', 'description': 'Query plan.\n\nContains the root QueryNode (which is a recursive object).\nThe root node should contain the original query string to be executed.\n\nExample query plan in JSON format:\n\n```json\n{\n "root": {\n "query_str": "Compare the demographics of France and Italy.",\n "child_nodes": [\n {\n "query_str": "What are the demographics of France?",\n "tool_name": "france_demographics",\n "child_nodes": []\n },\n {\n "query_str": "What are the demographics of Italy?",\n "tool_name": "italy_demographics",\n "child_nodes": []\n }\n ]\n }\n}\n```', 'type': 'object', 'properties': {'root': {'title': 'Root', 'description': 'Root node of the query plan. Should contain the original query string to be executed.', 'allOf': [{'$ref': '#/definitions/QueryNode'}]}}, 'required': ['root'], 'definitions': {'QueryNode': {'title': 'QueryNode', 'description': 'Query node.\n\nA query node represents a query (query_str) that must be answered.\nIt can either be answered by a tool (tool_name), or by a list of child nodes\n(child_nodes).\nThe tool_name and child_nodes fields are mutually exclusive.', 'type': 'object', 'properties': {'query_str': {'title': 'Query Str', 'description': 'Question we are asking. This is the query string that will be executed. We will either provide a tool to execute the query, or a list of child nodes containing sub-questions that will be executed first, and the results of which will be used as context to execute the current query string.', 'type': 'string'}, 'tool_name': {'title': 'Tool Name', 'description': 'Name of the tool to execute the `query_str`.', 'type': 'string'}, 'child_nodes': {'title': 'Child Nodes', 'description': 'List of child nodes representing sub-questions that need to be answered in order to answer the question given by `query_str`.Should be blank if `tool_name` is specified.', 'type': 'array', 'items': {'$ref': '#/definitions/QueryNode'}}}, 'required': ['query_str', 'child_nodes']}}}
In [ ]:
Copied!
response = agent.query(
"Analyze Uber revenue growth in March, June, and September"
)
response = agent.query(
"Analyze Uber revenue growth in March, June, and September"
)
=== Calling Function === Calling function: query_plan_tool with args: { "nodes": [ { "id": 1, "query_str": "What is Uber's revenue for March 2022?", "tool_name": "march_2022", "dependencies": [] }, { "id": 2, "query_str": "What is Uber's revenue for June 2022?", "tool_name": "june_2022", "dependencies": [] }, { "id": 3, "query_str": "What is Uber's revenue for September 2022?", "tool_name": "sept_2022", "dependencies": [] }, { "id": 4, "query_str": "Analyze Uber revenue growth in March, June, and September", "tool_name": "revenue_growth_analyzer", "dependencies": [1, 2, 3] } ] } Executing node {"id": 4, "query_str": "Analyze Uber revenue growth in March, June, and September", "tool_name": "revenue_growth_analyzer", "dependencies": [1, 2, 3]} Executing 3 child nodes Executing node {"id": 1, "query_str": "What is Uber's revenue for March 2022?", "tool_name": "march_2022", "dependencies": []} Selected Tool: ToolMetadata(description='Provides information about Uber quarterly financials ending March 2022', name='march_2022', fn_schema=None) Executed query, got response. Query: What is Uber's revenue for March 2022? Response: Uber's revenue for March 2022 was $6.854 billion. Executing node {"id": 2, "query_str": "What is Uber's revenue for June 2022?", "tool_name": "june_2022", "dependencies": []} Selected Tool: ToolMetadata(description='Provides information about Uber quarterly financials ending June 2022', name='june_2022', fn_schema=None) Executed query, got response. Query: What is Uber's revenue for June 2022? Response: Uber's revenue for June 2022 cannot be determined from the provided information. However, the revenue for the three months ended June 30, 2022, was $8,073 million. Executing node {"id": 3, "query_str": "What is Uber's revenue for September 2022?", "tool_name": "sept_2022", "dependencies": []} Selected Tool: ToolMetadata(description='Provides information about Uber quarterly financials ending September 2022', name='sept_2022', fn_schema=None) Executed query, got response. Query: What is Uber's revenue for September 2022? Response: Uber's revenue for the three months ended September 30, 2022, was $8.343 billion. Got output: Based on the provided context information, we can analyze Uber's revenue growth as follows: - In March 2022, Uber's revenue was $6.854 billion. - For the three months ended June 30, 2022, Uber's revenue was $8,073 million (or $8.073 billion). However, we do not have the specific revenue for June 2022. - For the three months ended September 30, 2022, Uber's revenue was $8.343 billion. From this information, we can observe that Uber's revenue has been growing between the periods mentioned. The revenue increased from $6.854 billion in March 2022 to $8.073 billion for the three months ended June 2022, and further increased to $8.343 billion for the three months ended September 2022. However, we cannot provide a month-by-month analysis for June and September as the specific monthly revenue figures are not available. ========================
In [ ]:
Copied!
print(str(response))
print(str(response))
Based on the provided context information, we can analyze Uber's revenue growth for the three-month periods ending in March, June, and September. 1. For the three months ended March 31, 2022, Uber's revenue was $6.854 billion. 2. For the three months ended June 30, 2022, Uber's revenue was $8.073 billion. 3. For the three months ended September 30, 2022, Uber's revenue was $8.343 billion. To analyze the growth, we can compare the revenue figures for each period: - From March to June, Uber's revenue increased by $1.219 billion ($8.073 billion - $6.854 billion), which represents a growth of approximately 17.8% (($1.219 billion / $6.854 billion) * 100). - From June to September, Uber's revenue increased by $0.270 billion ($8.343 billion - $8.073 billion), which represents a growth of approximately 3.3% (($0.270 billion / $8.073 billion) * 100). In summary, Uber experienced significant revenue growth of 17.8% between the three-month periods ending in March and June, followed by a smaller growth of 3.3% between the periods ending in June and September.
In [ ]:
Copied!
response = agent.query(
"Analyze changes in risk factors in march, june, and september for Uber"
)
response = agent.query(
"Analyze changes in risk factors in march, june, and september for Uber"
)
In [ ]:
Copied!
print(str(response))
print(str(response))
In [ ]:
Copied!
# 响应 = 代理.query("分析3月、6月和9月的Uber收入增长和风险因素")
# 响应 = 代理.query("分析3月、6月和9月的Uber收入增长和风险因素")
In [ ]:
Copied!
print(str(response))
print(str(response))
Based on the provided context information, we can analyze Uber's revenue growth for the three-month periods ending in March, June, and September. 1. For the three months ended March 31, 2022, Uber's revenue was $6.854 billion. 2. For the three months ended June 30, 2022, Uber's revenue was $8.073 billion. 3. For the three months ended September 30, 2022, Uber's revenue was $8.343 billion. To analyze the growth, we can compare the revenue figures for each period: - From March to June, Uber's revenue increased by $1.219 billion ($8.073 billion - $6.854 billion), which represents a growth of approximately 17.8% (($1.219 billion / $6.854 billion) * 100). - From June to September, Uber's revenue increased by $0.270 billion ($8.343 billion - $8.073 billion), which represents a growth of approximately 3.3% (($0.270 billion / $8.073 billion) * 100). In summary, Uber experienced significant revenue growth of 17.8% between the three-month periods ending in March and June, followed by a smaller growth of 3.3% between the periods ending in June and September.
In [ ]:
Copied!
response = agent.query(
"First look at Uber's revenue growth and risk factors in March, "
+ "then revenue growth and risk factors in September, and then compare and"
" contrast the two documents?"
)
response = agent.query(
"First look at Uber's revenue growth and risk factors in March, "
+ "then revenue growth and risk factors in September, and then compare and"
" contrast the two documents?"
)
In [ ]:
Copied!
response
response