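Controllable Agents for RAG¶
A major pain point with agents is their lack of steerability and transparency. An agent may tackle a user query through chain-of-thought reasoning or planning, which requires repeated calls to an LLM. During this process it is hard to inspect what is going on, or to stop and correct execution midway.
This notebook shows you how to use our new lower-level agent API to get controllable, step-wise execution on top of a RAG pipeline.
We demonstrate this over Wikipedia documents.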
In [ ]:
%pip install llama-index-agent-openai
%pip install llama-index-llms-openai
In [ ]:
!pip install llama-index
Setup Data¶
Here we load a simple dataset of different cities from Wikipedia.
In [ ]:
from llama_index.core import (
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
)
from llama_index.core import SummaryIndex
from llama_index.core.schema import IndexNode
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.callbacks import CallbackManager
from llama_index.llms.openai import OpenAI
In [ ]:
wiki_titles = [
    "Toronto",
    "Seattle",
    "Chicago",
    "Boston",
    "Houston",
]
In [ ]:
from pathlib import Path

import requests

# Create the local data directory once, before downloading anything.
data_path = Path("data")
data_path.mkdir(exist_ok=True)

for title in wiki_titles:
    # Fetch the plain-text extract of each Wikipedia page.
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    # Write one text file per city.
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)
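The page lookup in the download loop depends on the shape of the response: pages sit under query.pages, keyed by a page ID that is not known in advance. The sketch below illustrates that lookup on a hand-written dictionary. The helper name extract_page_text and the sample data are assumptions made for illustration, not real API output, and the error branch is one possible way to handle a page entry that comes back without an extract.

```python
def extract_page_text(response: dict) -> str:
    """Return the plain-text extract of the first page in a query response."""
    pages = response["query"]["pages"]
    # Pages are keyed by page ID, so take the first value rather than a fixed key.
    page = next(iter(pages.values()))
    if "extract" not in page:
        raise ValueError(f"No extract for page: {page.get('title', '<unknown>')}")
    return page["extract"]


# Hand-written sample that mimics the response shape (not real API output).
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "title": "Toronto",
                "extract": "Toronto is a city.",
            }
        }
    }
}

print(extract_page_text(sample_response))
```

Failing loudly here avoids writing an empty or missing text file that would only surface later as a confusing indexing error.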
In [ ]:
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()
Define LLM + Callback Manager
In [ ]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
callback_manager = CallbackManager([])
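Setup Agent¶
In this section we define our tools and set up the agent.
Define Toolset¶
Each tool here corresponds to a simple top-k RAG pipeline over a single document (one Wikipedia page).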
In [ ]:
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import load_index_from_storage, StorageContext
from llama_index.core.node_parser import SentenceSplitter
import os

node_parser = SentenceSplitter()

# Build one query engine tool per city
query_engine_tools = []
for idx, wiki_title in enumerate(wiki_titles):
    nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])

    if not os.path.exists(f"./data/{wiki_title}"):
        # Build the vector index and persist it to disk
        vector_index = VectorStoreIndex(
            nodes, callback_manager=callback_manager
        )
        vector_index.storage_context.persist(
            persist_dir=f"./data/{wiki_title}"
        )
    else:
        # Reuse the index persisted by a previous run
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/{wiki_title}"),
            callback_manager=callback_manager,
        )

    # Define the query engine
    vector_query_engine = vector_index.as_query_engine(llm=llm)

    # Define the tool
    query_engine_tools.append(
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{wiki_title}",
                description=(
                    "Useful for questions related to specific aspects of"
                    f" {wiki_title} (e.g. the history, arts and culture,"
                    " sports, demographics, or more)."
                ),
            ),
        )
    )
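To make "top-k RAG pipeline" concrete, here is a toy sketch of the retrieval half: score every chunk against the query and keep the k best. It deliberately swaps the real embedding model for a bag-of-words count vector, and the names embed, cosine, and top_k are hypothetical. It shows the idea only and is not how VectorStoreIndex is implemented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts, standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "houston population census demographics",
    "houston sports teams",
    "houston history founding",
]
print(top_k("houston demographics population", chunks, k=1))
```

In the real pipeline the retrieved chunks are then passed to the LLM, which synthesizes the answer that the agent sees as the tool output.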
Setup OpenAI Agent¶
We set up the OpenAI agent through its components: an AgentRunner and an OpenAIAgentWorker.
In [ ]:
from llama_index.core.agent import AgentRunner
from llama_index.agent.openai import OpenAIAgentWorker, OpenAIAgent

# The worker decides what happens in a single step; the runner owns task state.
openai_step_engine = OpenAIAgentWorker.from_tools(
    query_engine_tools, llm=llm, verbose=True
)
agent = AgentRunner(openai_step_engine)

# # Alternative
# agent = OpenAIAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
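The runner/worker split can be pictured with a small toy model: the worker knows how to perform one step, while the runner creates tasks, keeps their state, and calls the worker one step at a time. All class names below (ToyTask, ToyStepOutput, ToyWorker, ToyRunner) are hypothetical and the worker simply replays a fixed plan. This is a sketch of the division of labor, not the library's implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyStepOutput:
    output: str
    is_last: bool


@dataclass
class ToyTask:
    input: str
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    completed: list = field(default_factory=list)


class ToyWorker:
    """Performs a single step. Here it replays a fixed plan, one action per step."""

    def __init__(self, plan):
        self.plan = plan

    def run_step(self, task: ToyTask) -> ToyStepOutput:
        idx = len(task.completed)
        return ToyStepOutput(self.plan[idx], is_last=idx == len(self.plan) - 1)


class ToyRunner:
    """Owns task state and advances a task one step at a time."""

    def __init__(self, worker: ToyWorker):
        self.worker = worker
        self.tasks = {}

    def create_task(self, input: str) -> ToyTask:
        task = ToyTask(input=input)
        self.tasks[task.task_id] = task
        return task

    def run_step(self, task_id: str) -> ToyStepOutput:
        task = self.tasks[task_id]
        step_output = self.worker.run_step(task)
        task.completed.append(step_output)
        return step_output


runner = ToyRunner(ToyWorker(["call Houston tool", "call Chicago tool", "write answer"]))
toy_task = runner.create_task("Compare Houston and Chicago")
first = runner.run_step(toy_task.task_id)
print(first.output, first.is_last)
```

Because the caller drives each step, it can stop after any of them and inspect toy_task.completed before deciding whether to continue.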
Out-of-the-Box Usage¶
In [ ]:
response = agent.chat(
    "Tell me about the demographics of Houston, and compare that with the demographics of Chicago"
)
Added user message to memory: Tell me about the demographics of Houston, and compare that with the demographics of Chicago
=== Calling Function ===
Calling function: vector_tool_Houston with args: {
  "input": "demographics"
}
Got output: Houston has a population of 2,304,580 according to the 2020 U.S. census. In 2017, the estimated population was 2,312,717, and in 2018 it was 2,325,502. The city has a diverse demographic makeup, with a significant number of undocumented immigrants residing in the Houston area, comprising nearly 9% of the city's metropolitan population in 2017. The age distribution in Houston includes a significant number of individuals under 15 and between the ages of 20 to 34. The median age of the city is 33.4. The city has a mix of homeowners and renters, with an estimated 42.3% of Houstonians owning housing units. The median household income in 2019 was $52,338, and 20.1% of Houstonians lived at or below the poverty line.
========================
=== Calling Function ===
Calling function: vector_tool_Chicago with args: {
  "input": "demographics"
}
Got output: Chicago experienced rapid population growth during its first hundred years, becoming one of the fastest-growing cities in the world. From its founding in 1833 with fewer than 200 people, the population grew to over 4,000 within seven years. By 1890, the population had surpassed 1 million, making Chicago the fifth-largest city in the world at the time. The city's population continued to grow, reaching its highest recorded population of 3.6 million in 1950. However, in the latter half of the 20th century, Chicago's population declined, dropping to under 2.7 million by 2010. The city experienced a rise in population for the 2000 census, followed by a decrease in 2010, and then another increase for the 2020 census. According to U.S. census estimates as of July 2019, the largest racial or ethnic groups in Chicago are non-Hispanic White (32.8%), Blacks (30.1%), and Hispanics (29.0%).
Additionally, Chicago has the third-largest LGBT population in the United States, with an estimated 7.5% of the adult population identifying as LGBTQ in 2018.
========================
In [ ]:
print(str(response))
Houston has a larger population compared to Chicago, with 2,304,580 residents according to the 2020 U.S. census. In contrast, Chicago's population is estimated to be around 2.7 million as of 2019. Both cities have diverse demographics. Houston has a significant number of undocumented immigrants, comprising nearly 9% of the metropolitan population in 2017. Chicago, on the other hand, has a diverse racial and ethnic makeup, with non-Hispanic Whites, Blacks, and Hispanics being the largest groups. Non-Hispanic Whites make up 32.8% of Chicago's population, while Blacks account for 30.1% and Hispanics make up 29.0%. In terms of age distribution, Houston has a significant number of individuals under 15 and between the ages of 20 to 34. The median age in Houston is 33.4. Chicago's age distribution is not specified in the provided information. Regarding homeownership, Houston has an estimated 42.3% of residents owning housing units. The homeownership rate in Chicago is not mentioned. The median household income in Houston was $52,338 in 2019. The poverty rate in Houston was 20.1%. The median household income and poverty rate for Chicago are not provided. Overall, both Houston and Chicago have diverse populations, but Houston has a larger population and a higher percentage of undocumented immigrants. Chicago has a diverse racial and ethnic makeup, with non-Hispanic Whites, Blacks, and Hispanics being the largest groups.
In [ ]:
# List tasks and steps for inspection
tasks = agent.list_tasks()
print(f"Task ID: {tasks[-1].task.task_id}")
completed_steps = agent.get_completed_steps(tasks[-1].task.task_id)
print(f"Number of steps: {len(completed_steps)}")
Task ID: d7c5b296-b841-429c-ac86-08ff37129a68
Number of steps: 3
In [ ]:
# Start a task
task = agent.create_task(
    "Tell me about the demographics of Houston, and compare that with the demographics of Chicago"
)
This returns a Task object, which contains the input, additional state in extra_state, and other fields.
Now let's try executing a single step of this task.
In [ ]:
step_output = agent.run_step(task.task_id)
=== Calling Function ===
Calling function: vector_tool_Houston with args: {
  "input": "demographics"
}
Got output: Houston has a population of 2,304,580 according to the 2020 U.S. census. In 2017, the estimated population was 2,312,717, and in 2018 it was 2,325,502. The city has a diverse demographic makeup, with a significant number of undocumented immigrants residing in the Houston area, comprising nearly 9% of the city's metropolitan population in 2017. The age distribution in Houston includes a significant number of individuals under 15 and between the ages of 20 to 34. The median age of the city is 33.4. The city has a mix of homeowners and renters, with an estimated 42.3% of Houstonians owning housing units. The median household income in 2019 was $52,338, and 20.1% of Houstonians lived at or below the poverty line.
========================
Inspecting the logs and the output, we can see that the first part has been executed: the demographics of Houston.
In [ ]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
Num completed for task 47c83928-06f5-4c54-9f37-70451d76b675: 1
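We can also take a look at the upcoming steps.
NOTE: Currently the step input is not shown, because step execution depends purely on internal memory. This is something we are working on!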
In [ ]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]
Num upcoming steps for task 47c83928-06f5-4c54-9f37-70451d76b675: 1
Out[ ]:
TaskStep(task_id='47c83928-06f5-4c54-9f37-70451d76b675', step_id='43769c9c-61ed-47a2-84dd-a553ba8dcbba', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)
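If you want to pause execution at this point, you can: the intermediate results are available without completing the agent flow.
NOTE: The agent's memory (agent.memory) is not modified until the task is completed and committed, so if you pause now, nothing is committed to memory. This is helpful when execution fails partway through.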
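The deferred-commit behavior described in the note can be modeled with a small toy class: messages produced while a task is running live in a per-task scratch buffer, and only finalizing copies them into the long-lived memory. The class ToyAgent and its methods are hypothetical stand-ins written for this sketch. They mirror the behavior described above and are not the library's internals.

```python
class ToyAgent:
    """Toy model of deferred memory commit; not the real agent implementation."""

    def __init__(self):
        self.memory = []     # committed conversation history
        self._scratch = {}   # task_id -> messages not yet committed

    def create_task(self, task_id: str, user_message: str) -> None:
        self._scratch[task_id] = [("user", user_message)]

    def run_step(self, task_id: str, tool_result: str) -> None:
        # Intermediate results go to the scratch buffer only.
        self._scratch[task_id].append(("tool", tool_result))

    def finalize_response(self, task_id: str, answer: str) -> str:
        # Committing happens here, in one go.
        messages = self._scratch.pop(task_id)
        messages.append(("assistant", answer))
        self.memory.extend(messages)
        return answer

    def abandon(self, task_id: str) -> None:
        # Dropping a failed or paused task leaves committed memory untouched.
        self._scratch.pop(task_id, None)


toy_agent = ToyAgent()
toy_agent.create_task("t1", "Compare Houston and Chicago")
toy_agent.run_step("t1", "Houston demographics summary")
print(len(toy_agent.memory))  # still empty: nothing committed yet
toy_agent.finalize_response("t1", "Both cities are diverse.")
print(len(toy_agent.memory))
```

Keeping the scratch buffer separate means a task that fails or is abandoned midway never leaves half a conversation in the history that later queries would see.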
Let's run the next two steps.
In [ ]:
step_output = agent.run_step(task.task_id)
=== Calling Function ===
Calling function: vector_tool_Chicago with args: {
  "input": "demographics"
}
Got output: Chicago experienced rapid population growth during its first hundred years, becoming one of the fastest-growing cities in the world. From its founding in 1833 with fewer than 200 people, the population grew to over 4,000 within seven years. By 1890, the population had surpassed 1 million, making Chicago the fifth-largest city in the world at the time. The city's population continued to grow, reaching its highest recorded population of 3.6 million in 1950. However, in the latter half of the 20th century, Chicago's population declined, dropping to under 2.7 million by 2010. The city experienced waves of immigration, with various ethnic groups, including Irish, Italians, Jews, Poles, Greeks, and African Americans from the American South, contributing to the city's diverse population. According to the most recent U.S. census estimates, the largest racial or ethnic groups in Chicago are non-Hispanic White, Black, and Hispanic. Additionally, Chicago has a significant LGBT population and became a sanctuary city in 2012.
========================
In [ ]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)
True
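Since the steps look good, we can now call finalize_response to get our response.
This also commits the task execution to the memory object of our AgentRunner (agent.memory), which we can then inspect.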
In [ ]:
response = agent.finalize_response(task.task_id)
In [ ]:
print(str(response))
Houston has a population of 2,304,580 according to the 2020 U.S. census, while Chicago had a population of under 2.7 million in 2010. Both cities have diverse populations with various ethnic groups contributing to their demographics. In terms of age distribution, Houston has a significant number of individuals under 15 and between the ages of 20 to 34, with a median age of 33.4. Chicago's population has a diverse age range as well, but specific age distribution data was not provided. In terms of homeownership, Houston has an estimated 42.3% of residents owning housing units. Data on homeownership in Chicago was not provided. The median household income in Houston is $52,338, while specific income data for Chicago was not provided. Both cities have experienced waves of immigration, contributing to their diverse populations. Chicago has a significant LGBT population and became a sanctuary city in 2012, while specific information about these aspects in Houston was not provided. Overall, both Houston and Chicago have diverse populations with various ethnic groups and age distributions. Houston has a slightly smaller population but a higher homeownership rate and median household income compared to Chicago.
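The manual sequence above (run_step until is_last, then finalize_response) can be wrapped in a small driver that also enforces a step budget and an optional per-step check. The helper run_with_budget and the StubAgent used to exercise it are hypothetical. The helper relies only on run_step, is_last, and finalize_response as used in this notebook, and the stub stands in for a real agent so the sketch runs without an API key.

```python
from types import SimpleNamespace


def run_with_budget(agent, task_id, max_steps=5, should_continue=lambda step_output: True):
    """Step through a task; finalize on the last step, stop early otherwise.

    Returns the finalized response, or None if the budget ran out or the
    check vetoed a step (in which case nothing is finalized).
    """
    for _ in range(max_steps):
        step_output = agent.run_step(task_id)
        if step_output.is_last:
            return agent.finalize_response(task_id)
        if not should_continue(step_output):
            return None
    return None


class StubAgent:
    """Stand-in that finishes after a fixed number of steps; not a real agent."""

    def __init__(self, total_steps):
        self.total_steps = total_steps
        self.steps_run = 0

    def run_step(self, task_id):
        self.steps_run += 1
        return SimpleNamespace(is_last=self.steps_run == self.total_steps)

    def finalize_response(self, task_id):
        return f"final answer after {self.steps_run} steps"


print(run_with_budget(StubAgent(total_steps=3), "task-1"))
print(run_with_budget(StubAgent(total_steps=3), "task-1", max_steps=2))
```

The should_continue hook is where a human review or an automated sanity check on each step's output could go.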
In [ ]:
tasks = agent.list_tasks()
print(len(tasks))
2
In [ ]:
task_state = tasks[-1]
steps = agent.get_completed_steps(task_state.task.task_id)
print(len(steps))
3