OpenAI助手代理¶

这个示例向您展示了如何使用我们构建在OpenAI助手API之上的代理抽象。

In [ ]:

Copied!

%pip install llama-index-agent-openai
%pip install llama-index-vector-stores-supabase
%pip install llama-index-agent-openai
%pip install llama-index-vector-stores-supabase

In [ ]:

Copied!

!pip install llama-index
!pip install llama-index

简单代理程序（无外部工具）¶

这里我们展示一个使用内置代码解释器的简单示例。

让我们首先导入一些简单的基本模块。

如果您在Colab上打开这个笔记本，您可能需要安装LlamaIndex 🦙。

In [ ]:

Copied!

from llama_index.agent.openai import OpenAIAssistantAgent
from llama_index.agent.openai import OpenAIAssistantAgent

In [ ]:

Copied!





agent = OpenAIAssistantAgent.from_new(
    name="Math Tutor",
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    openai_tools=[{"type": "code_interpreter"}],
    instructions_prefix="Please address the user as Jane Doe. The user has a premium account.",
)
agent = OpenAIAssistantAgent.from_new(
    name="Math Tutor",
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    openai_tools=[{"type": "code_interpreter"}],
    instructions_prefix="Please address the user as Jane Doe. The user has a premium account.",
)

In [ ]:

Copied!

agent.thread_id
agent.thread_id

Out[ ]:

'thread_ctzN0ZY3JUWETHhYxI3DiFSo'

In [ ]:

Copied!

response = agent.chat(
    "I need to solve the equation `3x + 11 = 14`. Can you help me?"
)
response = agent.chat(
    "I need to solve the equation `3x + 11 = 14`. Can you help me?"
)

In [ ]:

Copied!

print(str(response))
print(str(response))

The solution to the equation \(3x + 11 = 14\) is \(x = 1\).

带内置检索功能的助手¶

让我们通过让助手使用内置的OpenAI检索工具来测试助手。

在这里，我们在创建助手时上传并传递文件。

另一个选项是在给定线程中通过upload_files和add_message上传/传递文件ID。

In [ ]:

Copied!

from llama_index.agent.openai import OpenAIAssistantAgent
from llama_index.agent.openai import OpenAIAssistantAgent

In [ ]:

Copied!





agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze sec filings.",
    openai_tools=[{"type": "retrieval"}],
    instructions_prefix="Please address the user as Jerry.",
    files=["data/10k/lyft_2021.pdf"],
    verbose=True,
)
agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze sec filings.",
    openai_tools=[{"type": "retrieval"}],
    instructions_prefix="Please address the user as Jerry.",
    files=["data/10k/lyft_2021.pdf"],
    verbose=True,
)

In [ ]:

Copied!

response = agent.chat("What was Lyft's revenue growth in 2021?")
response = agent.chat("What was Lyft's revenue growth in 2021?")

In [ ]:

Copied!

print(str(response))
print(str(response))

Lyft's revenue increased by $843.6 million or 36% in 2021 as compared to the previous year【7†source】.

使用查询引擎工具的助手¶

在这里，我们展示了通过将OpenAIAssistantAgent与我们的查询引擎工具集成在一起，展示了其调用功能。

import pandas as pd

# 从CSV文件中加载数据
data = pd.read_csv('data.csv')

# 显示数据的前5行
data.head()

In [ ]:

Copied!





from llama_index.agent.openai import OpenAIAssistantAgent
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.agent.openai import OpenAIAssistantAgent
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)

from llama_index.core.tools import QueryEngineTool, ToolMetadata

In [ ]:

Copied!





try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft"
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
except:
    index_loaded = False
try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft"
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
except:
    index_loaded = False

In [ ]:

Copied!

!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

--2023-11-07 00:20:08--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘data/10k/uber_2021.pdf’

data/10k/uber_2021. 100%[===================>]   1.79M  --.-KB/s    in 0.07s   

2023-11-07 00:20:08 (24.3 MB/s) - ‘data/10k/uber_2021.pdf’ saved [1880483/1880483]

--2023-11-07 00:20:08--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1440303 (1.4M) [application/octet-stream]
Saving to: ‘data/10k/lyft_2021.pdf’

data/10k/lyft_2021. 100%[===================>]   1.37M  --.-KB/s    in 0.06s   

2023-11-07 00:20:09 (22.2 MB/s) - ‘data/10k/lyft_2021.pdf’ saved [1440303/1440303]

In [ ]:

Copied!

if not index_loaded:    # 加载数据    lyft_docs = SimpleDirectoryReader(        input_files=["./data/10k/lyft_2021.pdf"]    ).load_data()    uber_docs = SimpleDirectoryReader(        input_files=["./data/10k/uber_2021.pdf"]    ).load_data()    # 构建索引    lyft_index = VectorStoreIndex.from_documents(lyft_docs)    uber_index = VectorStoreIndex.from_documents(uber_docs)    # 持久化索引    lyft_index.storage_context.persist(persist_dir="./storage/lyft")    uber_index.storage_context.persist(persist_dir="./storage/uber")
if not index_loaded:    # 加载数据    lyft_docs = SimpleDirectoryReader(        input_files=["./data/10k/lyft_2021.pdf"]    ).load_data()    uber_docs = SimpleDirectoryReader(        input_files=["./data/10k/uber_2021.pdf"]    ).load_data()    # 构建索引    lyft_index = VectorStoreIndex.from_documents(lyft_docs)    uber_index = VectorStoreIndex.from_documents(uber_docs)    # 持久化索引    lyft_index.storage_context.persist(persist_dir="./storage/lyft")    uber_index.storage_context.persist(persist_dir="./storage/uber")

In [ ]:

Copied!

lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)

In [ ]:

Copied!





query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]

2. 让我们来试一下¶

In [ ]:

Copied!





agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze sec filings.",
    tools=query_engine_tools,
    instructions_prefix="Please address the user as Jerry.",
    verbose=True,
    run_retrieve_sleep_time=1.0,
)
agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze sec filings.",
    tools=query_engine_tools,
    instructions_prefix="Please address the user as Jerry.",
    verbose=True,
    run_retrieve_sleep_time=1.0,
)

In [ ]:

Copied!

response = agent.chat("What was Lyft's revenue growth in 2021?")
response = agent.chat("What was Lyft's revenue growth in 2021?")

=== Calling Function ===
Calling function: lyft_10k with args: {"input":"What was Lyft's revenue growth in 2021?"}
Got output: Lyft's revenue growth in 2021 was 36%.
========================

使用自己的向量存储/检索API的助理代理¶

LlamaIndex拥有35多个向量数据库集成。您可以使用我们的助理代理来代替内部的检索API，从而在任何向量存储上使用。

这是我们完整的向量存储集成列表。我们使用随机数生成器选择了一个向量存储（Supabase）。

In [ ]:

Copied!





from llama_index.agent.openai import OpenAIAssistantAgent
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
)
from llama_index.vector_stores.supabase import SupabaseVectorStore

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.agent.openai import OpenAIAssistantAgent
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
)
from llama_index.vector_stores.supabase import SupabaseVectorStore

from llama_index.core.tools import QueryEngineTool, ToolMetadata

In [ ]:

Copied!

!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

In [ ]:

Copied!

# 加载数据reader = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"])docs = reader.load_data()for doc in docs:    doc.id_ = "lyft_docs"
# 加载数据reader = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"])docs = reader.load_data()for doc in docs:    doc.id_ = "lyft_docs"

In [ ]:

Copied!





vector_store = SupabaseVectorStore(
    postgres_connection_string=(
        "postgresql://<user>:<password>@<host>:<port>/<db_name>"
    ),
    collection_name="base_demo",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
vector_store = SupabaseVectorStore(
    postgres_connection_string=(
        "postgresql://:@:/"
    ),
    collection_name="base_demo",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

In [ ]:

Copied!

# 检查文档是否在向量存储中num_docs = vector_store.get_by_id("lyft_docs", limit=1000)print(len(num_docs))
# 检查文档是否在向量存储中num_docs = vector_store.get_by_id("lyft_docs", limit=1000)print(len(num_docs))

/Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages/vecs/collection.py:445: UserWarning: Query does not have a covering index for cosine_distance. See Collection.create_index
  warnings.warn(

In [ ]:

Copied!





lyft_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="lyft_10k",
        description=(
            "Provides information about Lyft financials for year 2021. "
            "Use a detailed plain text question as input to the tool."
        ),
    ),
)
lyft_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="lyft_10k",
        description=(
            "Provides information about Lyft financials for year 2021. "
            "Use a detailed plain text question as input to the tool."
        ),
    ),
)

In [ ]:

Copied!





agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze SEC filings.",
    tools=[lyft_tool],
    verbose=True,
    run_retrieve_sleep_time=1.0,
)
agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze SEC filings.",
    tools=[lyft_tool],
    verbose=True,
    run_retrieve_sleep_time=1.0,
)

In [ ]:

Copied!

response = agent.chat(
    "Tell me about Lyft's risk factors, as well as response to COVID-19"
)
response = agent.chat(
    "Tell me about Lyft's risk factors, as well as response to COVID-19"
)

=== Calling Function ===
Calling function: lyft_10k with args: {"input": "What are Lyft's risk factors?"}

/Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages/vecs/collection.py:445: UserWarning: Query does not have a covering index for cosine_distance. See Collection.create_index
  warnings.warn(

Got output: Lyft's risk factors include general economic factors, such as the impact of the COVID-19 pandemic and responsive measures, natural disasters, economic downturns, public health crises, or political crises. Operational factors, such as their limited operating history, financial performance, competition, unpredictability of results, uncertainty regarding market growth, ability to attract and retain qualified drivers and riders, insurance coverage, autonomous vehicle technology, reputation and brand, illegal or improper activity on the platform, accuracy of background checks, changes to pricing practices, growth and development of their network, ability to manage growth, security or privacy breaches, reliance on third parties, and ability to operate various programs and services.
========================
=== Calling Function ===
Calling function: lyft_10k with args: {"input": "How did Lyft respond to COVID-19?"}

/Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages/vecs/collection.py:445: UserWarning: Query does not have a covering index for cosine_distance. See Collection.create_index
  warnings.warn(

Got output: Lyft responded to COVID-19 by adopting multiple measures, including establishing new health and safety requirements for ridesharing and updating workplace policies. They also made adjustments to their expenses and cash flow to correlate with declines in revenues, which included headcount reductions in 2020. Additionally, Lyft temporarily reduced pricing for Flexdrive rentals in cities most affected by COVID-19 and waived rental fees for drivers who tested positive for COVID-19 or were requested to quarantine by a medical professional. These measures were implemented to mitigate the negative effects of the pandemic on their business.
========================

In [ ]:

Copied!

print(str(response))
print(str(response))

Lyft's 2021 10-K filing outlines a multifaceted risk landscape for the company, encapsulating both operational and environmental challenges that could impact its business model:

- **Economic Factors**: Risks include the ramifications of the COVID-19 pandemic, susceptibility to natural disasters, the volatility of economic downturns, and geopolitical tensions.

- **Operational Dynamics**: The company is cognizant of its limited operating history, the uncertainties surrounding its financial performance, the intense competition in the ridesharing sector, the unpredictability in financial results, and the ambiguity tied to the expansion potential of the rideshare market.

- **Human Capital**: A critical concern is the ability of Lyft to attract and maintain a robust network of both drivers and riders, which is essential for the platform's vitality.

- **Insurance and Safety**: Ensuring adequate insurance coverage for stakeholders and addressing autonomous vehicle technology risks are pivotal.

- **Reputation and Brand**: Lyft is attentive to the influence that illegal or unseemly activities on its platform can have on its public image.

- **Pricing Structure**: Changing pricing models pose a risk to Lyft's revenue streams, considering how essential pricing dynamics are to maintaining competitive service offerings.

- **Systemic Integrity**: Lyft also acknowledges risks emanating from potential system failures which could disrupt service continuity.

Furthermore, Lyft is vigilant about regulatory and legal risks that could lead to litigation and is conscious of the broader implications of climate change on its operations.

In terms of its response to COVID-19, Lyft has adopted strategic measures to secure the welfare of both its workforce and customer base:

1. **Health and Safety Protocols**: Lyft has instituted health and safety mandates tailored specifically to the ridesharing experience in view of the pandemic.

2. **Workplace Adjustments**: The company revised its workplace policies to accommodate the shifts in the work environment precipitated by the pandemic.

3. **Financial Adaptations**: To synchronize with the revenue contraction experienced during the pandemic, Lyft executed monetary realignments, which necessitated workforce reductions in 2020.

These initiatives reflect Lyft's calculated approach to navigating the operational and financial hurdles enacted by the COVID-19 pandemic. By prioritizing health and safety, nimbly altering corporate practices, and recalibrating fiscal management, Lyft aimed to buttress its business against the storm of the pandemic while setting a foundation for post-pandemic recovery.