Transformers 文档

代理和工具

Transformers

代理和工具

什么是代理？

经过训练以执行因果语言建模的大型语言模型（LLMs）可以处理广泛的任务，但它们通常在逻辑、计算和搜索等基本任务上表现不佳。当在它们不擅长的领域中被提示时，它们往往无法生成我们期望的答案。

克服这一弱点的一种方法是创建一个代理。

代理是一个使用LLM作为其引擎的系统，并且它可以访问称为工具的函数。

这些工具是用于执行任务的函数，它们包含了代理正确使用它们所需的所有必要描述。

代理可以被编程为：

设计一系列操作/工具并一次性运行它们，就像CodeAgent一样
计划和执行动作/工具，一个接一个，并在启动下一个动作之前等待每个动作的结果，就像ReactJsonAgent一样

代理类型

代码代理

该代理有一个规划步骤，然后生成Python代码以一次性执行所有操作。它原生处理其工具的不同输入和输出类型，因此是多模态任务的推荐选择。

React 代理

这是解决推理任务的首选代理，因为ReAct框架（Yao et al., 2022）使其在基于先前观察的基础上进行思考非常高效。

我们实现了两个版本的ReactJsonAgent：

ReactJsonAgent 在其输出中生成工具调用的JSON。
ReactCodeAgent 是一种新型的 ReactJsonAgent，它将其工具调用生成为代码块，这对于具有强大编码性能的LLMs非常有效。

阅读Open-source LLMs as LangChain Agents博客文章，了解更多关于ReAct代理的信息。

React代理的框架

例如，这里展示了一个ReAct代码代理如何处理以下问题。

>>> agent.run(
...     "How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?",
... )
=====New task=====
How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?
====Agent is executing the code below:
bert_blocks = search(query="number of blocks in BERT base encoder")
print("BERT blocks:", bert_blocks)
====
Print outputs:
BERT blocks: twelve encoder blocks

====Agent is executing the code below:
attention_layer = search(query="number of layers in Attention is All You Need")
print("Attention layers:", attention_layer)
====
Print outputs:
Attention layers: Encoder: The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position- 2 Page 3 Figure 1: The Transformer - model architecture.

====Agent is executing the code below:
bert_blocks = 12
attention_layers = 6
diff = bert_blocks - attention_layers
print("Difference in blocks:", diff)
final_answer(diff)
====

Print outputs:
Difference in blocks: 6

Final answer: 6

如何构建一个代理？

要初始化一个代理，你需要这些参数：

一个LLM来驱动你的代理 - 代理并不完全是LLM，它更像是代理是一个使用LLM作为其引擎的程序。
系统提示：LLM引擎将根据此提示生成其输出
一个工具箱，代理从中选择工具来执行
一个解析器，用于从LLM输出中提取要调用的工具及其参数

在代理系统初始化时，工具属性用于生成工具描述，然后将其嵌入代理的system_prompt中，以使其知道可以使用哪些工具以及为什么使用。

首先，请安装agents附加组件以安装所有默认依赖项。

pip install transformers[agents]

通过定义一个llm_engine方法来构建你的LLM引擎，该方法接受一个messages列表并返回文本。这个可调用对象还需要接受一个stop参数，该参数指示何时停止生成。

from huggingface_hub import login, InferenceClient

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct")

def llm_engine(messages, stop_sequences=["Task"]) -> str:
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    answer = response.choices[0].message.content
    return answer

你可以使用任何llm_engine方法，只要：

它遵循消息格式（List[Dict[str, str]]）作为其输入messages，并返回一个str。
它在传入参数stop_sequences的序列处停止生成输出

此外，llm_engine 还可以接受一个 grammar 参数。在代理初始化时指定 grammar 的情况下，此参数将传递给对 llm_engine 的调用，使用您在初始化时定义的 grammar，以实现受限生成，从而强制生成格式正确的代理输出。

你还需要一个tools参数，它接受一个Tools列表——它可以是一个空列表。你也可以通过定义可选参数add_base_tools=True在你的tools列表上添加默认的工具箱。

现在你可以创建一个代理，比如CodeAgent，并运行它。你也可以创建一个TransformersEngine，使用预初始化的管道在你的本地机器上运行推理，使用transformers。为了方便起见，由于代理行为通常需要更强的模型，比如Llama-3.1-70B-Instruct，这些模型目前很难在本地运行，我们还提供了HfApiEngine类，它在底层初始化了一个huggingface_hub.InferenceClient。

from transformers import CodeAgent, HfApiEngine

llm_engine = HfApiEngine(model="meta-llama/Meta-Llama-3-70B-Instruct")
agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)

agent.run(
    "Could you translate this sentence from French, say it out loud and return the audio.",
    sentence="Où est la boulangerie la plus proche?",
)

这在紧急情况下需要法棍时会非常方便！你甚至可以保留参数 llm_engine 未定义，默认情况下会创建一个 HfApiEngine。

from transformers import CodeAgent

agent = CodeAgent(tools=[], add_base_tools=True)

agent.run(
    "Could you translate this sentence from French, say it out loud and give me the audio.",
    sentence="Où est la boulangerie la plus proche?",
)

请注意，我们使用了一个额外的sentence参数：你可以将文本作为额外参数传递给模型。

你也可以使用这个来指示模型使用的本地或远程文件的路径：

from transformers import ReactCodeAgent

agent = ReactCodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)

agent.run("Why does Mike not know many people in New York?", audio="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3")

提示和输出解析器已自动定义，但您可以通过在您的代理上调用system_prompt_template轻松检查它们。

print(agent.system_prompt_template)

尽可能清楚地解释你想要执行的任务是非常重要的。每次run()操作都是独立的，由于代理是由LLM驱动的，提示中的微小变化可能会产生完全不同的结果。你也可以连续运行代理以执行不同的任务：每次agent.task和agent.logs属性都会被重新初始化。

代码执行

Python 解释器会在一组与工具一起传递的输入上执行代码。这应该是安全的，因为唯一可以调用的函数是你提供的工具（特别是如果只有 Hugging Face 的工具）和 print 函数，所以你已经限制了可以执行的内容。

Python 解释器默认也不允许在安全列表之外进行导入，因此所有最明显的攻击都不应该成为问题。你仍然可以通过在初始化 ReactCodeAgent 或 CodeAgent 时，将授权的模块作为字符串列表传递给参数 additional_authorized_imports 来授权额外的导入：

>>> from transformers import ReactCodeAgent

>>> agent = ReactCodeAgent(tools=[], additional_authorized_imports=['requests', 'bs4'])
>>> agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?")

(...)
'Hugging Face – Blog'

执行将在任何尝试执行非法操作的代码处停止，或者如果代理生成的代码中存在常规的Python错误。

LLM可以生成任意代码，然后这些代码将被执行：不要添加任何不安全的导入！

系统提示

一个代理，或者更准确地说，驱动代理的LLM，根据系统提示生成输出。系统提示可以根据预期任务进行定制和调整。例如，查看ReactCodeAgent的系统提示（以下版本略有简化）。

You will be given a task to solve as best you can.
You have access to the following tools:
<<tool_descriptions>>

To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.

At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task, then the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '/End code' sequence.
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
These print outputs will then be available in the 'Observation:' field, for using this information as input for the next step.

In the end you have to return a final answer using the `final_answer` tool.

Here are a few examples using notional tools:
---
{examples}

Above example were using notional tools that might not exist for you. You only have acces to those tools:
<<tool_names>>
You also can perform computations in the python code you generate.

Always provide a 'Thought:' and a 'Code:\n```py' sequence ending with '```<end_code>' sequence. You MUST provide at least the 'Code:' sequence to move forward.

Remember to not perform too many operations in a single code block! You should split the task into intermediate code blocks.
Print results at the end of each step to save the intermediate results. Then use final_answer() to return the final result.

Remember to make sure that variables you use are all defined.

Now Begin!

系统提示包括：

一个介绍，解释代理应该如何行为以及工具是什么。
所有工具的描述由一个<>标记定义，该标记在运行时动态替换为用户定义/选择的工具。
- 工具描述来自工具属性，name、description、inputs和output_type，以及一个您可以优化的简单jinja2模板。
预期的输出格式。

你可以改进系统提示，例如，通过添加输出格式的解释。

为了获得最大的灵活性，您可以通过将自定义提示作为参数传递给system_prompt参数来覆盖整个系统提示模板。

from transformers import ReactJsonAgent
from transformers.agents import PythonInterpreterTool

agent = ReactJsonAgent(tools=[PythonInterpreterTool()], system_prompt="{your_custom_prompt}")

请确保在template中的某个地方定义<>字符串，以便代理知道可用的工具。

检查代理运行

以下是一些有用的属性，用于检查运行后发生了什么：

agent.logs 存储代理的细粒度日志。在代理运行的每一步，所有内容都会被存储在一个字典中，然后附加到 agent.logs 中。
运行 agent.write_inner_memory_from_logs() 会创建代理日志的内部记忆，供LLM查看，作为聊天消息的列表。此方法会遍历日志的每一步，并仅将其感兴趣的内容存储为消息：例如，它会将系统提示和任务分别保存为消息，然后对于每一步，它会将LLM输出存储为一条消息，工具调用输出存储为另一条消息。如果你想要更高层次的视角来了解发生了什么，可以使用此方法——但并非所有日志都会被此方法转录。

工具

工具是代理使用的原子函数。

例如，您可以检查PythonInterpreterTool：它有一个名称、一个描述、输入描述、一个输出类型，以及一个__call__方法来执行操作。

当代理初始化时，工具属性用于生成工具描述，该描述被嵌入到代理的系统提示中。这让代理知道它可以使用哪些工具以及为什么使用。

默认工具箱

Transformers 提供了一个默认的工具箱，用于增强代理的功能，你可以在初始化时通过参数 add_base_tools = True 将其添加到你的代理中：

文档问答：给定一个图像格式的文档（如PDF），回答关于该文档的问题（Donut）
图像问答：给定一张图像，回答关于该图像的问题（VILT）
语音转文字: 给定一个人说话的录音，将语音转录为文字 (Whisper)
文本转语音: 将文本转换为语音 (SpeechT5)
翻译: 将给定的句子从源语言翻译成目标语言。
DuckDuckGo 搜索*: 使用 DuckDuckGo 浏览器执行网页搜索。
Python代码解释器: 在安全环境中运行由LLM生成的Python代码。此工具仅在您使用add_base_tools=True初始化ReactJsonAgent时添加，因为基于代码的代理已经可以原生执行Python代码

你可以通过调用load_tool()函数来手动使用工具并执行任务。

from transformers import load_tool

tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")

创建一个新工具

你可以创建自己的工具，用于Hugging Face默认工具未涵盖的用例。例如，让我们创建一个工具，该工具从Hub返回给定任务中下载次数最多的模型。

你将从下面的代码开始。

from huggingface_hub import list_models

task = "text-classification"

model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
print(model.id)

这段代码可以快速转换为工具，只需将其包装在函数中并添加tool装饰器：

from transformers import tool

@tool
def model_download_tool(task: str) -> str:
    """
    This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub.
    It returns the name of the checkpoint.

    Args:
        task: The task for which
    """
    model = next(iter(list_models(filter="text-classification", sort="downloads", direction=-1)))
    return model.id

该函数需要：

一个清晰的名称。名称通常描述工具的功能。由于代码返回任务中下载次数最多的模型，我们可以将其命名为model_download_tool。
输入和输出的类型提示
描述，包括一个“Args:”部分，其中描述了每个参数（这次没有类型指示，它将从类型提示中提取）。所有这些将在初始化时自动嵌入到代理的系统提示中：因此努力使它们尽可能清晰！

此定义格式与apply_chat_template中使用的工具模式相同，唯一的区别是添加了tool装饰器：更多关于我们工具使用API的信息，请参阅这里。

然后你可以直接初始化你的代理：

from transformers import CodeAgent
agent = CodeAgent(tools=[model_download_tool], llm_engine=llm_engine)
agent.run(
    "Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
)

您将获得以下内容：

======== New task ========
Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?
==== Agent is executing the code below:
most_downloaded_model = model_download_tool(task="text-to-video")
print(f"The most downloaded model for the 'text-to-video' task is {most_downloaded_model}.")
====

输出如下： "The most downloaded model for the 'text-to-video' task is ByteDance/AnimateDiff-Lightning."

管理你的代理工具箱

如果您已经初始化了一个代理，从头开始重新初始化并使用您想要使用的工具会很不方便。使用Transformers，您可以通过添加或替换工具来管理代理的工具箱。

让我们将model_download_tool添加到一个仅使用默认工具箱初始化的现有代理中。

from transformers import CodeAgent

agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)
agent.toolbox.add_tool(model_download_tool)

现在我们可以利用新工具和之前的文本转语音工具：

agent.run(
    "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub and return the audio?"
)

音频

在为已经运行良好的代理添加工具时要小心，因为它可能会偏向选择你的工具，或者选择已定义工具之外的其他工具。

使用 agent.toolbox.update_tool() 方法来替换代理工具箱中的现有工具。如果你的新工具是现有工具的一对一替换，这将非常有用，因为代理已经知道如何执行该特定任务。只需确保新工具遵循与被替换工具相同的API，或者调整系统提示模板，以确保所有使用被替换工具的示例都已更新。

使用一组工具

你可以通过使用ToolCollection对象来利用工具集合，使用你想要使用的集合的slug。然后将它们作为列表传递以初始化你的代理，并开始使用它们！

from transformers import ToolCollection, ReactCodeAgent

image_tool_collection = ToolCollection(collection_slug="huggingface-tools/diffusion-tools-6630bb19a942c2306a2cdb6f")
agent = ReactCodeAgent(tools=[*image_tool_collection.tools], add_base_tools=True)

agent.run("Please draw me a picture of rivers and lakes.")

为了加快启动速度，只有在代理调用时才会加载工具。

这将得到这张图片：

< > Update on GitHub

←Share your model Agents, supercharged - Multi-agents, External tools, and more→