使用自定义模型加载代理聊天

在这个笔记本中，我们将演示如何定义和加载自定义模型，以及它需要遵守的协议。

注意：根据您使用的模型，您可能需要调整代理的默认提示。

要求

此笔记本需要一些额外的依赖项，可以通过pip安装：

pip install pyautogen torch transformers sentencepiece

更多信息，请参阅安装指南。

from types import SimpleNamespace

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

import autogen
from autogen import AssistantAgent, UserProxyAgent

创建和配置自定义模型

可以通过多种方式创建自定义模型类，但需要遵守ModelClient协议和在client.py中定义的响应结构。

响应协议有一些最低要求，但可以扩展以包括任何所需的附加信息。因此，可以自定义消息检索，但需要返回一个字符串列表或ModelClientResponseProtocol.Choice.Message对象的列表。

class ModelClient(Protocol):
    """
    客户端类必须实现以下方法：
    - create 必须返回一个实现了 ModelClientResponseProtocol 的响应对象
    - cost 必须返回响应的成本
    - get_usage 必须返回一个包含以下键的字典：
        - prompt_tokens
        - completion_tokens
        - total_tokens
        - cost
        - model

    此类用于创建一个可以被 OpenAIWrapper 使用的客户端。
    从 create 返回的响应必须遵守 ModelClientResponseProtocol，但可以根据需要进行扩展。
    必须实现 message_retrieval 方法以返回响应中的字符串列表或消息列表。
    """

    RESPONSE_USAGE_KEYS = ["prompt_tokens", "completion_tokens", "total_tokens", "cost", "model"]

    class ModelClientResponseProtocol(Protocol):
        class Choice(Protocol):
            class Message(Protocol):
                content: Optional[str]

            message: Message

        choices: List[Choice]
        model: str

    def create(self, params) -> ModelClientResponseProtocol:
        ...

    def message_retrieval(
        self, response: ModelClientResponseProtocol
    ) -> Union[List[str], List[ModelClient.ModelClientResponseProtocol.Choice.Message]]:
        """
        检索并返回响应中的字符串列表或 Choice.Message 列表。

        注意：如果返回 Choice.Message 列表，目前需要包含 OpenAI 的 ChatCompletion Message 对象的字段，
        因为目前代码库中的其他函数或工具调用都期望这样，除非使用自定义代理。
        """
        ...

    def cost(self, response: ModelClientResponseProtocol) -> float:
        ...

    @staticmethod
    def get_usage(response: ModelClientResponseProtocol) -> Dict:
        """使用 RESPONSE_USAGE_KEYS 返回响应的使用情况摘要。"""
        ...

简单自定义客户端示例

以下是使用Hugging Face的Open-Orca的示例。

对于响应对象，使用Python的SimpleNamespace来创建一个简单的对象，该对象可以用于存储响应数据，但可以使用任何遵循ClientResponseProtocol的对象。

# 自定义客户端与自定义模型加载器


class CustomModelClient:
    def __init__(self, config, **kwargs):
        print(f"CustomModelClient 配置: {config}")
        self.device = config.get("device", "cpu")
        self.model = AutoModelForCausalLM.from_pretrained(config["model"]).to(self.device)
        self.model_name = config["model"]
        self.tokenizer = AutoTokenizer.from_pretrained(config["model"], use_fast=False)
        self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

        # 用户设置的参数由用户提供和使用，因为他们提供了自定义模型，所以这里可以做任何事情
        gen_config_params = config.get("params", {})
        self.max_length = gen_config_params.get("max_length", 256)

        print(f"加载模型 {config['model']} 到 {self.device}")

    def create(self, params):
        if params.get("stream", False) and "messages" in params:
            raise NotImplementedError("本地模型不支持流式传输。")
        else:
            num_of_responses = params.get("n", 1)

            # 可以创建自己的数据响应类
            # 这里使用 SimpleNamespace 简化起见
            # 只要符合 ClientResponseProtocol 即可

            response = SimpleNamespace()

            inputs = self.tokenizer.apply_chat_template(
                params["messages"], return_tensors="pt", add_generation_prompt=True
            ).to(self.device)
            inputs_length = inputs.shape[-1]

            # 将 inputs_length 添加到 max_length
            max_length = self.max_length + inputs_length
            generation_config = GenerationConfig(
                max_length=max_length,
                eos_token_id=self.tokenizer.eos_token_id,
                pad_token_id=self.tokenizer.eos_token_id,
            )

            response.choices = []
            response.model = self.model_name

            for _ in range(num_of_responses):
                outputs = self.model.generate(inputs, generation_config=generation_config)
                # 仅解码新生成的文本，不包括提示
                text = self.tokenizer.decode(outputs[0, inputs_length:])
                choice = SimpleNamespace()
                choice.message = SimpleNamespace()
                choice.message.content = text
                choice.message.function_call = None
                response.choices.append(choice)

            return response

    def message_retrieval(self, response):
        """从响应中检索消息。"""
        choices = response.choices
        return [choice.message.content for choice in choices]

    def cost(self, response) -> float:
        """计算响应的成本。"""
        response.cost = 0
        return 0

    @staticmethod
    def get_usage(response):
        # 返回一个包含 prompt_tokens、completion_tokens、total_tokens、cost、model 的字典
        # 如果需要跟踪使用情况，否则返回 None
        return {}

设置 API 端点

config_list_from_json 函数从环境变量或 json 文件中加载配置列表。

首先，它会查找指定名称的环境变量（在本例中为“OAI_CONFIG_LIST”），该变量需要是一个有效的 json 字符串。如果找不到该变量，则会查找同名的 json 文件。它会根据模型进行配置过滤（您也可以根据其他键进行过滤）。

json 的格式如下所示：

[
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key here>"
    },
    {
        "model": "gpt-4",
        "api_key": "<your Azure OpenAI API key here>",
        "base_url": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2024-02-01"
    },
    {
        "model": "gpt-4-32k",
        "api_key": "<your Azure OpenAI API key here>",
        "base_url": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2024-02-01"
    }
]

您可以以任何您喜欢的方式设置 config_list 的值。请参考此 notebook 以获取不同方法的完整代码示例。

设置自定义模型的配置

您可以在同一配置列表中添加任何自定义模型加载所需的参数。

重要的是要添加 model_client_cls 字段，并将其设置为与类名对应的字符串："CustomModelClient"。

{
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "model_client_cls": "CustomModelClient",
    "device": "cuda",
    "n": 1,
    "params": {
        "max_length": 1000,
    }
},

config_list_custom = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model_client_cls": ["CustomModelClient"]},
)

构建代理

构建一个简单的用户代理和助手代理之间的对话

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list_custom})
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,  # 如果可用，请将 use_docker 设置为 True 以运行生成的代码。使用 docker 比直接运行生成的代码更安全。
    },
)

将自定义客户端类注册到助手代理

assistant.register_model_client(model_client_cls=CustomModelClient)

user_proxy.initiate_chat(assistant, message="编写 Python 代码打印 Hello World!")

使用预加载模型注册自定义客户端类

如果您想更好地控制模型加载的时间，可以自己加载模型并将其作为参数传递给 CustomClient 在注册过程中。

# 自定义客户端与自定义模型加载器


class 带参数的自定义模型客户端(自定义模型客户端):
    def __init__(self, 配置, 加载的模型, 分词器, **kwargs):
        print(f"带参数的自定义模型客户端 配置: {配置}")

        self.模型名称 = 配置["model"]
        self.模型 = 加载的模型
        self.分词器 = 分词器

        self.设备 = 配置.get("device", "cpu")

        生成配置参数 = 配置.get("params", {})
        self.最大长度 = 生成配置参数.get("max_length", 256)
        print(f"加载模型 {配置['model']} 到 {self.设备}")

# 在这里加载模型


config = config_list_custom[0]
device = config.get("device", "cpu")
loaded_model = AutoModelForCausalLM.from_pretrained(config["model"]).to(device)
tokenizer = AutoTokenizer.from_pretrained(config["model"], use_fast=False)
tokenizer.pad_token_id = tokenizer.eos_token_id

添加新自定义模型的配置

{
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "model_client_cls": "CustomModelClientWithArguments",
    "device": "cuda",
    "n": 1,
    "params": {
        "max_length": 1000,
    }
},

config_list_custom = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model_client_cls": ["CustomModelClientWithArguments"]},
)

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list_custom})

assistant.register_model_client(
    model_client_cls=CustomModelClientWithArguments,
    loaded_model=loaded_model,
    tokenizer=tokenizer,
)

user_proxy.initiate_chat(assistant, message="编写 Python 代码打印 Hello World!")

使用自定义模型加载代理聊天

要求​

创建和配置自定义模型​

简单自定义客户端示例​

设置 API 端点​

设置自定义模型的配置​

构建代理​

将自定义客户端类注册到助手代理​

使用预加载模型注册自定义客户端类​

添加新自定义模型的配置​

要求