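Fine-tuning for function calling
This notebook covers how to fine-tune a model to improve the accuracy and reliability of function calling. For more background, see the function calling notebook and the documentation on fine-tuning.
For context, from the function calling notebook mentioned above:
“tools” is an optional parameter in the Chat Completion API which can be used to provide function specifications. Its purpose is to enable models to generate function arguments that adhere to the provided specifications. Note that the API will not actually execute any function calls. It is up to developers to execute function calls using the model outputs.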
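Because the API only returns a function name and a JSON string of arguments, the calling code has to do the dispatch itself. The following is a minimal sketch of that step, not the notebook's own code: `takeoff_drone`, `HANDLERS`, and `dispatch_tool_call` are hypothetical names, and a hand-written dict stands in for a tool call from the model's response, so no API request is made.

```python
import json


# Hypothetical local implementation; a real one would talk to the drone.
def takeoff_drone(altitude: int) -> str:
    return f"Taking off to {altitude} m"


# Map the function names exposed to the model onto local callables.
HANDLERS = {"takeoff_drone": takeoff_drone}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the local handler named by the model, using its JSON-encoded arguments."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Model requested an unknown function: {name}")
    arguments = json.loads(arguments_json)  # arguments arrive as a JSON string
    return handler(**arguments)


# Stand-in for the name/arguments pair found on a tool call in the response.
simulated_call = {"name": "takeoff_drone", "arguments": '{"altitude": 50}'}
result = dispatch_tool_call(simulated_call["name"], simulated_call["arguments"])
print(result)
```

Rejecting unknown names explicitly is a deliberate choice here: a hallucinated function name then fails loudly instead of being silently ignored.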
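Function calling is a very powerful tool when it works as intended. However, we have seen that as the number of functions grows and the task at hand becomes more complex, function calling becomes less accurate (for example, more hallucinated invocations and more incorrect invocations).
Before fine-tuning for function calling, it is best to start with the following:
- Improve the function definitions. Make them clearer and more distinct from one another.
- Experiment with prompt engineering: a more detailed prompt often helps the model call the correct function.
_If_ the steps above fail to improve function calling to a satisfactory level, then you can try fine-tuning for function calling.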
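To make the first point above concrete, one crude way to spot definitions that are not distinct enough is to measure word overlap between their descriptions. This is only a heuristic sketch: the helper names, the 0.6 threshold, and both sets of descriptions are hypothetical and are not taken from the function list used later.

```python
import itertools
import re


def description_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two descriptions."""
    words_a = set(re.findall(r"[a-z']+", a.lower()))
    words_b = set(re.findall(r"[a-z']+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def flag_similar_descriptions(descriptions: dict[str, str], threshold: float = 0.6):
    """Return (name_a, name_b, overlap) for every pair at or above the threshold."""
    flagged = []
    pairs = itertools.combinations(descriptions.items(), 2)
    for (name_a, desc_a), (name_b, desc_b) in pairs:
        overlap = description_overlap(desc_a, desc_b)
        if overlap >= threshold:
            flagged.append((name_a, name_b, round(overlap, 2)))
    return flagged


# Hypothetical descriptions, before and after sharpening.
vague = {
    "set_lights": "Control the drone lights.",
    "set_led": "Control the drone LED lights.",
}
sharpened = {
    "set_lights": "Turn the navigation lights on or off for visibility.",
    "set_led": "Choose the pattern and color shown on the LED display panel.",
}
print(flag_similar_descriptions(vague))
print(flag_similar_descriptions(sharpened))
```

Word overlap says nothing about meaning, so a flagged pair is a prompt for a human review of the wording rather than proof of a problem.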
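Overview
This notebook contains three sections:
- Assessing baseline function calling performance: evaluating an out-of-the-box gpt-3.5-turbo model on our given functions (let's assume that, for latency and cost reasons, we cannot use gpt-4o for a drone copilot)
- Generating synthetic data: using gpt-4o to create a "golden" set of prompts and function invocations to use as training data
- Fine-tuning: running the fine-tuning job and evaluating the fine-tuned model
Note: This notebook provides an example of how to create synthetic training data for fine-tuning function calling given just a list of functions. While evaluations on real production traffic are preferable, this method can produce strong results and can be used in conjunction with real-world training data.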
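One way to start from nothing but a function list is to enumerate the argument combinations a schema allows, and later have a stronger model phrase a prompt for each combination. The sketch below covers only the enumeration step and only enum-typed properties; `enum_argument_combinations` is a hypothetical helper, and `example_spec` is a trimmed, illustrative spec rather than one of the notebook's full definitions.

```python
import itertools


def enum_argument_combinations(function_spec: dict) -> list[dict]:
    """List every argument combination for the enum-typed properties of one function."""
    properties = function_spec["parameters"].get("properties", {})
    # Only properties that declare an enum have a finite set of values to enumerate.
    enum_props = {name: p["enum"] for name, p in properties.items() if "enum" in p}
    if not enum_props:
        return [{}]
    names = list(enum_props)
    return [
        dict(zip(names, values))
        for values in itertools.product(*enum_props.values())
    ]


# Trimmed, illustrative spec (an assumption for this example).
example_spec = {
    "name": "configure_led_display",
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "enum": ["solid", "blink"]},
            "color": {"type": "string", "enum": ["red", "blue"]},
        },
    },
}
combinations = enum_argument_combinations(example_spec)
print(len(combinations))
```

Free-form properties such as integers have no finite value set, so a real pipeline would need a separate strategy for them, for example sampling a few representative values.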
Getting baseline function calling performance
```python
#!pip install tenacity -q
#!pip install openai -q
#!pip install typing -q
!pip install python-dotenv
```
```python
import numpy as np
import json
import os
from IPython.display import display
import pandas as pd
from openai import OpenAI
import itertools
import time
import base64
from tenacity import retry, wait_random_exponential, stop_after_attempt
from typing import Any, Dict, List, Generator
import ast

%load_ext dotenv
%dotenv

client = OpenAI(api_key=os.environ.get("OPENAI_BUILD_HOUR_KEY"))
```
Utilities
Let's define utility functions for making calls to the Chat Completions API: one to get the completion and one to get the function call.
```python
def get_chat_completion(
    messages: list[dict[str, str]],
    model: str = "gpt-3.5-turbo",
    max_tokens=500,
    temperature=0.0,
    stop=None,
    tools=None,
    seed=42,
    functions=None,
    tool_choice=None,
) -> tuple[Any, Any]:
    """Call the Chat Completions API and return (message, usage) for the first choice."""
    params = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stop": stop,
        "tools": tools,
        "seed": seed,
        "tool_choice": tool_choice,
    }
    # The legacy "functions" parameter is only sent when explicitly provided.
    if functions:
        params["functions"] = functions

    completion = client.chat.completions.create(**params)
    return completion.choices[0].message, completion.usage
```
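The imports above bring in `retry`, `wait_random_exponential`, and `stop_after_attempt` from tenacity, which are commonly applied as a decorator to API helpers so that transient failures are retried. The sketch below illustrates the same idea (retry with a random, exponentially growing wait) using only the standard library so that it runs without network access. `with_backoff` and `flaky_call` are hypothetical names, and the delays are shortened for demonstration.

```python
import random
import time


def with_backoff(func, max_attempts: int = 3, base_delay: float = 0.01, max_delay: float = 0.1):
    """Call func(); on failure wait a random, exponentially growing delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Random delay up to base_delay * 2^(attempt - 1), capped at max_delay.
            upper = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, upper))


# Hypothetical stand-in for an API call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated transient error")
    return "ok"


result = with_backoff(flaky_call)
print(result, attempts["count"])
```

Randomizing the wait spreads out retries from concurrent callers, which is the main reason to prefer it over a fixed delay when many requests hit a rate limit at once.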
```python
# Note: this function shadows Python's built-in eval() for the rest of the notebook.
def eval(model: str, system_prompt: str, function_list, prompts_to_expected_tool_name):
    """
    Evaluate the performance of a model in selecting the correct function based on given prompts.

    Args:
        model (str): The name of the model to be evaluated.
        system_prompt (str): The system prompt to be used in the chat completion.
        function_list (list): A list of functions that the model can call.
        prompts_to_expected_tool_name (dict): A dictionary mapping prompts to their expected function names.

    Returns:
        None
    """
    prompts_to_actual = []
    latencies = []
    tokens_used = []

    for prompt, expected_function in prompts_to_expected_tool_name.items():
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]

        start_time = time.time()
        completion, usage = get_chat_completion(
            model=model,
            messages=messages,
            seed=42,
            tools=function_list,
            temperature=0.0,
            tool_choice="required",
        )
        end_time = time.time()

        latency = (end_time - start_time) * 1000  # convert to milliseconds
        latencies.append(latency)

        prompts_to_actual.append(
            {prompt: completion.tool_calls[0].function.name})

        # Record the number of tokens used
        tokens_used.append(usage.total_tokens)

    total_prompts = len(prompts_to_expected_tool_name)

    # Count the number of matches
    matches = sum(
        1
        for result in prompts_to_actual
        if list(result.values())[0]
        == prompts_to_expected_tool_name[list(result.keys())[0]]
    )
    match_percentage = (matches / total_prompts) * 100

    # Calculate the average latency
    avg_latency = sum(latencies) / total_prompts
    # Calculate the average number of tokens used
    avg_tokens_used = sum(tokens_used) / total_prompts

    # Build a DataFrame to hold the results
    results_list = []
    for result in prompts_to_actual:
        prompt = list(result.keys())[0]
        actual_function = list(result.values())[0]
        expected_function = prompts_to_expected_tool_name[prompt]
        match = actual_function == expected_function
        results_list.append(
            {
                "Prompt": prompt,
                "Actual": actual_function,
                "Expected": expected_function,
                "Match": "Yes" if match else "No",
            }
        )
    results_df = pd.DataFrame(results_list)

    def style_rows(row):
        # Highlight mismatched rows in red
        match = row["Match"]
        background_color = "red" if match == "No" else "white"
        return ["background-color: {}; color: black".format(background_color)] * len(
            row
        )

    styled_results_df = results_df.style.apply(style_rows, axis=1)

    # Display the DataFrame as a table
    display(styled_results_df)

    print(
        f"Number of matches: {matches} out of {total_prompts} ({match_percentage:.2f}%)"
    )
    print(f"Average latency per request: {avg_latency:.2f} ms")
    print(f"Average tokens used per request: {avg_tokens_used:.2f}")
```
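The matching logic in the evaluation can also be separated from the API calls, which makes it easy to check in isolation. Below is a minimal sketch under the assumption that expected and actual function names are both held in dicts keyed by prompt; `summarize_matches` is a hypothetical helper and the sample data is made up for illustration.

```python
def summarize_matches(expected: dict[str, str], actual: dict[str, str]) -> dict:
    """Compare expected and actual function names per prompt and compute the match rate."""
    rows = []
    for prompt, expected_name in expected.items():
        actual_name = actual.get(prompt)  # None if no call was recorded for this prompt
        rows.append(
            {
                "Prompt": prompt,
                "Actual": actual_name,
                "Expected": expected_name,
                "Match": actual_name == expected_name,
            }
        )
    matches = sum(row["Match"] for row in rows)
    percentage = 100 * matches / len(rows) if rows else 0.0
    return {"rows": rows, "matches": matches, "match_percentage": percentage}


# Made-up data for illustration only.
expected = {"Take off to 50 meters": "takeoff_drone", "Sing a song": "reject_request"}
actual = {"Take off to 50 meters": "takeoff_drone", "Sing a song": "control_camera"}
summary = summarize_matches(expected, actual)
print(summary["matches"], summary["match_percentage"])
```

Guarding the percentage against an empty prompt set avoids a division by zero when an evaluation is run with no prompts.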
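Baseline testing
Let's build an intelligent drone copilot. We want to be able to give the copilot commands and have it either call the function for that command or deny the request if the command is not feasible. We can start by defining a system prompt for the copilot.
```python
DRONE_SYSTEM_PROMPT = """You are an intelligent AI that controls a drone. Given a command or request from the user,
call one of your functions to complete the request. If the request cannot be completed by your available functions, call the reject_request function.
If the request is ambiguous or unclear, reject the request."""
```
Now let's define functions for all of the actions the copilot can take.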
```python
function_list = [
    {
        "type": "function",
        "function": {
            "name": "takeoff_drone",
            "description": "Initiate the drone's takeoff sequence.",
            "parameters": {
                "type": "object",
                "properties": {
                    "altitude": {
                        "type": "integer",
                        "description": "Specifies the altitude in meters to which the drone should ascend.",
                    }
                },
                "required": ["altitude"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "land_drone",
            "description": "Land the drone at its current location or a specified landing point.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "enum": ["current", "home_base", "custom"],
                        "description": "Specifies the landing location for the drone.",
                    },
                    "coordinates": {
                        "type": "object",
                        "description": "GPS coordinates for custom landing location. Required if location is 'custom'.",
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "control_drone_movement",
            "description": "Direct the drone's movement in a specific direction.",
            "parameters": {
                "type": "object",
                "properties": {
                    "direction": {
                        "type": "string",
                        "enum": ["forward", "backward", "left", "right", "up", "down"],
                        "description": "Direction in which the drone should move.",
                    },
                    "distance": {
                        "type": "integer",
                        "description": "Distance in meters the drone should travel in the specified direction.",
                    },
                },
                "required": ["direction", "distance"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_drone_speed",
            "description": "Adjust the speed of the drone.",
            "parameters": {
                "type": "object",
                "properties": {
                    "speed": {
                        "type": "integer",
                        "description": "Specifies the speed in km/h. Valid range is 0 to 100.",
                        "minimum": 0,
                    }
                },
                "required": ["speed"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "control_camera",
            "description": "Control the drone's camera to capture images or videos.",
            "parameters": {
                "type": "object",
                "properties": {
                    "mode": {
                        "type": "string",
                        "enum": ["photo", "video", "panorama"],
                        "description": "Camera mode to capture content.",
                    },
                    "duration": {
                        "type": "integer",
                        "description": "Duration in seconds for video capture. Required if mode is 'video'.",
                    },
                },
                "required": ["mode"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "control_gimbal",
            "description": "Adjust the drone's gimbal for camera stabilization and direction.",
            "parameters": {
                "type": "object",
                "properties": {
                    "tilt": {
                        "type": "integer",
                        "description": "Tilt angle for the gimbal in degrees.",
                    },
                    "pan": {
                        "type": "integer",
                        "description": "Pan angle for the gimbal in degrees.",
                    },
                },
                "required": ["tilt", "pan"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_drone_lighting",
            "description": "Control the drone's lighting for visibility and signaling.",
            "parameters": {
                "type": "object",
                "properties": {
                    "mode": {
                        "type": "string",
                        "enum": ["on", "off", "blink", "sos"],
                        "description": "Lighting mode for the drone.",
                    }
                },
                "required": ["mode"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "return_to_home",
            "description": "Command the drone to return to its home or launch location.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_battery_saver_mode",
            "description": "Toggle battery saver mode.",
            "parameters": {
                "type": "object",
                "properties": {
                    "status": {
                        "type": "string",
                        "enum": ["on", "off"],
                        "description": "Toggle battery saver mode.",
                    }
                },
                "required": ["status"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_obstacle_avoidance",
            "description": "Configure obstacle avoidance settings.",
            "parameters": {
                "type": "object",
                "properties": {
                    "mode": {
                        "type": "string",
                        "enum": ["on", "off"],
                        "description": "Toggle obstacle avoidance.",
                    }
                },
                "required": ["mode"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_follow_me_mode",
            "description": "Enable or disable 'follow me' mode.",
            "parameters": {
                "type": "object",
                "properties": {
                    "status": {
                        "type": "string",
                        "enum": ["on", "off"],
                        "description": "Toggle 'follow me' mode.",
                    }
                },
                "required": ["status"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calibrate_sensors",
            "description": "Initiate calibration sequence for drone's sensors.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_autopilot",
            "description": "Enable or disable autopilot mode.",
            "parameters": {
                "type": "object",
                "properties": {
                    "status": {
                        "type": "string",
                        "enum": ["on", "off"],
                        "description": "Toggle autopilot mode.",
                    }
                },
                "required": ["status"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "configure_led_display",
            "description": "Configure the drone's LED display pattern and colors.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pattern": {
                        "type": "string",
                        "enum": ["solid", "blink", "pulse", "rainbow"],
                        "description": "Pattern for the LED display.",
                    },
                    "color": {
                        "type": "string",
                        "enum": ["red", "blue", "green", "yellow", "white"],
                        "description": "Color for the LED display. Not required if pattern is 'rainbow'.",
                    },
                },
                "required": ["pattern"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_home_location",
            "description": "Set or change the home location for the drone.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "object",
                        "description": "GPS coordinates for the home location.",
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "reject_request",
            "description": "Use this function if the request is not possible.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]
```
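The model is asked to produce arguments that fit these schemas, but nothing guarantees that it does, so checking arguments before acting on them is a reasonable precaution. The sketch below is a hand-rolled check covering only required keys, enums, and integer ranges. `validate_arguments` is a hypothetical helper, and the `maximum` of 100 is an assumption taken from the wording of the `set_drone_speed` description, since the schema above declares only `minimum`. A dedicated JSON Schema validator would be more thorough.

```python
def validate_arguments(parameters: dict, arguments: dict) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    properties = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}={value!r} is not one of {spec['enum']}")
        if spec.get("type") == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"{name} must be an integer")
                continue
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{name}={value} is below the minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{name}={value} is above the maximum {spec['maximum']}")
    return problems


# Assumed schema: "maximum" is added here for illustration only.
speed_parameters = {
    "type": "object",
    "properties": {"speed": {"type": "integer", "minimum": 0, "maximum": 100}},
    "required": ["speed"],
}
print(validate_arguments(speed_parameters, {"speed": 15}))
print(validate_arguments(speed_parameters, {"speed": -120}))
```

Returning a list of problems instead of raising on the first one makes it possible to report every issue with a call in a single pass.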
First, let's look at how function calling performs on some straightforward, feasible prompts, followed by a few obviously impossible requests that should call the 'reject_request' function.
```python
straightforward_prompts_to_expected = {
    "Land the drone at the home base": "land_drone",
    "Take off the drone to 50 meters": "takeoff_drone",
    "Change speed to 15 kilometers per hour": "set_drone_speed",
    "Turn into an elephant!": "reject_request",
    "Move the drone forward by 10 meters": "control_drone_movement",
    "I want the LED display to blink in red": "configure_led_display",
    "Can you take a photo?": "control_camera",
    "Can you detect obstacles?": "set_obstacle_avoidance",
    "Can you dance for me?": "reject_request",
    "Can you follow me?": "set_follow_me_mode",
}

# Evaluate the model with the given prompts
eval(
    model="gpt-3.5-turbo",
    system_prompt=DRONE_SYSTEM_PROMPT,
    function_list=function_list,
    prompts_to_expected_tool_name=straightforward_prompts_to_expected,
)
```
| | Prompt | Actual | Expected | Match |
|---|---|---|---|---|
| 0 | Land the drone at the home base | land_drone | land_drone | Yes |
| 1 | Take off the drone to 50 meters | takeoff_drone | takeoff_drone | Yes |
| 2 | Change speed to 15 kilometers per hour | set_drone_speed | set_drone_speed | Yes |
| 3 | Turn into an elephant! | reject_request | reject_request | Yes |
| 4 | Move the drone forward by 10 meters | control_drone_movement | control_drone_movement | Yes |
| 5 | I want the LED display to blink in red | configure_led_display | configure_led_display | Yes |
| 6 | Can you take a photo? | control_camera | control_camera | Yes |
| 7 | Can you detect obstacles? | set_obstacle_avoidance | set_obstacle_avoidance | Yes |
| 8 | Can you dance for me? | reject_request | reject_request | Yes |
| 9 | Can you follow me? | set_follow_me_mode | set_follow_me_mode | Yes |
Number of matches: 10 out of 10 (100.00%)
Average latency per request: 826.81 ms
Average tokens used per request: 796.20
Great! The model handles these requests quite well. Now let's try some harder requests: ones that are drone-related and almost feasible, but that the drone cannot actually carry out, so the copilot should reject them.
```python
challenging_prompts_to_expected = {
    "Play pre-recorded audio message": "reject_request",
    "Initiate following on social media": "reject_request",
    "Scan environment for heat signatures": "reject_request",
    "Bump into obstacles": "reject_request",
    "Change drone's paint job color": "reject_request",
    "Coordinate with nearby drones": "reject_request",
    "Change speed to negative 120 km/h": "reject_request",
    "Detect a person": "reject_request",
    "Please enable night vision": "reject_request",
    "Report on humidity levels around you": "reject_request",
}

# Evaluate the model with the challenging prompts
eval(
    model="gpt-3.5-turbo",
    function_list=function_list,
    system_prompt=DRONE_SYSTEM_PROMPT,
    prompts_to_expected_tool_name=challenging_prompts_to_expected,
)
```
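When every prompt in a set should be rejected, it can help to see which functions the model calls instead of `reject_request`, since that points at the definitions it confuses. The sketch below tallies such calls. `wrongly_called` is a hypothetical helper, and the `actual_calls` values are invented for illustration; they are not results from the run above.

```python
from collections import Counter


def wrongly_called(actual_calls: dict[str, str], reject_name: str = "reject_request") -> Counter:
    """Count the functions called for prompts that should have been rejected."""
    return Counter(name for name in actual_calls.values() if name != reject_name)


# Invented outcomes for illustration; not measured results.
actual_calls = {
    "Please enable night vision": "set_drone_lighting",
    "Detect a person": "set_follow_me_mode",
    "Bump into obstacles": "set_obstacle_avoidance",
    "Change drone's paint job color": "reject_request",
    "Scan environment for heat signatures": "set_drone_lighting",
}
tally = wrongly_called(actual_calls)
print(tally.most_common())
```

A function that appears often in such a tally is a natural candidate for a sharper description or for extra rejection examples in the training data.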