
How to make your completion outputs reproducible with the new seed parameter


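TLDR: Developers can now specify a seed parameter in a Chat Completion request to receive (mostly) consistent outputs. To help you keep track of these changes, we expose the system_fingerprint field. If this value differs between responses, you may see different outputs because of changes we have made to our systems. Please note that this feature is in beta and is currently supported only for gpt-4-1106-preview and gpt-3.5-turbo-1106.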

Background

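Reproducibility has long been a major request from the community of users of our API. For example, once users can obtain reproducible numerical results, they can unlock use cases that are sensitive to numerical variation.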

Model-level features for consistent outputs

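The Chat Completions and Completions APIs are non-deterministic by default (which means model outputs may differ from request to request), but they now offer a few model-level controls that move outputs towards determinism.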

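This can unlock consistent completions, which gives you much tighter control over model behavior for anything built on top of the API. It is especially useful for reproducing results and for testing, where you want to know in advance what output to expect.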

Implementing consistent outputs

To receive _mostly_ deterministic outputs across API calls:

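  • Set the seed parameter to any integer of your choice, and use the same value across requests. For example, 12345.
  • Set all other parameters (prompt, temperature, top_p, etc.) to the same values across requests.
  • In the response, check the system_fingerprint field. The system fingerprint is an identifier for the current combination of model weights, infrastructure, and other configuration options used by OpenAI servers to generate the completion. It changes whenever you change request parameters, or whenever OpenAI updates the numerical configuration of the infrastructure serving our models (which may happen a few times a year).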

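If the seed, the request parameters, and the system_fingerprint all match across your requests, the model outputs will mostly be identical. There is still a small chance that responses differ even when the request parameters and system_fingerprint match, because of the inherent non-determinism of our models.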
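The matching rule above can be sketched as a small check that runs without any API call. This is only an illustration under stated assumptions: `CallRecord`, `make_record`, and `expect_same_output` are hypothetical helper names, not part of the OpenAI SDK, and a `True` result means "mostly identical output is expected", not that identical output is guaranteed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical record of one request/response pair (not an SDK type)."""

    seed: Optional[int]
    params: Tuple  # sorted (name, value) pairs: model, temperature, top_p, ...
    system_fingerprint: Optional[str]


def make_record(seed, system_fingerprint, **params):
    # Sort by parameter name so that keyword order does not affect equality.
    return CallRecord(seed, tuple(sorted(params.items())), system_fingerprint)


def expect_same_output(a: CallRecord, b: CallRecord) -> bool:
    """True only when seed, request parameters, and fingerprint all match."""
    return (
        a.seed is not None
        and a.seed == b.seed
        and a.params == b.params
        and a.system_fingerprint is not None
        and a.system_fingerprint == b.system_fingerprint
    )


first = make_record(12345, "fp_772e8125bb", model="gpt-3.5-turbo-1106", temperature=0)
second = make_record(12345, "fp_772e8125bb", temperature=0, model="gpt-3.5-turbo-1106")
print(expect_same_output(first, second))
```

A check like this is mainly useful in test suites: when it returns `False`, a changed output is more likely explained by a changed seed, parameter, or backend than by a regression in your own code.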

Model-level controls - seed and system_fingerprint

seed

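If seed is specified, our system will make a best effort to sample deterministically, so that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.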

system_fingerprint

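This fingerprint represents the backend configuration that the model runs with. It can be used together with the seed request parameter to understand when backend changes have been made that might affect determinism. It is the indicator of whether users should expect "almost always the same result".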
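One way to act on this is to record the fingerprint of every response and flag the moment it changes. The sketch below is a minimal, hypothetical helper (`FingerprintMonitor` is not part of the OpenAI SDK); it assumes you pass it the `system_fingerprint` string read from each response.

```python
class FingerprintMonitor:
    """Hypothetical helper that tracks system_fingerprint values over time."""

    def __init__(self):
        self.last = None    # most recent fingerprint seen
        self.changes = []   # list of (old, new) transitions

    def observe(self, fingerprint):
        """Record a fingerprint; return True if it differs from the previous one."""
        changed = self.last is not None and fingerprint != self.last
        if changed:
            self.changes.append((self.last, fingerprint))
        self.last = fingerprint
        return changed


monitor = FingerprintMonitor()
# "fp_new_backend" is a made-up value used only to simulate a backend change.
for fp in ["fp_772e8125bb", "fp_772e8125bb", "fp_new_backend"]:
    if monitor.observe(fp):
        print(f"Backend changed: {monitor.changes[-1]}")
```

When the monitor reports a change, outputs produced before and after it should not be expected to match, even with an identical seed and identical parameters.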

Example: Generating a short excerpt with a fixed seed

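In this example, we demonstrate how to generate a short excerpt using a fixed seed. This is particularly useful when you need consistent results for testing, debugging, or applications that depend on consistent output.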

Python SDK

Note: Switch to the latest version of the SDK (1.3.3 at the time of writing).

!pip install --upgrade openai # Switch to the latest version of OpenAI (1.3.3 at the time of writing)

import openai
import asyncio
from IPython.display import display, HTML

from utils.embeddings_utils import (
    get_embedding,
    distances_from_embeddings,
)

GPT_MODEL = "gpt-3.5-turbo-1106"

async def get_chat_response(
    system_message: str, user_request: str, seed: int = None, temperature: float = 0.7
):
    try:
        messages = [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_request},
        ]

        # Note: this is the synchronous client call, so it blocks while the request runs
        response = openai.chat.completions.create(
            model=GPT_MODEL,
            messages=messages,
            seed=seed,
            max_tokens=200,
            temperature=temperature,
        )

        # Extract the generated text, the backend fingerprint, and token usage
        response_content = response.choices[0].message.content
        system_fingerprint = response.system_fingerprint
        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.total_tokens - response.usage.prompt_tokens

        # Render the result as an HTML table in the notebook
        table = f"""
        <table>
        <tr><th>Response</th><td>{response_content}</td></tr>
        <tr><th>System Fingerprint</th><td>{system_fingerprint}</td></tr>
        <tr><th>Number of prompt tokens</th><td>{prompt_tokens}</td></tr>
        <tr><th>Number of completion tokens</th><td>{completion_tokens}</td></tr>
        </table>
        """
        display(HTML(table))

        return response_content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

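def calculate_average_distance(responses):
    """
    Calculate the average distance between the embeddings of the responses.
    The distance between embeddings is a measure of how similar the responses are.
    """
    # Compute an embedding for each response
    response_embeddings = [get_embedding(response) for response in responses]

    # Compute the distance between the first response and all the others
    distances = distances_from_embeddings(response_embeddings[0], response_embeddings[1:])

    # Compute the average distance
    average_distance = sum(distances) / len(distances)

    # Return the average distance
    return average_distance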
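To make the metric in calculate_average_distance concrete, here is a dependency-free sketch of the same computation on toy vectors. It assumes the distance is cosine distance (1 minus cosine similarity); `cosine_distance` and `average_distance_to_first` are illustrative names, and the vectors stand in for real embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity: 0 for identical directions, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_distance_to_first(vectors):
    """Average the distance from the first vector to each of the remaining ones."""
    first, rest = vectors[0], vectors[1:]
    distances = [cosine_distance(first, v) for v in rest]
    return sum(distances) / len(distances)


# Toy "embeddings": the second matches the first, the third is orthogonal to it.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(average_distance_to_first(toy_vectors))
```

Note that only distances to the first response are averaged, so the result depends on which response happens to come first. A lower value means the responses are closer to the first one.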

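First, let's try generating a few different versions of a short excerpt about "a journey to Mars" without the seed parameter. This is the default behavior: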

topic = "a journey to Mars"
system_message = "You are a helpful assistant."
user_request = f"Generate a short excerpt of news about {topic}."

responses = []


async def get_response(i):
    print(f'Output {i + 1}\n{"-" * 10}')
    response = await get_chat_response(
        system_message=system_message, user_request=user_request
    )
    return response


responses = await asyncio.gather(*[get_response(i) for i in range(5)])
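average_distance = calculate_average_distance(responses)
# Note: despite the "similarity" label, this value is an embedding distance (lower means more similar)
print(f"The average similarity between responses is: {average_distance}")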

Output 1
----------
Response "NASA's Mars mission reaches critical stage as spacecraft successfully enters orbit around the red planet. The historic journey, which began over a year ago, has captured the world's attention as scientists and astronauts prepare to land on Mars for the first time. The mission is expected to provide valuable insights into the planet's geology, atmosphere, and potential for sustaining human life in the future."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 76
Output 2
----------
Response "NASA's Perseverance rover successfully landed on Mars, marking a major milestone in the mission to explore the red planet. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples of rock and soil for future return to Earth. This historic achievement paves the way for further exploration and potential human missions to Mars in the near future."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 76
Output 3
----------
Response "SpaceX successfully launched the first manned mission to Mars yesterday, marking a historic milestone in space exploration. The crew of four astronauts will spend the next six months traveling to the red planet, where they will conduct groundbreaking research and experiments. This mission represents a significant step towards establishing a human presence on Mars and paves the way for future interplanetary travel."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 72
Output 4
----------
Response "NASA's latest Mars mission exceeds expectations as the Perseverance rover uncovers tantalizing clues about the Red Planet's past. Scientists are thrilled by the discovery of ancient riverbeds and sedimentary rocks, raising hopes of finding signs of past life on Mars. With this exciting progress, the dream of sending humans to Mars feels closer than ever before."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 72
Output 5
----------
Response "NASA's Perseverance Rover Successfully Lands on Mars, Begins Exploration Mission In a historic moment for space exploration, NASA's Perseverance rover has successfully landed on the surface of Mars. After a seven-month journey, the rover touched down in the Jezero Crater, a location scientists believe may have once held a lake and could potentially contain signs of ancient microbial life. The rover's primary mission is to search for evidence of past life on Mars and collect rock and soil samples for future return to Earth. Equipped with advanced scientific instruments, including cameras, spectrometers, and a drill, Perseverance will begin its exploration of the Martian surface, providing valuable data and insights into the planet's geology and potential habitability. This successful landing marks a significant milestone in humanity's quest to understand the red planet and paves the way for future manned missions to Mars. NASA's Perseverance rover is poised to unravel the mysteries of Mars and unlock new possibilities
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 200
The average similarity between responses is: 0.1136714512418833

Now, let's run the same code with a constant seed of 123 and a temperature of 0, and compare the responses and the system_fingerprint.

SEED = 123
responses = []


async def get_response(i):
    print(f'Output {i + 1}\n{"-" * 10}')
    response = await get_chat_response(
        system_message=system_message,
        seed=SEED,
        temperature=0,
        user_request=user_request,
    )
    return response


responses = await asyncio.gather(*[get_response(i) for i in range(5)])

average_distance = calculate_average_distance(responses)
print(f"The average distance between responses is: {average_distance}")

Output 1
----------
Response "NASA's Perseverance Rover Successfully Lands on Mars In a historic achievement, NASA's Perseverance rover has successfully landed on the surface of Mars, marking a major milestone in the exploration of the red planet. The rover, which traveled over 293 million miles from Earth, is equipped with state-of-the-art instruments designed to search for signs of ancient microbial life and collect rock and soil samples for future return to Earth. This mission represents a significant step forward in our understanding of Mars and the potential for human exploration of the planet in the future."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 113
Output 2
----------
Response "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel and expand our understanding of the universe."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 81
Output 3
----------
Response "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as NASA continues to push the boundaries of space exploration."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 72
Output 4
----------
Response "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel and expand our understanding of the universe."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 81
Output 5
----------
Response "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel."
System Fingerprint fp_772e8125bb
Number of prompt tokens 29
Number of completion tokens 74
The average distance between responses is: 0.0449054397632461

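As we can observe, fixing the seed (here together with a temperature of 0) produces much more consistent results: the average distance drops from about 0.114 to about 0.045, although the five responses are still not all identical.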
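Embedding distance is one way to quantify consistency. A simpler complementary check is to count how many responses are exact duplicates. The sketch below is illustrative only; `most_common_share` is a hypothetical helper name, and the sample strings are placeholders rather than real model output.

```python
from collections import Counter


def most_common_share(responses):
    """Return the fraction of responses equal to the most frequent response."""
    if not responses:
        return 0.0
    counts = Counter(responses)
    _, count = counts.most_common(1)[0]
    return count / len(responses)


# Placeholder strings standing in for five model responses.
sample = ["text A", "text B", "text A", "text C", "text D"]
print(most_common_share(sample))
```

A share of 1.0 means every response was identical. In the seeded run above, outputs 2 and 4 appear to be the only exact duplicates, which illustrates why the seed is described as giving mostly, not fully, deterministic output.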

Conclusion

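We demonstrated how to use a fixed integer seed to generate consistent outputs from our model. This is particularly useful in scenarios where reproducibility matters. However, it is important to note that while the seed improves consistency, it does not guarantee the quality of the output. When you want reproducible outputs, you need to set seed to the same integer on every call to Chat Completions. You should also match the other parameters, such as temperature, max_tokens, and so on. A further use of reproducible outputs is to apply a consistent seed when benchmarking or evaluating different prompts or models, so that each version is evaluated under the same conditions, which makes the comparison fair and the results more reliable.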
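The benchmarking idea can be sketched by building the request arguments for each prompt variant from one shared set of sampling settings. This is a minimal sketch under stated assumptions: `build_eval_requests` and the variant names are hypothetical, no request is actually sent, and each resulting dictionary is meant to be passed as keyword arguments to the Chat Completions call used earlier.

```python
def build_eval_requests(prompts, model, seed, temperature=0.0, max_tokens=200):
    """Return one keyword-argument dict per prompt variant, all sharing the same settings."""
    shared = {
        "model": model,
        "seed": seed,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return {
        name: {**shared, "messages": [{"role": "user", "content": text}]}
        for name, text in prompts.items()
    }


# Hypothetical prompt variants to compare under identical conditions.
variants = {
    "plain": "Generate a short excerpt of news about a journey to Mars.",
    "concise": "In two sentences, write a news excerpt about a journey to Mars.",
}
requests = build_eval_requests(variants, model="gpt-3.5-turbo-1106", seed=123)
for name, kwargs in requests.items():
    print(name, kwargs["seed"], kwargs["temperature"])
```

Because only the prompt text varies between the requests, differences in the outputs can more plausibly be attributed to the prompts themselves, provided the system_fingerprint also stays the same across the runs.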