
How to stream completions


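By default, when you request a completion from OpenAI, the entire completion is generated before it is sent back in a single response.

If you are generating long completions, waiting for the response can take many seconds.

To get responses sooner, you can "stream" the completion as it is being generated. This lets you start printing or processing the beginning of the completion before the full completion is finished.

To stream completions, set stream=True when calling the chat completions or completions endpoints. This returns an object that streams back the response as data-only server-sent events. Extract chunks from the delta field rather than the message field.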

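Downsides

Note that using stream=True in a production application makes it more difficult to moderate the content of the completions, because partial completions may be harder to evaluate. This may have implications for approved usage.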
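One way to make partial output easier to evaluate is to buffer streamed fragments until a complete sentence is available, and only then hand it to whatever review step your application uses. The sketch below is a naive illustration of that idea, not an OpenAI-recommended approach: the function name sentences_from_fragments and the choice of ".", "!" and "?" as sentence terminators are assumptions made for this example, and the splitting rule will mis-handle text such as decimal numbers.

```python
def sentences_from_fragments(fragments, terminators=".!?"):
    """Yield complete sentences from an iterable of streamed text fragments.

    Hypothetical helper: `fragments` stands in for the delta.content values
    of a stream; None and empty strings are skipped.
    """
    buffer = ""
    for fragment in fragments:
        if not fragment:
            continue
        buffer += fragment
        while True:
            # Find the first sentence terminator currently in the buffer.
            idx = next((i for i, ch in enumerate(buffer) if ch in terminators), -1)
            if idx == -1:
                break
            yield buffer[: idx + 1].strip()
            buffer = buffer[idx + 1 :]
    # Emit any trailing text that never reached a terminator.
    if buffer.strip():
        yield buffer.strip()


buffered = list(
    sentences_from_fragments(["Hel", "lo the", "re. How", " are you", "?", None, " Fine"])
)
print(buffered)
```

Each yielded sentence could then be passed to a review or moderation step before being shown to the user, at the cost of some of the latency benefit that streaming provides.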

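Example code

Below, this notebook shows:
1. What a typical chat completion response looks like
2. What a streaming chat completion response looks like
3. How much time is saved by streaming a chat completion
4. How to get token usage data for a streamed chat completion response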

# !pip install openai

# imports
import time  # for measuring the duration of API calls
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

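1. What a typical chat completion response looks like

With a typical ChatCompletions API call, the response is computed in full first and then returned all at once.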

# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/text-generation/chat-completions-api

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
)
# calculate the time it took to receive the response
response_time = time.time() - start_time

# print the time delay and the text received
print(f"Full response received {response_time:.2f} seconds after request")
print(f"Full response received:\n{response}")

Full response received 5.27 seconds after request
Full response received:
ChatCompletion(id='chatcmpl-8ZB8ywkV5DuuJO7xktqUcNYfG8j6I', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.', role='assistant', function_call=None, tool_calls=None))], created=1703395008, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=299, prompt_tokens=36, total_tokens=335))

The reply can be extracted with response.choices[0].message.

The content of the reply can be extracted with response.choices[0].message.content.

reply = response.choices[0].message
print(f"Extracted reply: \n{reply}")

reply_content = response.choices[0].message.content
print(f"Extracted content: \n{reply_content}")

Extracted reply: 
ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.', role='assistant', function_call=None, tool_calls=None)
Extracted content:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.

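2. How to stream a chat completion

With a streaming API call, the response is sent back incrementally in chunks via an event stream. In Python, you can iterate over these events with a for loop.

Let's see what it looks like: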

# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/api-reference/streaming#chat/create-stream

# a ChatCompletion request
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True  # this time, we set stream=True
)

for chunk in response:
    print(chunk)
    print(chunk.choices[0].delta.content)
    print("****************")

ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1703395058, model='gpt-3.5-turbo-0613', object='chat.completion.chunk', system_fingerprint=None)

****************
ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDelta(content='2', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1703395058, model='gpt-3.5-turbo-0613', object='chat.completion.chunk', system_fingerprint=None)
2
****************
ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1703395058, model='gpt-3.5-turbo-0613', object='chat.completion.chunk', system_fingerprint=None)
None
****************

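As you can see above, streaming responses have a delta field rather than a message field. delta can hold things like:
- a role token (e.g., {"role": "assistant"})
- a content token (e.g., {"content": "\n\n"})
- nothing (e.g., {}), when the stream is over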
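Because each delta carries only a fragment, the client has to merge the deltas to recover a complete message. The following is a minimal sketch of that merging step. The helper name assemble_message is an assumption for illustration, and SimpleNamespace objects stand in for the three ChoiceDelta objects printed above so the example runs without an API call.

```python
from types import SimpleNamespace


def assemble_message(deltas):
    """Merge a sequence of delta objects into one {'role', 'content'} dict."""
    role = None
    parts = []
    for delta in deltas:
        # In the output above, the role arrives once, on the first chunk.
        if getattr(delta, "role", None):
            role = delta.role
        # Content may be '', a text fragment, or None on the final chunk.
        content = getattr(delta, "content", None)
        if content:
            parts.append(content)
    return {"role": role, "content": "".join(parts)}


# Hypothetical stand-ins mirroring the three deltas shown in the output above.
example_deltas = [
    SimpleNamespace(role="assistant", content=""),
    SimpleNamespace(role=None, content="2"),
    SimpleNamespace(role=None, content=None),
]
assembled = assemble_message(example_deltas)
print(assembled)
```

With a real stream, the same function could be fed chunk.choices[0].delta for each chunk that has a non-empty choices list.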

3. How much time is saved by streaming a chat completion

Now let's ask gpt-3.5-turbo to count to 100 again, and see how long it takes.

# Example of an OpenAI ChatCompletion request with `stream=True`
# https://platform.openai.com/docs/api-reference/streaming#chat/create-stream

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
    stream=True  # again, we set stream=True
)
# create variables to collect the stream of chunks
collected_chunks = []
collected_messages = []
# iterate through the stream of events
for chunk in response:
    chunk_time = time.time() - start_time  # calculate the time delay of the chunk
    collected_chunks.append(chunk)  # save the event response
    chunk_message = chunk.choices[0].delta.content  # extract the message
    collected_messages.append(chunk_message)  # save the message
    print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")  # print the delay and text

# print the time delay and the text received
print(f"Full response received {chunk_time:.2f} seconds after request")
# remove None values from the collected messages
collected_messages = [m for m in collected_messages if m is not None]
full_reply_content = ''.join(collected_messages)
print(f"Full conversation received: {full_reply_content}")

Message received 0.31 seconds after request: 
Message received 0.31 seconds after request: 1
Message received 0.34 seconds after request: ,
Message received 0.34 seconds after request:
Message received 0.34 seconds after request: 2
Message received 0.39 seconds after request: ,
Message received 0.39 seconds after request:
Message received 0.39 seconds after request: 3
Message received 0.42 seconds after request: ,
Message received 0.42 seconds after request:
Message received 0.42 seconds after request: 4
Message received 0.47 seconds after request: ,
Message received 0.47 seconds after request:
Message received 0.47 seconds after request: 5
Message received 0.51 seconds after request: ,
Message received 0.51 seconds after request:
Message received 0.51 seconds after request: 6
Message received 0.55 seconds after request: ,
Message received 0.55 seconds after request:
Message received 0.55 seconds after request: 7
Message received 0.59 seconds after request: ,
Message received 0.59 seconds after request:
Message received 0.59 seconds after request: 8
Message received 0.63 seconds after request: ,
Message received 0.63 seconds after request:
Message received 0.63 seconds after request: 9
Message received 0.67 seconds after request: ,
Message received 0.67 seconds after request:
Message received 0.67 seconds after request: 10
Message received 0.71 seconds after request: ,
Message received 0.71 seconds after request:
Message received 0.71 seconds after request: 11
Message received 0.75 seconds after request: ,
Message received 0.75 seconds after request:
Message received 0.75 seconds after request: 12
Message received 0.98 seconds after request: ,
Message received 0.98 seconds after request:
Message received 0.98 seconds after request: 13
Message received 1.02 seconds after request: ,
Message received 1.02 seconds after request:
Message received 1.02 seconds after request: 14
Message received 1.04 seconds after request: ,
Message received 1.04 seconds after request:
Message received 1.04 seconds after request: 15
Message received 1.08 seconds after request: ,
Message received 1.08 seconds after request:
Message received 1.08 seconds after request: 16
Message received 1.12 seconds after request: ,
Message received 1.12 seconds after request:
Message received 1.12 seconds after request: 17
Message received 1.16 seconds after request: ,
Message received 1.16 seconds after request:
Message received 1.16 seconds after request: 18
Message received 1.19 seconds after request: ,
Message received 1.19 seconds after request:
Message received 1.19 seconds after request: 19
Message received 1.23 seconds after request: ,
Message received 1.23 seconds after request:
Message received 1.23 seconds after request: 20
Message received 1.27 seconds after request: ,
Message received 1.27 seconds after request:
Message received 1.27 seconds after request: 21
Message received 1.31 seconds after request: ,
Message received 1.31 seconds after request:
Message received 1.31 seconds after request: 22
Message received 1.35 seconds after request: ,
Message received 1.35 seconds after request:
Message received 1.35 seconds after request: 23
Message received 1.39 seconds after request: ,
Message received 1.39 seconds after request:
Message received 1.39 seconds after request: 24
Message received 1.43 seconds after request: ,
Message received 1.43 seconds after request:
Message received 1.43 seconds after request: 25
Message received 1.47 seconds after request: ,
Message received 1.47 seconds after request:
Message received 1.47 seconds after request: 26
Message received 1.51 seconds after request: ,
Message received 1.51 seconds after request:
Message received 1.51 seconds after request: 27
Message received 1.55 seconds after request: ,
Message received 1.55 seconds after request:
Message received 1.55 seconds after request: 28
Message received 1.59 seconds after request: ,
Message received 1.59 seconds after request:
Message received 1.59 seconds after request: 29
Message received 1.59 seconds after request: ,
Message received 1.59 seconds after request:
Message received 1.59 seconds after request: 30
Message received 1.59 seconds after request: ,
Message received 1.59 seconds after request:
Message received 1.59 seconds after request: 31
Message received 1.59 seconds after request: ,
Message received 1.59 seconds after request:
Message received 1.60 seconds after request: 32
Message received 1.60 seconds after request: ,
Message received 1.60 seconds after request:
Message received 1.60 seconds after request: 33
Message received 1.60 seconds after request: ,
Message received 1.60 seconds after request:
Message received 1.67 seconds after request: 34
Message received 1.67 seconds after request: ,
Message received 1.67 seconds after request:
Message received 1.68 seconds after request: 35
Message received 1.68 seconds after request: ,
Message received 1.68 seconds after request:
Message received 1.86 seconds after request: 36
Message received 1.86 seconds after request: ,
Message received 1.86 seconds after request:
Message received 1.90 seconds after request: 37
Message received 1.90 seconds after request: ,
Message received 1.90 seconds after request:
Message received 1.94 seconds after request: 38
Message received 1.94 seconds after request: ,
Message received 1.94 seconds after request:
Message received 1.98 seconds after request: 39
Message received 1.98 seconds after request: ,
Message received 1.98 seconds after request:
Message received 2.05 seconds after request: 40
Message received 2.05 seconds after request: ,
Message received 2.05 seconds after request:
Message received 2.09 seconds after request: 41
Message received 2.09 seconds after request: ,
Message received 2.09 seconds after request:
Message received 2.14 seconds after request: 42
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:
Message received 2.14 seconds after request: 43
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:
Message received 2.14 seconds after request: 44
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:
Message received 2.14 seconds after request: 45
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:
Message received 2.15 seconds after request: 46
Message received 2.15 seconds after request: ,
Message received 2.15 seconds after request:
Message received 2.30 seconds after request: 47
Message received 2.30 seconds after request: ,
Message received 2.30 seconds after request:
Message received 2.30 seconds after request: 48
Message received 2.30 seconds after request: ,
Message received 2.30 seconds after request:
Message received 2.30 seconds after request: 49
Message received 2.30 seconds after request: ,
Message received 2.30 seconds after request:
Message received 2.31 seconds after request: 50
Message received 2.31 seconds after request: ,
Message received 2.31 seconds after request:
Message received 2.39 seconds after request: 51
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:
Message received 2.40 seconds after request: 52
Message received 2.40 seconds after request: ,
Message received 2.40 seconds after request:
Message received 2.48 seconds after request: 53
Message received 2.48 seconds after request: ,
Message received 2.48 seconds after request:
Message received 2.49 seconds after request: 54
Message received 2.49 seconds after request: ,
Message received 2.49 seconds after request:
Message received 2.68 seconds after request: 55
Message received 2.68 seconds after request: ,
Message received 2.68 seconds after request:
Message received 2.72 seconds after request: 56
Message received 2.72 seconds after request: ,
Message received 2.72 seconds after request:
Message received 2.77 seconds after request: 57
Message received 2.77 seconds after request: ,
Message received 2.77 seconds after request:
Message received 2.80 seconds after request: 58
Message received 2.80 seconds after request: ,
Message received 2.80 seconds after request:
Message received 2.85 seconds after request: 59
Message received 2.85 seconds after request: ,
Message received 2.85 seconds after request:
Message received 2.88 seconds after request: 60
Message received 2.88 seconds after request: ,
Message received 2.88 seconds after request:
Message received 2.88 seconds after request: 61
Message received 2.88 seconds after request: ,
Message received 2.88 seconds after request:
Message received 2.89 seconds after request: 62
Message received 2.89 seconds after request: ,
Message received 2.89 seconds after request:
Message received 2.89 seconds after request: 63
Message received 2.89 seconds after request: ,
Message received 2.89 seconds after request:
Message received 2.92 seconds after request: 64
Message received 2.92 seconds after request: ,
Message received 2.92 seconds after request:
Message received 3.37 seconds after request: 65
Message received 3.37 seconds after request: ,
Message received 3.37 seconds after request:
Message received 3.38 seconds after request: 66
Message received 3.38 seconds after request: ,
Message received 3.38 seconds after request:
Message received 3.38 seconds after request: 67
Message received 3.38 seconds after request: ,
Message received 3.38 seconds after request:
Message received 3.38 seconds after request: 68
Message received 3.38 seconds after request: ,
Message received 3.38 seconds after request:
Message received 3.42 seconds after request: 69
Message received 3.42 seconds after request: ,
Message received 3.42 seconds after request:
Message received 3.43 seconds after request: 70
Message received 3.43 seconds after request: ,
Message received 3.43 seconds after request:
Message received 3.46 seconds after request: 71
Message received 3.46 seconds after request: ,
Message received 3.46 seconds after request:
Message received 3.47 seconds after request: 72
Message received 3.47 seconds after request: ,
Message received 3.47 seconds after request:
Message received 3.50 seconds after request: 73
Message received 3.50 seconds after request: ,
Message received 3.50 seconds after request:
Message received 3.51 seconds after request: 74
Message received 3.51 seconds after request: ,
Message received 3.51 seconds after request:
Message received 3.52 seconds after request: 75
Message received 3.52 seconds after request: ,
Message received 3.52 seconds after request:
Message received 3.54 seconds after request: 76
Message received 3.54 seconds after request: ,
Message received 3.54 seconds after request:
Message received 3.56 seconds after request: 77
Message received 3.56 seconds after request: ,
Message received 3.56 seconds after request:
Message received 3.59 seconds after request: 78
Message received 3.59 seconds after request: ,
Message received 3.59 seconds after request:
Message received 3.59 seconds after request: 79
Message received 3.59 seconds after request: ,
Message received 3.59 seconds after request:
Message received 3.59 seconds after request: 80
Message received 3.59 seconds after request: ,
Message received 3.59 seconds after request:
Message received 3.61 seconds after request: 81
Message received 3.61 seconds after request: ,
Message received 3.61 seconds after request:
Message received 3.65 seconds after request: 82
Message received 3.65 seconds after request: ,
Message received 3.65 seconds after request:
Message received 3.85 seconds after request: 83
Message received 3.85 seconds after request: ,
Message received 3.85 seconds after request:
Message received 3.90 seconds after request: 84
Message received 3.90 seconds after request: ,
Message received 3.90 seconds after request:
Message received 3.95 seconds after request: 85
Message received 3.95 seconds after request: ,
Message received 3.95 seconds after request:
Message received 4.00 seconds after request: 86
Message received 4.00 seconds after request: ,
Message received 4.00 seconds after request:
Message received 4.04 seconds after request: 87
Message received 4.04 seconds after request: ,
Message received 4.04 seconds after request:
Message received 4.08 seconds after request: 88
Message received 4.08 seconds after request: ,
Message received 4.08 seconds after request:
Message received 4.12 seconds after request: 89
Message received 4.12 seconds after request: ,
Message received 4.12 seconds after request:
Message received 4.18 seconds after request: 90
Message received 4.18 seconds after request: ,
Message received 4.18 seconds after request:
Message received 4.18 seconds after request: 91
Message received 4.18 seconds after request: ,
Message received 4.18 seconds after request:
Message received 4.18 seconds after request: 92
Message received 4.18 seconds after request: ,
Message received 4.18 seconds after request:
Message received 4.19 seconds after request: 93
Message received 4.19 seconds after request: ,
Message received 4.19 seconds after request:
Message received 4.20 seconds after request: 94
Message received 4.20 seconds after request: ,
Message received 4.20 seconds after request:
Message received 4.23 seconds after request: 95
Message received 4.23 seconds after request: ,
Message received 4.23 seconds after request:
Message received 4.27 seconds after request: 96
Message received 4.27 seconds after request: ,
Message received 4.27 seconds after request:
Message received 4.39 seconds after request: 97
Message received 4.39 seconds after request: ,
Message received 4.39 seconds after request:
Message received 4.39 seconds after request: 98
Message received 4.39 seconds after request: ,
Message received 4.39 seconds after request:
Message received 4.41 seconds after request: 99
Message received 4.41 seconds after request: ,
Message received 4.41 seconds after request:
Message received 4.41 seconds after request: 100
Message received 4.41 seconds after request: .
Message received 4.41 seconds after request: None
Full response received 4.41 seconds after request
Full conversation received: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.

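Time comparison

In the example above, both requests took roughly 4 to 5 seconds to complete in full (5.27 seconds without streaming, 4.41 seconds with it). Request times vary with load and other stochastic factors.

With the streaming request, however, the first token arrived about 0.3 seconds after the request in the run shown above, and subsequent tokens arrived roughly every 0.01-0.02 seconds.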
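The two figures quoted in this comparison, time to first chunk and the typical gap between chunks, can be computed directly from the per-chunk delays. The sketch below assumes you have kept each chunk_time in a list. The function name stream_latency_stats is hypothetical, and the sample values are the first six delays from the output printed above.

```python
def stream_latency_stats(arrival_times):
    """Return (time to first chunk, mean gap between consecutive chunks) in seconds."""
    if not arrival_times:
        raise ValueError("no chunks were received")
    first = arrival_times[0]
    # Differences between each chunk's arrival time and the previous one.
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return first, mean_gap


# First six chunk delays from the streaming run shown above.
sample_times = [0.31, 0.31, 0.34, 0.34, 0.34, 0.39]
first_chunk, mean_gap = stream_latency_stats(sample_times)
print(f"first chunk after {first_chunk:.2f}s, mean gap {mean_gap:.3f}s")
```

Because several chunks often arrive in the same network read, individual gaps are frequently zero, so an average over many chunks is more informative than any single gap.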

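4. How to get token usage data for a streamed chat completion response

You can get token usage statistics for a streamed response by setting stream_options={"include_usage": True}. When you do so, an extra chunk is streamed as the final chunk, and you can read the usage data for the entire request from the usage field on that chunk. A few things to note when you set stream_options={"include_usage": True}:
* The value of the usage field on every chunk except the last one will be null.
* The usage field on the last chunk contains token usage statistics for the entire request.
* The choices field on the last chunk will always be an empty array [].

Let's see how it works using the example from section 2.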

# Example of an OpenAI ChatCompletion request with stream=True and stream_options={"include_usage": True}

# a ChatCompletion request
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True,
    stream_options={"include_usage": True},  # retrieve token usage for the streamed response
)

for chunk in response:
    print(f"choices: {chunk.choices}\nusage: {chunk.usage}")
    print("****************")

choices: [Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content='2', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)]
usage: None
****************
choices: []
usage: CompletionUsage(completion_tokens=1, prompt_tokens=19, total_tokens=20)
****************
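Because the final usage chunk has an empty choices list, a loop that reads chunk.choices[0] unconditionally would raise an IndexError on it. The sketch below shows one way to collect both the text and the usage data while guarding against that. The names collect_stream and _chunk are assumptions for illustration, and SimpleNamespace objects stand in for the four chunks printed above so the example runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Return (text, usage) from a stream requested with include_usage."""
    parts = []
    usage = None
    for chunk in chunks:
        # Only the extra final chunk carries a non-null usage value.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # That final chunk has no choices, so skip content extraction for it.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts), usage


def _chunk(content=None, usage=None, has_choice=True):
    """Build a hypothetical stand-in for a ChatCompletionChunk."""
    choices = [SimpleNamespace(delta=SimpleNamespace(content=content))] if has_choice else []
    return SimpleNamespace(choices=choices, usage=usage)


# Stand-ins mirroring the four chunks shown in the output above.
fake_stream = [
    _chunk(content=""),
    _chunk(content="2"),
    _chunk(content=None),
    _chunk(
        usage=SimpleNamespace(completion_tokens=1, prompt_tokens=19, total_tokens=20),
        has_choice=False,
    ),
]
text, usage = collect_stream(fake_stream)
print(text, usage.total_tokens)
```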