How to trim messages
All models have finite context windows, meaning there's a limit to how many tokens they can take as input. If you have very long messages, or a chain/agent that accumulates a long message history, you'll need to manage the length of the messages you're passing in to the model.
trim_messages can be used to reduce the size of a chat history to a specified token count or a specified message count.
If passing the trimmed chat history back into a chat model directly, the trimmed chat history should satisfy the following properties:
- The resulting chat history should be valid. Usually this means that the following properties should be satisfied:
  - The chat history starts with either (1) a HumanMessage or (2) a SystemMessage followed by a HumanMessage.
  - The chat history ends with either a HumanMessage or a ToolMessage.
  - A ToolMessage can only appear after an AIMessage that involved a tool call.
  This can be achieved by setting start_on="human" and end_on=("human", "tool"). A minimal sketch combining these settings follows this list.
- It includes the most recent messages and drops old messages in the chat history. This can be achieved by setting strategy="last".
- Usually, the new chat history should include the SystemMessage if it was present in the original chat history, since the SystemMessage contains special instructions for the chat model. The SystemMessage is almost always the first message in the history when present. This can be achieved by setting include_system=True.
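Putting these settings together, here is a minimal sketch of a call that produces a valid trimmed history. It assumes chat_history is an existing list of messages, and it uses token_counter=len with an illustrative max_tokens purely so the sketch runs without a model:
from langchain_core.messages import trim_messages

# Minimal sketch: keep the most recent messages, preserve the
# SystemMessage, and enforce valid start/end message types.
trimmed = trim_messages(
    chat_history,  # assumed: an existing list of BaseMessage objects
    strategy="last",
    token_counter=len,  # counts messages rather than tokens, for illustration
    max_tokens=10,  # illustrative budget; tune for your use case
    start_on="human",
    end_on=("human", "tool"),
    include_system=True,
)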
Trimming based on token count
Below, we'll trim the chat history based on token count. The trimmed result will be a valid chat history that includes the SystemMessage.
To keep the most recent messages, we set strategy="last". We also set include_system=True to include the SystemMessage, and start_on="human" to make sure the resulting chat history is valid.
This is a good default configuration when using trim_messages based on token count. Remember to adjust token_counter and max_tokens for your use case.
Notice that for our token_counter we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:
pip install -qU langchain-openai
Note: you may need to restart the kernel to use updated packages.
from langchain_core.messages import (
AIMessage,
HumanMessage,
SystemMessage,
ToolMessage,
trim_messages,
)
from langchain_openai import ChatOpenAI
messages = [
SystemMessage("you're a good assistant, you always respond with a joke."),
HumanMessage("i wonder why it's called langchain"),
AIMessage(
'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
),
HumanMessage("and who is harrison chasing anyways"),
AIMessage(
"Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
),
HumanMessage("what do you call a speechless parrot"),
]
trim_messages(
messages,
# Keep the last <= n_count tokens of the messages.
strategy="last",
    # Remember to adjust based on your model
    # or else pass a custom token_counter
    token_counter=ChatOpenAI(model="gpt-4o"),
    # Remember to adjust based on the desired conversation
    # length
max_tokens=45,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
allow_partial=False,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
Trimming based on message count
Alternatively, we can trim the chat history based on message count by setting token_counter=len. In this case, each message counts as a single token, and max_tokens controls the maximum number of messages.
This is a good default configuration when using trim_messages based on message count. Remember to adjust max_tokens for your use case.
trim_messages(
messages,
# Keep the last <= n_count tokens of the messages.
strategy="last",
token_counter=len,
# When token_counter=len, each message
# will be counted as a single token.
# Remember to adjust for your use case
max_tokens=5,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),
AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
Advanced usage
You can use trim_messages as a building block to create more complex processing logic.
If we want to allow splitting up the contents of a message, we can specify allow_partial=True:
trim_messages(
messages,
max_tokens=56,
strategy="last",
token_counter=ChatOpenAI(model="gpt-4o"),
include_system=True,
allow_partial=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
AIMessage(content="\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
By default, the SystemMessage will not be included, so you can drop it either by setting include_system=False or by omitting the include_system argument:
trim_messages(
messages,
max_tokens=45,
strategy="last",
token_counter=ChatOpenAI(model="gpt-4o"),
)
[AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
We can perform the flipped operation of getting the first max_tokens by specifying strategy="first":
trim_messages(
messages,
max_tokens=45,
strategy="first",
token_counter=ChatOpenAI(model="gpt-4o"),
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]
Writing a custom token counter
We can write a custom token counter function that takes in a list of messages and returns an int:
pip install -qU tiktoken
Note: you may need to restart the kernel to use updated packages.
from typing import List
import tiktoken
from langchain_core.messages import BaseMessage, ToolMessage
def str_token_counter(text: str) -> int:
enc = tiktoken.get_encoding("o200k_base")
return len(enc.encode(text))
def tiktoken_counter(messages: List[BaseMessage]) -> int:
"""Approximately reproduce https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
For simplicity only supports str Message.contents.
"""
num_tokens = 3 # every reply is primed with <|start|>assistant<|message|>
tokens_per_message = 3
tokens_per_name = 1
for msg in messages:
if isinstance(msg, HumanMessage):
role = "user"
elif isinstance(msg, AIMessage):
role = "assistant"
elif isinstance(msg, ToolMessage):
role = "tool"
elif isinstance(msg, SystemMessage):
role = "system"
else:
raise ValueError(f"Unsupported messages type {msg.__class__}")
num_tokens += (
tokens_per_message
+ str_token_counter(role)
+ str_token_counter(msg.content)
)
if msg.name:
num_tokens += tokens_per_name + str_token_counter(msg.name)
return num_tokens
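Before wiring a custom counter into trim_messages, it can help to sanity-check it on the full history. A minimal usage sketch (the printed count depends on the encoding and the message contents):
# Sanity check: total token count of the full history, useful
# when choosing a max_tokens budget.
print(tiktoken_counter(messages))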
trim_messages(
messages,
token_counter=tiktoken_counter,
# Keep the last <= n_count tokens of the messages.
strategy="last",
    # Remember to adjust for your use case
max_tokens=45,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
Chaining
trim_messages can be used imperatively (like above) or declaratively, making it easy to compose with other components in a chain:
llm = ChatOpenAI(model="gpt-4o")
# Notice we don't pass in messages. This creates
# a RunnableLambda that takes messages as input
trimmer = trim_messages(
token_counter=llm,
# Keep the last <= n_count tokens of the messages.
strategy="last",
    # Remember to adjust for your use case
max_tokens=45,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
)
chain = trimmer | llm
chain.invoke(messages)
AIMessage(content='A polygon! Because it\'s a "poly-gone" quiet!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 32, 'total_tokens': 45, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_057232b607', 'finish_reason': 'stop', 'logprobs': None}, id='run-4fa026e7-9137-4fef-b596-54243615e3b3-0', usage_metadata={'input_tokens': 32, 'output_tokens': 13, 'total_tokens': 45})
Looking at the LangSmith trace, we can see that before the messages are passed to the model they are first trimmed: https://smith.langchain.com/public/65af12c4-c24d-4824-90f0-6547566e59bb/r
Looking at just the trimmer, we can see that it's a Runnable object that can be invoked like all Runnables:
trimmer.invoke(messages)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
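Because the trimmer is a Runnable, it also composes with other Runnables beyond the chat model itself. As a minimal sketch, we could extend the earlier chain with an output parser to get a plain string back (the composition shown is an illustration, not the only option):
from langchain_core.output_parsers import StrOutputParser

# The trimmer slots into a longer pipeline like any other Runnable;
# here the model's reply is parsed down to a plain string.
chain = trimmer | llm | StrOutputParser()
chain.invoke(messages)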
Using with ChatMessageHistory
Trimming messages is especially useful when working with chat histories, which can get arbitrarily long:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
chat_history = InMemoryChatMessageHistory(messages=messages[:-1])
def dummy_get_session_history(session_id):
if session_id != "1":
return InMemoryChatMessageHistory()
return chat_history
llm = ChatOpenAI(model="gpt-4o")
trimmer = trim_messages(
max_tokens=45,
strategy="last",
token_counter=llm,
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
# start_on="human" makes sure we produce a valid chat history
start_on="human",
)
chain = trimmer | llm
chain_with_history = RunnableWithMessageHistory(chain, dummy_get_session_history)
chain_with_history.invoke(
[HumanMessage("what do you call a speechless parrot")],
config={"configurable": {"session_id": "1"}},
)
AIMessage(content='A "polygon"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 32, 'total_tokens': 36, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_c17d3befe7', 'finish_reason': 'stop', 'logprobs': None}, id='run-71d9fce6-bb0c-4bb3-acc8-d5eaee6ae7bc-0', usage_metadata={'input_tokens': 32, 'output_tokens': 4, 'total_tokens': 36})
Looking at the LangSmith trace, we can see that we retrieve all of our messages, but before they are passed to the model they are trimmed down to just the system message and the last human message: https://smith.langchain.com/public/17dd700b-9994-44ca-930c-116e00997315/r
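Since dummy_get_session_history returns a fresh InMemoryChatMessageHistory for any session id other than "1", invoking the same chain with a new session id starts from an empty history. A minimal usage sketch (the session id and message text are illustrative):
# A fresh session: the history starts empty, so only this
# HumanMessage is passed (after trimming) to the model.
chain_with_history.invoke(
    [HumanMessage("got any jokes about parrots?")],
    config={"configurable": {"session_id": "2"}},
)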
API reference
For a complete description of all arguments, head to the API reference: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html