
How to add memory to chatbots

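A key feature of chatbots is their ability to use the content of previous conversational turns as context. This state management can take several forms, including:

  • Simply stuffing previous messages into a chat model prompt.
  • The above, but trimming old messages to reduce the amount of distracting information the model has to deal with.
  • More complex modifications, such as synthesizing summaries for long-running conversations.

We'll go into more detail on a few techniques below!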

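Note

This how-to guide previously built a chatbot using RunnableWithMessageHistory. You can access that version of the guide in the v0.2 docs.

As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.

If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan on deprecating this functionality in the near future, as it works for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.

Please see How to migrate to LangGraph Memory for more details.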

Setup

You'll need to install a few packages and have your OpenAI API key set as an environment variable named OPENAI_API_KEY:

%pip install --upgrade --quiet langchain langchain-openai langgraph

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········

Let's also set up a chat model that we'll use for the examples below.

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

Message passing

The simplest form of memory is passing chat history messages directly into a chain. Here's an example:

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a helpful assistant. Answer all questions to the best of your ability."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model

ai_msg = chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)
print(ai_msg.content)
I said, "I love programming" in French: "J'adore la programmation."

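We can see that by passing the previous conversation into a chain, the model can use it as context to answer questions. This is the basic concept underpinning chatbot memory; the rest of the guide demonstrates convenient techniques for passing or reformatting messages.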
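To make the "external management" of history concrete, here is a minimal sketch of the bookkeeping the caller has to do when passing messages by hand. It uses plain tuples instead of HumanMessage/AIMessage, and `stub_model` and `chat_turn` are hypothetical names introduced for illustration; `stub_model` stands in for `chain.invoke` so the snippet runs without an API key.

```python
from typing import Callable, List, Tuple

# (role, text): a simplified stand-in for HumanMessage / AIMessage
Message = Tuple[str, str]


def stub_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for chain.invoke: reports how much context it saw."""
    return f"I can see {len(messages)} message(s) of context."


def chat_turn(
    history: List[Message],
    user_text: str,
    model: Callable[[List[Message]], str] = stub_model,
) -> str:
    """Append the user input, call the model on the full history, store the reply."""
    history.append(("human", user_text))
    reply = model(history)
    history.append(("ai", reply))
    return reply


history: List[Message] = []
first = chat_turn(history, "Translate from English to French: I love programming.")
second = chat_turn(history, "What did you just say?")
print(second)
```

The caller owns `history` and must remember to append both the user input and the reply on every turn; that is the chore the persistence-based approach takes over.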

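Automatic history management

The previous example passes messages to the chain (and model) explicitly. This is a completely acceptable approach, but it does require external management of new messages. LangChain also provides a way to build applications that have memory using LangGraph's persistence. You can enable persistence by providing a checkpointer when compiling the graph: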

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + state["messages"]
    response = model.invoke(messages)
    return {"messages": response}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

We'll pass only the latest input to the conversation here and let LangGraph keep track of the conversation history using the checkpointer:

app.invoke(
    {"messages": [HumanMessage(content="Translate to French: I love programming.")]},
    config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39})]}
app.invoke(
    {"messages": [HumanMessage(content="What did I just ask you?")]},
    config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39}),
HumanMessage(content='What did I just ask you?', additional_kwargs={}, response_metadata={}, id='c667529b-7c41-4cc0-9326-0af47328b816'),
AIMessage(content='You asked me to translate "I love programming" into French.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 54, 'total_tokens': 67, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-134a7ea0-d3a4-4923-bd58-25e5a43f6a1f-0', usage_metadata={'input_tokens': 54, 'output_tokens': 13, 'total_tokens': 67})]}
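The two calls above share history because they use the same thread_id. As a rough mental model only (this is a toy, not how MemorySaver is actually implemented), a checkpointer can be pictured as a store that keeps one message list per thread. `ToyThreadStore` and its canned replies are hypothetical and exist purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (role, text): a simplified stand-in for HumanMessage / AIMessage
Message = Tuple[str, str]


class ToyThreadStore:
    """Hypothetical in-memory store keyed by thread_id (not the MemorySaver code)."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Message]] = defaultdict(list)

    def invoke(self, thread_id: str, user_text: str) -> List[Message]:
        """Load the thread's history, add the new turn, and return a copy of it."""
        history = self._threads[thread_id]
        history.append(("human", user_text))
        # Canned reply standing in for the model call
        history.append(("ai", f"saw {len(history)} message(s)"))
        return list(history)


store = ToyThreadStore()
store.invoke("1", "Translate to French: I love programming.")
same_thread = store.invoke("1", "What did I just ask you?")
other_thread = store.invoke("2", "What did I just ask you?")
print(len(same_thread), len(other_thread))
```

In this toy, the second call on thread "1" sees the earlier exchange, while the same question on thread "2" starts from an empty history.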

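Modifying chat history

Modifying stored chat messages can help your chatbot handle a variety of situations. Here are some examples:

Trimming messages

LLMs and chat models have limited context windows, and even if you're not directly hitting those limits, you may want to limit the amount of distraction the model has to deal with. One solution is to trim the history messages before passing them to the model. Let's use an example history with the app we declared above: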

demo_ephemeral_chat_history = [
    HumanMessage(content="Hey there! I'm Nemo."),
    AIMessage(content="Hello!"),
    HumanMessage(content="How are you today?"),
    AIMessage(content="Fine thanks!"),
]

app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage(content="What's my name?")]
    },
    config={"configurable": {"thread_id": "2"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}, id='c933eca3-5fd8-4651-af16-20fe2d49c216'),
AIMessage(content='Your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 63, 'total_tokens': 68, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-a0b21acc-9dbb-4fb6-a953-392020f37d88-0', usage_metadata={'input_tokens': 63, 'output_tokens': 5, 'total_tokens': 68})]}

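We can see the app remembers the preloaded name.

But let's say we have a very small context window, and we want to trim the messages passed to the model down to only the 2 most recent ones. We can use the built-in trim_messages util to trim messages based on their token count before they reach our prompt. In this case, we'll count each message as 1 "token" and keep only the last two messages: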

from langchain_core.messages import trim_messages
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define trimmer
# count each message as 1 "token" (token_counter=len) and keep only the last two messages
trimmer = trim_messages(strategy="last", max_tokens=2, token_counter=len)

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    trimmed_messages = trimmer.invoke(state["messages"])
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + trimmed_messages
    response = model.invoke(messages)
    return {"messages": response}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Let's call this new app and check the response:

app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage(content="What is my name?")]
    },
    config={"configurable": {"thread_id": "3"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content='What is my name?', additional_kwargs={}, response_metadata={}, id='a22ab7c5-8617-4821-b3e9-a9e7dca1ff78'),
AIMessage(content="I'm sorry, but I don't have access to personal information about you unless you share it with me. How can I assist you today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 39, 'total_tokens': 66, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-f7b32d72-9f57-4705-be7e-43bf1c3d293b-0', usage_metadata={'input_tokens': 39, 'output_tokens': 27, 'total_tokens': 66})]}

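We can see that trim_messages was called and only the two most recent messages were passed to the model. In this case, that means the model forgot the name we gave it. Note that the returned state still contains the full history: trimming only changes what the model receives, not what the checkpointer stores.

Check out our how-to guide on trimming messages for more.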
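To illustrate what the "last" strategy with token_counter=len amounts to, here is a simplified sketch in plain Python. It is not the trim_messages implementation and does not reproduce its other options; `trim_last` is a hypothetical helper, and messages are plain tuples rather than LangChain message objects.

```python
from typing import Callable, List, Sequence, Tuple

# (role, text): a simplified stand-in for HumanMessage / AIMessage
Message = Tuple[str, str]


def trim_last(
    messages: Sequence[Message],
    max_tokens: int,
    token_counter: Callable[[List[Message]], int] = len,
) -> List[Message]:
    """Keep the longest suffix of `messages` whose token count fits in max_tokens."""
    kept: List[Message] = []
    for message in reversed(messages):
        candidate = [message] + kept
        if token_counter(candidate) > max_tokens:
            break
        kept = candidate
    return kept


chat_history: List[Message] = [
    ("human", "Hey there! I'm Nemo."),
    ("ai", "Hello!"),
    ("human", "How are you today?"),
    ("ai", "Fine thanks!"),
    ("human", "What is my name?"),
]

# With token_counter=len, every message costs exactly one "token"
trimmed = trim_last(chat_history, max_tokens=2)
print(trimmed)
```

With a budget of two, only "Fine thanks!" and "What is my name?" survive, so the message that introduced the name never reaches the model.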

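Summary memory

We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our app. Let's recreate our chat history: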

demo_ephemeral_chat_history = [
    HumanMessage(content="Hey there! I'm Nemo."),
    AIMessage(content="Hello!"),
    HumanMessage(content="How are you today?"),
    AIMessage(content="Fine thanks!"),
]

And now, let's update the model-calling function to distill previous interactions into a summary:

from langchain_core.messages import HumanMessage, RemoveMessage, SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability. "
        "The provided chat history includes a summary of the earlier conversation."
    )
    system_message = SystemMessage(content=system_prompt)
    message_history = state["messages"][:-1]  # exclude the most recent user input
    # Summarize the messages if the chat history reaches a certain size
    if len(message_history) >= 4:
        last_human_message = state["messages"][-1]
        # Invoke the model to generate conversation summary
        summary_prompt = (
            "Distill the above chat messages into a single summary message. "
            "Include as many specific details as you can."
        )
        summary_message = model.invoke(
            message_history + [HumanMessage(content=summary_prompt)]
        )

        # Delete messages that we no longer want to show up
        delete_messages = [RemoveMessage(id=m.id) for m in state["messages"]]
        # Re-add user message
        human_message = HumanMessage(content=last_human_message.content)
        # Call the model with summary & response
        response = model.invoke([system_message, summary_message, human_message])
        message_updates = [summary_message, human_message, response] + delete_messages
    else:
        message_updates = model.invoke([system_message] + state["messages"])

    return {"messages": message_updates}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Let's see if it remembers the name we gave it:

app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage("What did I say my name was?")]
    },
    config={"configurable": {"thread_id": "4"}},
)
{'messages': [AIMessage(content="Nemo greeted me, and I responded positively, indicating that I'm doing well.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 60, 'total_tokens': 76, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-ee42f98d-907d-4bad-8f16-af2db789701d-0', usage_metadata={'input_tokens': 60, 'output_tokens': 16, 'total_tokens': 76}),
HumanMessage(content='What did I say my name was?', additional_kwargs={}, response_metadata={}, id='788555ea-5b1f-4c29-a2f2-a92f15d147be'),
AIMessage(content='You mentioned that your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 67, 'total_tokens': 75, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-099a43bd-a284-4969-bb6f-0be486614cd8-0', usage_metadata={'input_tokens': 67, 'output_tokens': 8, 'total_tokens': 75})]}

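Note that invoking the app again will keep accumulating the history until it reaches the specified number of messages (four in our case). At that point, we will generate another summary from the initial summary plus the new messages, and so on.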
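The accumulate-then-summarize cycle can be traced with a small simulation. This is a sketch under stated assumptions: `summarize` and `respond` are hypothetical stubs standing in for the two model calls, messages are plain tuples, and the system prompt is left out; only the threshold logic mirrors the function above.

```python
from typing import List, Tuple

# (role, text): a simplified stand-in for HumanMessage / AIMessage
Message = Tuple[str, str]

SUMMARY_THRESHOLD = 4  # same cutoff as in the guide's call_model


def summarize(messages: List[Message]) -> Message:
    """Hypothetical stand-in for the LLM summary call."""
    return ("ai", f"summary of {len(messages)} messages")


def respond(messages: List[Message]) -> Message:
    """Hypothetical stand-in for the LLM answer call."""
    return ("ai", f"reply after seeing {len(messages)} messages")


def run_turn(stored: List[Message], user_text: str) -> List[Message]:
    """Return the stored history after one user turn."""
    state = stored + [("human", user_text)]
    earlier = state[:-1]  # everything except the newest user input
    if len(earlier) >= SUMMARY_THRESHOLD:
        summary = summarize(earlier)
        latest = state[-1]
        # Old messages are dropped; only summary, question, and answer remain
        return [summary, latest, respond([summary, latest])]
    return state + [respond(state)]


stored: List[Message] = [
    ("human", "Hey there! I'm Nemo."),
    ("ai", "Hello!"),
    ("human", "How are you today?"),
    ("ai", "Fine thanks!"),
]
sizes = []
for text in ["What did I say my name was?", "Thanks!", "What else can you do?"]:
    stored = run_turn(stored, text)
    sizes.append(len(stored))
print(sizes)
```

In this simulation, the stored history shrinks to three messages after the first summary, grows to five on the next turn, and is condensed again on the turn after that, with the second summary covering the first summary plus the newer messages.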

