
Migrating from StuffDocumentsChain

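StuffDocumentsChain combines documents by concatenating them into a single context window. It is a simple and effective strategy for combining documents for question answering, summarization, and other purposes.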
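To make "stuffing" concrete, here is a minimal plain-Python sketch that does not use LangChain at all: every document's text is joined into one string, and that string is substituted into a single prompt. The `Doc` class and the `stuff_documents` helper are hypothetical stand-ins, and the blank-line separator is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_documents(docs, template, separator="\n\n"):
    """Join all document texts into one context string and fill the prompt.

    `template` must contain a `{context}` placeholder.
    """
    context = separator.join(doc.page_content for doc in docs)
    return template.format(context=context)


docs = [Doc("Apples are red"), Doc("Blueberries are blue")]
prompt_text = stuff_documents(docs, "Summarize this content: {context}")
print(prompt_text)
```

Because everything lands in one prompt, this approach only works while the combined text fits in the model's context window.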

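create_stuff_documents_chain is the recommended replacement. It works the same way as StuffDocumentsChain, but with better support for streaming and batching. Because it is a simple composition of LCEL primitives, it is also easier to extend and to integrate into other LangChain applications.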

Below, we walk through both StuffDocumentsChain and create_stuff_documents_chain on a simple example to illustrate the differences.

First, load a chat model:

pip install -qU langchain-openai
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

Example

Let's work through an example that analyzes a set of documents. We first create some simple documents for illustration:

from langchain_core.documents import Document

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yelow", metadata={"title": "banana_book"}),
]

Legacy

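Below is an implementation using StuffDocumentsChain. We define a prompt template for a summarization task and instantiate an LLMChain object with it. We also define how each document is formatted into the prompt, and make sure the keys are consistent across the prompts: document_variable_name must match the {context} placeholder in the summarization prompt.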

from langchain.chains import LLMChain, StuffDocumentsChain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

# This controls how each document will be formatted. Specifically,
# it will be passed to `format_document` - see that function for more
# details.
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
# The prompt here should take as an input variable the
# `document_variable_name`
prompt = ChatPromptTemplate.from_template("Summarize this content: {context}")

llm_chain = LLMChain(llm=llm, prompt=prompt)
chain = StuffDocumentsChain(
llm_chain=llm_chain,
document_prompt=document_prompt,
document_variable_name=document_variable_name,
)

We can now invoke our chain:

result = chain.invoke(documents)
result["output_text"]
'This content describes the colors of different fruits: apples are red, blueberries are blue, and bananas are yellow.'
for chunk in chain.stream(documents):
    print(chunk)
{'input_documents': [Document(metadata={'title': 'apple_book'}, page_content='Apples are red'), Document(metadata={'title': 'blueberry_book'}, page_content='Blueberries are blue'), Document(metadata={'title': 'banana_book'}, page_content='Bananas are yelow')], 'output_text': 'This content describes the colors of different fruits: apples are red, blueberries are blue, and bananas are yellow.'}

LCEL

Below is an implementation using create_stuff_documents_chain:

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Summarize this content: {context}")
chain = create_stuff_documents_chain(llm, prompt)

Invoking the chain, we get a result similar to before:

result = chain.invoke({"context": documents})
result
'This content describes the colors of different fruits: apples are red, blueberries are blue, and bananas are yellow.'

Note that this implementation supports streaming of output tokens:

for chunk in chain.stream({"context": documents}):
    print(chunk, end=" | ")
 | This |  content |  describes |  the |  colors |  of |  different |  fruits | : |  apples |  are |  red | , |  blue | berries |  are |  blue | , |  and |  bananas |  are |  yellow | . |  |

Next steps

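See the LCEL conceptual docs for more background.

See these how-to guides for more on question answering with RAG.

See this tutorial for more LLM-based summarization strategies.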

