Self-RAG¶
Self-RAG is a RAG strategy that incorporates self-reflection / self-grading on retrieved documents and generations.
In the paper, a few decisions are made (summarized in a small type sketch after this list):

1. Should I retrieve from retriever R?
- Input: x (question) OR x (question), y (generation)
- Decides when to retrieve chunks D with R
- Output: yes, no, continue

2. Are the retrieved passages D relevant to the question x?
- Input: (x (question), d (chunk)) for d in D
- d provides useful information for solving x
- Output: relevant, irrelevant

3. Is the LLM generation from each chunk in D grounded in that chunk (hallucinations, etc.)?
- Input: x (question), d (chunk), y (generation) for d in D
- All of the verifiable claims in y (generation) are supported by d
- Output: {fully supported, partially supported, no support}

4. Is the LLM generation from each chunk in D a useful response to x (question)?
- Input: x (question), y (generation) for d in D
- y (generation) is a useful response to x (question)
- Output: {5, 4, 3, 2, 1}
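For orientation, these four decisions reduce to four small output vocabularies. The sketch below is a hypothetical distillation (the type names are ours, not the paper's); the graders we build later implement simplified binary versions of these checks:
from typing import Literal

# Hypothetical names (ours, not the paper's) for the four reflection decisions.
RetrieveDecision = Literal["yes", "no", "continue"]  # 1. retrieve with R?
RelevanceGrade = Literal["relevant", "irrelevant"]  # 2. is chunk d relevant to x?
SupportGrade = Literal["fully supported", "partially supported", "no support"]  # 3. is y grounded in d?
UsefulnessGrade = Literal[5, 4, 3, 2, 1]  # 4. how useful is y as an answer to x?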
We will implement some of these ideas from scratch using LangGraph.
Setup¶
First, install our required packages and set our API keys.
! pip install -U langchain_community tiktoken langchain-openai langchainhub chromadb langchain langgraph
import getpass
import os
def _set_env(key: str):
if key not in os.environ:
os.environ[key] = getpass.getpass(f"{key}:")
_set_env("OPENAI_API_KEY")
Set up LangSmith for LangGraph development
Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph.
Retriever¶
Let's index 3 blog posts.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
urls = [
"https://lilianweng.github.io/posts/2023-06-23-agent/",
"https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
"https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)
# Add to vectorstore
vectorstore = Chroma.from_documents(
documents=doc_splits,
collection_name="rag-chroma",
embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()
LLMs¶
Using Pydantic with LangChain
This notebook uses Pydantic v2 BaseModel, which requires langchain-core >= 0.3. Using langchain-core < 0.3 will result in errors due to mixing of Pydantic v1 and v2 BaseModels.
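If you are unsure which version is installed, a quick runtime check can catch the mismatch early (an optional guard we add here, assuming the packaging library is available in your environment):
# Optional guard: fail fast if langchain-core is older than 0.3.
from importlib.metadata import version
from packaging.version import Version

assert Version(version("langchain-core")) >= Version("0.3"), (
    "Pydantic v2 BaseModels require langchain-core >= 0.3"
)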
Retrieval Grader¶
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
# Data model
class GradeDocuments(BaseModel):
    """Binary score for relevance check on retrieved documents."""
binary_score: str = Field(
description="Documents are relevant to the question, 'yes' or 'no'"
)
# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeDocuments)
# Prompt
system = """You are a grader assessing relevance of a retrieved document to a user question. \n
It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."""
grade_prompt = ChatPromptTemplate.from_messages(
[
("system", system),
("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
]
)
retrieval_grader = grade_prompt | structured_llm_grader
question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))
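Because the chain ends in a structured output model, the result is a GradeDocuments instance rather than raw text, so the score is directly accessible as an attribute (a minimal usage sketch):
# Read the structured score field directly.
result = retrieval_grader.invoke({"question": question, "document": doc_txt})
print(result.binary_score)  # 'yes' or 'no'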
Generate¶
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
# Prompt
prompt = hub.pull("rlm/rag-prompt")
# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Post-processing
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
# Chain
rag_chain = prompt | llm | StrOutputParser()
# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)
The design of generative agents combines LLM with memory, planning, and reflection mechanisms to enable agents to behave conditioned on past experience. Memory stream is a long-term memory module that records a comprehensive list of agents' experience in natural language. LLM functions as the agent's brain in an autonomous agent system.
Hallucination Grader¶
# Data model
class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in generation answer."""
binary_score: str = Field(
description="Answer is grounded in the facts, 'yes' or 'no'"
)
# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeHallucinations)
# Prompt
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
hallucination_prompt = ChatPromptTemplate.from_messages(
[
("system", system),
("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
]
)
hallucination_grader = hallucination_prompt | structured_llm_grader
hallucination_grader.invoke({"documents": docs, "generation": generation})
Answer Grader¶
# Data model
class GradeAnswer(BaseModel):
    """Binary score to assess whether the answer addresses the question."""
binary_score: str = Field(
description="Answer addresses the question, 'yes' or 'no'"
)
# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeAnswer)
# Prompt
system = """You are a grader assessing whether an answer addresses / resolves a question. \n
Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""
answer_prompt = ChatPromptTemplate.from_messages(
[
("system", system),
("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
]
)
answer_grader = answer_prompt | structured_llm_grader
answer_grader.invoke({"question": question, "generation": generation})
Question Re-writer¶
# LLM
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
# Prompt
system = """You are a question re-writer that converts an input question to a better version that is optimized \n
for vectorstore retrieval. Look at the input and try to reason about the underlying semantic intent / meaning."""
re_write_prompt = ChatPromptTemplate.from_messages(
[
("system", system),
(
"human",
"Here is the initial question: \n\n {question} \n Formulate an improved question.",
),
]
)
question_rewriter = re_write_prompt | llm | StrOutputParser()
question_rewriter.invoke({"question": question})
Graph¶
Capture the flow as a graph.
Graph State¶
from typing import List
from typing_extensions import TypedDict
class GraphState(TypedDict):
"""
表示我们图形的状态。
属性:
question: 问题
generation: LLM 生成
documents: 文档列表
"""
question: str
generation: str
documents: List[str]
Nodes¶
def retrieve(state):
"""
获取文档
参数:
state (dict): 当前图形状态
返回:
state (dict): 新增键值,documents,包含已检索的文档
"""
print("---RETRIEVE---")
question = state["question"]
    # Retrieval
documents = retriever.invoke(question)
return {"documents": documents, "question": question}
def generate(state):
"""
生成答案
参数:
state (dict):当前图形状态
返回:
state (dict):在状态中添加的新键,generation,包含LLM生成的内容
"""
print("---GENERATE---")
question = state["question"]
documents = state["documents"]
    # RAG generation
generation = rag_chain.invoke({"context": documents, "question": question})
return {"documents": documents, "question": question, "generation": generation}
def grade_documents(state):
"""
Determines whether the retrieved documents are relevant to the question.
Args:
state (dict): The current graph state
Returns:
state (dict): Updates documents key with only filtered relevant documents
"""
print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
question = state["question"]
documents = state["documents"]
    # Score each doc
filtered_docs = []
for d in documents:
score = retrieval_grader.invoke(
{"question": question, "document": d.page_content}
)
grade = score.binary_score
if grade == "yes":
print("---GRADE: DOCUMENT RELEVANT---")
filtered_docs.append(d)
else:
print("---GRADE: DOCUMENT NOT RELEVANT---")
continue
return {"documents": filtered_docs, "question": question}
def transform_query(state):
"""
将查询转换为产生更好问题的形式。
参数:
state (dict):当前图形状态
返回:
state (dict):更新问题键,使用重新表述的问题
"""
print("---TRANSFORM QUERY---")
question = state["question"]
documents = state["documents"]
    # Re-write the question
better_question = question_rewriter.invoke({"question": question})
return {"documents": documents, "question": better_question}
Edges¶
def decide_to_generate(state):
"""
确定是生成答案还是重新生成问题。
参数:
state (dict):当前图形状态
返回:
str:下一个节点调用的二进制决策
"""
print("---ASSESS GRADED DOCUMENTS---")
state["question"]
filtered_documents = state["documents"]
if not filtered_documents:
        # All documents have been filtered for relevance
        # We will re-generate a new query
print(
"---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---"
)
return "transform_query"
else:
# 我们有相关文件,因此请生成答案。
print("---DECISION: GENERATE---")
return "generate"
def grade_generation_v_documents_and_question(state):
"""
确定生成是否基于文档,并回答问题。
参数:
state(字典):当前图形状态
返回:
str:下一个调用节点的决策
"""
print("---CHECK HALLUCINATIONS---")
question = state["question"]
documents = state["documents"]
generation = state["generation"]
score = hallucination_grader.invoke(
{"documents": documents, "generation": generation}
)
grade = score.binary_score
    # Check hallucination
if grade == "yes":
print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
print("---GRADE GENERATION vs QUESTION---")
score = answer_grader.invoke({"question": question, "generation": generation})
grade = score.binary_score
if grade == "yes":
print("---DECISION: GENERATION ADDRESSES QUESTION---")
return "useful"
else:
print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
return "not useful"
else:
pprint("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
return "not supported"
Build Graph¶
This simply follows the flow we outlined above.
from langgraph.graph import END, StateGraph, START
workflow = StateGraph(GraphState)
# Define the nodes
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform the query
# Build graph
workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
"grade_documents",
decide_to_generate,
{
"transform_query": "transform_query",
"generate": "generate",
},
)
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
"generate",
grade_generation_v_documents_and_question,
{
"not supported": "generate",
"useful": END,
"not useful": "transform_query",
},
)
# Compile
app = workflow.compile()
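Optionally, you can render the compiled graph to double-check the wiring (this assumes a langgraph version that exposes get_graph(), as recent releases do):
# Optional: print a Mermaid diagram of the compiled graph.
print(app.get_graph().draw_mermaid())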
from pprint import pprint
# Run
inputs = {"question": "Explain how the different types of agent memory work?"}
for output in app.stream(inputs):
for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print full state at each node
        # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('Short-term memory is used for in-context learning in agents, allowing them '
'to learn quickly. Long-term memory enables agents to retain and recall vast '
'amounts of information over extended periods. Agents can also utilize '
'external tools like APIs to access additional information beyond what is '
'stored in their memory.')
inputs = {"question": "Explain how chain of thought prompting works?"}
for output in app.stream(inputs):
for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print full state at each node
        # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('Chain of thought prompting works by repeatedly prompting the model to ask '
'follow-up questions to construct the thought process iteratively. This '
'method can be combined with queries to search for relevant entities and '
'content to add back into the context. It extends the thought process by '
'exploring multiple reasoning possibilities at each step, creating a tree '
'structure of thoughts.')