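Customizing Prompts for a RAG Engine¶

In this notebook we show various prompting techniques you can try to customize your LlamaIndex RAG pipeline.

Getting and setting prompts for query engines, etc.
Defining template variable mappings (e.g. when you have an existing QA prompt).
Adding few-shot examples and performing query transformations/rewriting.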

%pip install llama-index-llms-openai
%pip install llama-index-readers-file pymupdf

!pip install llama-index

import os
import openai

os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]

Setup¶

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex
from llama_index.core import PromptTemplate
from IPython.display import Markdown, display

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.

Load Data¶

!mkdir data
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

mkdir: data: File exists
--2023-10-28 23:19:38--  https://arxiv.org/pdf/2307.09288.pdf
Resolving arxiv.org (arxiv.org)... 128.84.21.199
Connecting to arxiv.org (arxiv.org)|128.84.21.199|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13661300 (13M) [application/pdf]
Saving to: ‘data/llama2.pdf’

data/llama2.pdf     100%[===================>]  13.03M  1.50MB/s    in 10s     

2023-10-28 23:19:49 (1.31 MB/s) - ‘data/llama2.pdf’ saved [13661300/13661300]

from pathlib import Path
from llama_index.readers.file import PyMuPDFReader

loader = PyMuPDFReader()
documents = loader.load(file_path="./data/llama2.pdf")

Load into Vector Store¶

from llama_index.core import VectorStoreIndex
from llama_index.llms.openai import OpenAI

gpt35_llm = OpenAI(model="gpt-3.5-turbo")
gpt4_llm = OpenAI(model="gpt-4")

index = VectorStoreIndex.from_documents(documents)

Setup Query Engine / Retriever¶

query_str = "What are the potential risks associated with the use of Llama 2 as mentioned in the context?"

query_engine = index.as_query_engine(similarity_top_k=2, llm=gpt35_llm)
# for testing
vector_retriever = index.as_retriever(similarity_top_k=2)

response = query_engine.query(query_str)
print(str(response))

The potential risks associated with the use of Llama 2, as mentioned in the context, include the generation of misinformation and the retrieval of information about topics such as bioterrorism or cybercrime. The models have been tuned to avoid these topics and diminish any capabilities they might have offered for those use cases. However, there is a possibility that the safety tuning of the models may go too far, resulting in an overly cautious approach where the model declines certain requests or responds with too many safety details. Users of Llama 2 and Llama 2-Chat need to be cautious and take extra steps in tuning and deployment to ensure responsible use.

Viewing/Customizing Prompts¶

First, let's take a look at the query engine's prompts and see how to customize them.

View Prompts¶

# define prompt viewing function
def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))

prompts_dict = query_engine.get_prompts()

display_prompt_dict(prompts_dict)

Prompt Key: response_synthesizer:text_qa_template
Text:

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:

Prompt Key: response_synthesizer:refine_template
Text:

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer:
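
Before swapping in a different prompt, it can help to check which variables a template expects. The sketch below is plain Python rather than a LlamaIndex API: it copies the refine template text printed above into a string and lists its placeholders with the standard library's string.Formatter. The helper name template_variables is hypothetical and introduced only for illustration.

```python
from string import Formatter

# Refine template text copied from the output above.
refine_template = (
    "The original query is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better answer "
    "the query. If the context isn't useful, return the original answer.\n"
    "Refined Answer: "
)


def template_variables(template: str) -> list:
    """Return placeholder names in order of appearance (hypothetical helper)."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


print(template_variables(refine_template))
```

Any replacement template has to be fed values for the same roles (query, existing answer, new context), even if it names them differently.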

Customize Prompts¶

What if we want to do something different from the standard question-answering prompts?

Let's try out the RAG prompt from LangchainHub.

# to do this, you need to use the langchain object

from langchain import hub

langchain_prompt = hub.pull("rlm/rag-prompt")

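One catch is that the template variables in the prompt differ from what the synthesizer in the query engine expects:

the prompt uses context and question,
we expect context_str and query_str.

This is not a problem! We add template variable mappings that translate between the two sets of names, and we use LangchainPromptTemplate to wrap the LangChain prompt.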

from llama_index.core.prompts import LangchainPromptTemplate

lc_prompt_tmpl = LangchainPromptTemplate(
    template=langchain_prompt,
    template_var_mappings={"query_str": "question", "context_str": "context"},
)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": lc_prompt_tmpl}
)
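
Conceptually, a variable mapping just renames the keys the synthesizer supplies to the names the external prompt uses before the template is filled. The following is a minimal plain-Python sketch of that idea, not the LlamaIndex implementation; format_with_mappings and the sample strings are hypothetical.

```python
# Plain-Python sketch of template variable mapping (not LlamaIndex internals).
# `format_with_mappings` is a hypothetical helper introduced for illustration.


def format_with_mappings(template: str, var_mappings: dict, **kwargs) -> str:
    """Rename caller-side keys to the template's own names, then format."""
    renamed = {var_mappings.get(key, key): value for key, value in kwargs.items()}
    return template.format(**renamed)


# The external prompt uses {question} and {context}.
external_template = "Question: {question}\nContext: {context}\nAnswer:"

# The caller supplies query_str and context_str, so map them across.
mappings = {"query_str": "question", "context_str": "context"}

prompt = format_with_mappings(
    external_template,
    mappings,
    query_str="What is Llama 2?",
    context_str="Llama 2 is a family of language models.",
)
print(prompt)
```

The mapping direction matches the cell above: keys are the names the engine provides, values are the names the wrapped prompt expects.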

prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)

Prompt Key: response_synthesizer:text_qa_template
Text:

input_variables=['question', 'context'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question', 'context'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))]

Prompt Key: response_synthesizer:refine_template
Text:

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer:

Try It Out¶

Let's re-run our query engine.

response = query_engine.query(query_str)
print(str(response))

The potential risks associated with the use of Llama 2 mentioned in the context include the generation of misinformation, retrieval of information about topics like bioterrorism or cybercrime, an overly cautious approach by the model, and the need for users to be cautious and take extra steps in tuning and deployment. However, efforts have been made to tune the models to avoid these topics and diminish any capabilities they might have offered for those use cases.

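Adding Few-Shot Examples¶

Let's try adding few-shot examples to the prompt, which can be loaded dynamically depending on the query!

We do this by setting the function_mappings argument of the prompt template, which lets us compute functions (e.g. one that returns few-shot examples) at the moment the prompt is formatted.

As an example use case, this lets us coax the model into producing structured output by showing it other examples of structured outputs.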
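
To make the mechanism concrete, here is a minimal plain-Python sketch of what computing a template variable at format time could look like. It is an illustration under stated assumptions, not the LlamaIndex implementation: format_with_functions and pick_examples are hypothetical names, and pick_examples is a toy stand-in for a retriever.

```python
# Plain-Python sketch of "function mappings" (not the LlamaIndex implementation).
# `format_with_functions` and `pick_examples` are hypothetical helpers.


def format_with_functions(template: str, function_mappings: dict, **kwargs) -> str:
    """Compute mapped variables from the other kwargs, then fill the template."""
    computed = {name: fn(**kwargs) for name, fn in function_mappings.items()}
    # Computed values are added to (or override) the caller-supplied ones.
    return template.format(**{**kwargs, **computed})


def pick_examples(**kwargs) -> str:
    """Toy stand-in for a retriever: choose an example based on the query."""
    if "citation" in kwargs["query_str"].lower():
        return 'Query: Who is cited?\nResponse: {"citations": []}'
    return "Query: What is 2 + 2?\nResponse: 4"


template = "Examples:\n{few_shot_examples}\n\nQuery: {query_str}\nAnswer:"
prompt = format_with_functions(
    template,
    {"few_shot_examples": pick_examples},
    query_str="Which citations are mentioned?",
)
print(prompt)
```

Because the function receives the same keyword arguments as the template, the examples it returns can depend on the query being asked.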

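Let's parse a pre-generated question/answer file. For simplicity we skip how the file was generated (in short, we used a GPT-4-powered function-calling RAG pipeline), but the QA pairs look like this:

{"query": "<query>", "response": "<output_json>"}

We embed/index these Q/A pairs and retrieve the top k.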
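
Note that in the format above the response field is itself a JSON document stored as a string, so each line needs two decoding passes. A small sketch with the standard json module follows; the sample record is hypothetical, since the real file's contents are not reproduced here.

```python
import json

# Hypothetical sample line; the real file's contents are not shown here.
sample_line = json.dumps(
    {
        "query": "Which citations are mentioned in section X?",
        "response": json.dumps(
            {"citations": [{"author": "Doe et al.", "year": 2020}]}
        ),
    }
)

record = json.loads(sample_line)  # outer object: query + response string
response = json.loads(record["response"])  # inner JSON stored as a string

print(record["query"])
print(response["citations"][0]["author"])
```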

from llama_index.core.schema import TextNode

few_shot_nodes = []
for line in open("../llama2_qa_citation_events.jsonl", "r"):
    few_shot_nodes.append(TextNode(text=line))

few_shot_index = VectorStoreIndex(few_shot_nodes)
few_shot_retriever = few_shot_index.as_retriever(similarity_top_k=2)
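
For intuition about what "retrieve the top k" means, the sketch below ranks a few strings by cosine similarity and keeps the best two. It is a toy under loud assumptions: bag-of-words counts stand in for real embeddings, the example strings are made up, and none of the helper names (bow, cosine, top_k) come from LlamaIndex.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts (not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query, best first."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]


examples = [
    "which citations are mentioned in the section on rlhf results",
    "what is the license of the model",
    "which citation discusses safety rlhf",
]
query = "which citations are mentioned in the section on safety rlhf"
print(top_k(query, examples))
```

A vector index does the same ranking with learned embeddings, which capture meaning rather than exact word overlap.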

import json


def few_shot_examples_fn(**kwargs):
    query_str = kwargs["query_str"]
    retrieved_nodes = few_shot_retriever.retrieve(query_str)
    # go through each node, get json object

    result_strs = []
    for n in retrieved_nodes:
        raw_dict = json.loads(n.get_content())
        query = raw_dict["query"]
        response_dict = json.loads(raw_dict["response"])
        result_str = f"""\
Query: {query}
Response: {response_dict}"""
        result_strs.append(result_str)
    return "\n\n".join(result_strs)

# write prompt template with functions
qa_prompt_tmpl_str = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query asking about citations over different topics.
Please provide your answer in the form of a structured JSON format containing a list of authors as the citations. Some examples are given below.

{few_shot_examples}

Query: {query_str}
Answer: \
"""

qa_prompt_tmpl = PromptTemplate(
    qa_prompt_tmpl_str,
    function_mappings={"few_shot_examples": few_shot_examples_fn},
)

citation_query_str = (
    "Which citations are mentioned in the section on Safety RLHF?"
)

Let's see what the formatted prompt looks like with the few-shot examples function. (We fill in a test context for brevity.)

print(
    qa_prompt_tmpl.format(
        query_str=citation_query_str, context_str="test_context"
    )
)

Context information is below.
---------------------
test_context
---------------------
Given the context information and not prior knowledge, answer the query asking about citations over different topics.
Please provide your answer in the form of a structured JSON format containing a list of authors as the citations. Some examples are given below.

Query: Which citation discusses the impact of safety RLHF measured by reward model score distributions?
Response: {'citations': [{'author': 'Llama 2: Open Foundation and Fine-Tuned Chat Models', 'year': 24, 'desc': 'Impact of safety RLHF measured by reward model score distributions. Left: safety reward model scores of generations on the Meta Safety test set. The clustering of samples in the top left corner suggests the improvements of model safety. Right: helpfulness reward model scores of generations on the Meta Helpfulness test set.'}]}

Query: Which citations are mentioned in the section on RLHF Results?
Response: {'citations': [{'author': 'Gilardi et al.', 'year': 2023, 'desc': ''}, {'author': 'Huang et al.', 'year': 2023, 'desc': ''}]}

Query: Which citations are mentioned in the section on Safety RLHF?
Answer:

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)

display_prompt_dict(query_engine.get_prompts())

Prompt Key: response_synthesizer:text_qa_template
Text:

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query asking about citations over different topics.
Please provide your answer in the form of a structured JSON format containing a list of authors as the citations. Some examples are given below.

{few_shot_examples}

Query: {query_str}
Answer:

Prompt Key: response_synthesizer:refine_template
Text:

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer:

response = query_engine.query(citation_query_str)
print(str(response))

{'citations': [{'author': 'Llama 2: Open Foundation and Fine-Tuned Chat Models', 'year': 24, 'desc': 'Safety RLHF'}, {'author': 'Bai et al.', 'year': 2022a, 'desc': 'RLHF stage'}, {'author': 'Bai et al.', 'year': 2022a, 'desc': 'adversarial prompts'}, {'author': 'Bai et al.', 'year': 2022a, 'desc': 'safety reward model'}, {'author': 'Bai et al.', 'year': 2022a, 'desc': 'helpfulness reward model'}, {'author': 'Bai et al.', 'year': 2022a, 'desc': 'safety tuning with RLHF'}]}
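
The response printed above looks like a Python dict rather than strict JSON: it uses single-quoted strings and a bare 2022a, so a JSON parser would reject it. If downstream code needs real JSON, a defensive parse is one option. The sketch below uses only the standard library; parse_json_or_none and both sample strings are hypothetical.

```python
import json
from typing import Optional


def parse_json_or_none(text: str) -> Optional[dict]:
    """Return the parsed object if `text` is a valid JSON object, else None."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Strict JSON: double quotes, and the year kept as a string.
valid = '{"citations": [{"author": "Bai et al.", "year": "2022a"}]}'
# Dict-like text in the style of the output above: not valid JSON.
invalid = "{'citations': [{'author': 'Bai et al.', 'year': 2022a}]}"

print(parse_json_or_none(valid))
print(parse_json_or_none(invalid))
```

A None result can then trigger a retry or a fallback instead of an unhandled exception.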

print(response.source_nodes[1].get_content())

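Context Transformations - PII Example¶

We can also add context transformations dynamically, as functions attached to prompt variables. In this example we show how to process context_str before it is fed into the context window, specifically by masking out PII (a step towards mitigating data privacy/security concerns).

NOTE: You can also do these steps before feeding context into the prompt, but this approach gives you the flexibility to do all of this dynamically for any QA prompt you define!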
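
As a rough illustration of a context transformation, the sketch below masks e-mail addresses and phone numbers with regular expressions before filling a template. This is a simplified stand-in rather than the NER-based approach this notebook uses: the patterns, the mask_pii helper, and the sample text are all hypothetical, and regexes like these will miss many kinds of PII.

```python
import re

# Hypothetical, deliberately simple patterns; real PII detection needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(**kwargs) -> str:
    """Return context_str with e-mail addresses and phone numbers masked."""
    text = kwargs["context_str"]
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


template = "Context:\n{context_str}\nQuery: {query_str}\nAnswer:"
raw_kwargs = {
    "query_str": "How do I reach support?",
    "context_str": "Email jane.doe@example.com or call 555-123-4567.",
}
# Transform the context first, then format the prompt with the masked version.
prompt = template.format(**{**raw_kwargs, "context_str": mask_pii(**raw_kwargs)})
print(prompt)
```

The function takes the same keyword arguments as the template, so it has the shape of a function mapping keyed on context_str.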

from llama_index.core.postprocessor import (
    NERPIINodePostprocessor,
    SentenceEmbeddingOptimizer,
)
from llama_index.core import QueryBundle
from llama_index.core.schema import NodeWithScore, TextNode

pii_processor = NERPIINodePostprocessor(llm=gpt4_llm)

def filter_pii_fn(**kwargs):
    # run optimizer
    query_bundle = QueryBundle(query_str=kwargs["query_str"])

    new_nodes = pii_processor.postprocess_nodes(
        [NodeWithScore(node=TextNode(text=kwargs["context_str"]))],
        query_bundle=query_bundle,
    )
    new_node = new_nodes[0]
    return new_node.get_content()

qa_prompt_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt_tmpl = PromptTemplate(
    qa_prompt_tmpl_str, function_mappings={"context_str": filter_pii_fn}
)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)

# take a look at the prompt
retrieved_nodes = vector_retriever.retrieve(query_str)
context_str = "\n\n".join([n.get_content() for n in retrieved_nodes])

print(qa_prompt_tmpl.format(query_str=query_str, context_str=context_str))

response = query_engine.query(query_str)
print(str(response))