An Introduction to LlamaIndex Query Pipelines¶

Overview¶

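LlamaIndex provides a declarative query API that lets you chain together different modules to orchestrate workflows over your data, from simple to advanced.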

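This is centered around our QueryPipeline abstraction. Load in a variety of modules (from LLMs to prompts to retrievers to other pipelines), connect them all together into a sequential chain or a directed acyclic graph (DAG), and run it end to end.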

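NOTE: You can orchestrate all of these workflows without the declarative pipeline abstraction, by using the modules imperatively and writing your own functions. So what are the advantages of QueryPipeline?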

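Express common workflows with fewer lines of code/boilerplate
Greater readability
Greater parity/better integration points with common low-code/no-code solutions (e.g. LangFlow)
[In the future] A declarative interface allows easy serialization of pipeline components, making pipelines portable and easier to deploy to different systems.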
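To make the imperative-versus-declarative contrast concrete, here is a minimal, library-free sketch. The two toy steps (`make_prompt`, `fake_llm`) and the `run_chain` helper are hypothetical stand-ins, not LlamaIndex APIs: the imperative version wires the calls by hand, while the declarative version lists the steps as data and lets one generic runner execute them.

```python
from typing import Any, Callable, List


def make_prompt(movie_name: str) -> str:
    # Hypothetical prompt step: fill a template string.
    return f"Please generate related movies to {movie_name}"


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; it just upper-cases the prompt.
    return prompt.upper()


# Imperative style: call each step by hand and thread the values through.
def imperative(movie_name: str) -> str:
    prompt = make_prompt(movie_name)
    return fake_llm(prompt)


# Declarative style: the workflow is a list of steps run by one generic loop.
def run_chain(steps: List[Callable[[Any], Any]], value: Any) -> Any:
    for step in steps:
        value = step(value)
    return value


declarative_steps = [make_prompt, fake_llm]

print(imperative("The Departed"))
print(run_chain(declarative_steps, "The Departed"))
```

Both calls produce the same string. The difference is that the declarative list is plain data, which is what makes it inspectable and, in principle, serializable.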

Cookbook¶

In this cookbook we introduce our QueryPipeline interface and show some basic workflows you can tackle with it.

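Chain together a prompt and an LLM
Chain together query rewriting (prompt + LLM) with retrieval
Chain together a full RAG query pipeline (query rewriting, retrieval, reranking, response synthesis)
Set up a custom query component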

Setup¶

Here we set up some data and indexes (built from Paul Graham's essay) that we'll use in the rest of the cookbook.

In [ ]:

%pip install llama-index-embeddings-openai
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-llms-openai

In [ ]:

# set up Arize Phoenix for logging/observability
import phoenix as px

px.launch_app()
import llama_index.core

llama_index.core.set_global_handler("arize_phoenix")

🌍 To view the Phoenix app in your browser, visit http://127.0.0.1:6006/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix

In [ ]:

import os

os.environ["OPENAI_API_KEY"] = "sk-..."

In [ ]:

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

In [ ]:

from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader("../data/paul_graham")

In [ ]:

docs = reader.load_data()

In [ ]:

import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

if not os.path.exists("storage"):
    index = VectorStoreIndex.from_documents(docs)
    # save the index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage")
else:
    # rebuild the storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    # load the index
    index = load_index_from_storage(storage_context, index_id="vector_index")
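The cell above follows a build-once, reload-afterwards pattern: if the `storage` directory is missing it builds and persists the index, otherwise it loads the saved copy. Below is a library-free sketch of the same control flow. It assumes a JSON file as the persisted artifact, and `build_expensive_index` is a hypothetical stand-in for `VectorStoreIndex.from_documents`.

```python
import json
import os
import tempfile
from typing import Dict, List


def build_expensive_index(docs: List[str]) -> Dict[str, int]:
    # Hypothetical stand-in for an expensive build step (e.g. embedding documents).
    return {doc: len(doc.split()) for doc in docs}


def get_or_build_index(docs: List[str], persist_dir: str) -> Dict[str, int]:
    path = os.path.join(persist_dir, "index.json")
    if not os.path.exists(path):
        # First run: build the index and save it to disk.
        index = build_expensive_index(docs)
        os.makedirs(persist_dir, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(index, f)
    else:
        # Later runs: skip the build and reload the saved index.
        with open(path, encoding="utf-8") as f:
            index = json.load(f)
    return index


with tempfile.TemporaryDirectory() as tmp:
    storage = os.path.join(tmp, "storage")
    first = get_or_build_index(["hello world", "paul graham essay"], storage)
    second = get_or_build_index([], storage)  # docs ignored: loaded from disk
    print(first == second)  # True
```

One design note: this sketch checks for the saved file rather than the directory, so an empty leftover directory does not cause a failed load.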

1. Chain Together a Prompt and an LLM¶

In this section we show a very simple workflow that chains a prompt together with an LLM.

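We simply define `chain` at initialization. This is a special case of a query pipeline in which the components are purely sequential, and outputs are automatically converted into the right format for the next module's input.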
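Here is a rough model of what automatic conversion can mean for a purely sequential chain: each step that has exactly one required input receives the previous step's output under that input's name. This is a simplified sketch using `inspect.signature`. `run_sequential` and the two toy modules are hypothetical, and LlamaIndex's real conversion logic (for example, turning a formatted prompt into chat messages) is more involved.

```python
import inspect
from typing import Any, Callable, List


def prompt_module(movie_name: str) -> str:
    # Toy prompt template with a single input key, "movie_name".
    return f"Please generate related movies to {movie_name}"


def llm_module(messages: str) -> str:
    # Toy LLM with a single input key, "messages".
    return f"assistant: (reply to: {messages})"


def run_sequential(chain: List[Callable[..., Any]], **kwargs: Any) -> Any:
    # The first module receives the caller's keyword arguments directly.
    output = chain[0](**kwargs)
    for module in chain[1:]:
        params = list(inspect.signature(module).parameters)
        if len(params) != 1:
            raise ValueError(f"{module.__name__} needs exactly one input in a chain")
        # Bind the previous output to this module's single input key.
        output = module(**{params[0]: output})
    return output


print(run_sequential([prompt_module, llm_module], movie_name="The Departed"))
```

The single-input rule is why a plain chain stops being enough once a module needs two inputs, which is the situation handled later with an explicit DAG.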

In [ ]:

from llama_index.core.query_pipeline import QueryPipeline
from llama_index.core import PromptTemplate

# try chaining basic prompts
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")

p = QueryPipeline(chain=[prompt_tmpl, llm], verbose=True)

In [ ]:

output = p.run(movie_name="The Departed")

> Running module 8dc57d24-9691-4d8d-87d7-151865a7cd1b with input: 
movie_name: The Departed

> Running module 7ed9e26c-a704-4b0b-9cfd-991266e754c0 with input: 
messages: Please generate related movies to The Departed

In [ ]:

print(str(output))

assistant: 1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed
2. The Town (2010) - A crime thriller directed by and starring Ben Affleck
3. Mystic River (2003) - A crime drama directed by Clint Eastwood
4. Goodfellas (1990) - A classic mobster film directed by Martin Scorsese
5. The Irishman (2019) - Another crime drama directed by Martin Scorsese, starring Robert De Niro and Al Pacino
6. The Departed (2006) - The Departed is a 2006 American crime film directed by Martin Scorsese and written by William Monahan. It is a remake of the 2002 Hong Kong film Infernal Affairs. The film stars Leonardo DiCaprio, Matt Damon, Jack Nicholson, and Mark Wahlberg, with Martin Sheen, Ray Winstone, Vera Farmiga, and Alec Baldwin in supporting roles.

View Intermediate Inputs/Outputs¶

For debugging and other purposes, we can also view the inputs and outputs of each step.
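Conceptually, `run_with_intermediates` returns the final output together with a per-module record of inputs and outputs, keyed by module id. The sketch below reproduces that idea for a sequential chain without LlamaIndex. `Intermediates` and `run_with_record` are hypothetical names loosely modeled on the `ComponentIntermediates` objects printed below, and the string keys stand in for the generated module ids.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Intermediates:
    # Record of what one module received and returned.
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


def run_with_record(
    chain: List[Tuple[str, str, Callable[[Any], Any]]], value: Any
) -> Tuple[Any, Dict[str, Intermediates]]:
    """Run (module_id, input_key, fn) steps in order and record each step's I/O."""
    record: Dict[str, Intermediates] = {}
    for module_id, input_key, fn in chain:
        result = fn(value)
        record[module_id] = Intermediates({input_key: value}, {"output": result})
        value = result
    return value, record


toy_chain = [
    ("prompt", "movie_name", lambda m: f"Please generate related movies to {m}"),
    ("llm", "messages", lambda msg: msg.upper()),
]
toy_output, toy_intermediates = run_with_record(toy_chain, "The Departed")
print(toy_intermediates["prompt"])
```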

In [ ]:

output, intermediates = p.run_with_intermediates(movie_name="The Departed")

> Running module 8dc57d24-9691-4d8d-87d7-151865a7cd1b with input: 
movie_name: The Departed

> Running module 7ed9e26c-a704-4b0b-9cfd-991266e754c0 with input: 
messages: Please generate related movies to The Departed

In [ ]:

intermediates["8dc57d24-9691-4d8d-87d7-151865a7cd1b"]

Out[ ]:

ComponentIntermediates(inputs={'movie_name': 'The Departed'}, outputs={'prompt': 'Please generate related movies to The Departed'})

In [ ]:

intermediates["7ed9e26c-a704-4b0b-9cfd-991266e754c0"]

Out[ ]:

ComponentIntermediates(inputs={'messages': 'Please generate related movies to The Departed'}, outputs={'output': ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed\n2. The Town (2010) - A crime thriller directed by Ben Affleck\n3. Mystic River (2003) - A crime drama directed by Clint Eastwood\n4. Goodfellas (1990) - A classic crime film directed by Martin Scorsese\n5. The Irishman (2019) - Another crime film directed by Martin Scorsese, starring Robert De Niro and Al Pacino\n6. The Godfather (1972) - A classic crime film directed by Francis Ford Coppola\n7. Heat (1995) - A crime thriller directed by Michael Mann, starring Al Pacino and Robert De Niro\n8. The Departed (2006) - A crime thriller directed by Martin Scorsese, starring Leonardo DiCaprio and Matt Damon.', additional_kwargs={}), raw={'id': 'chatcmpl-9EKf2nZ4latFJvHy0gzOUZbaB8xwY', 'choices': [Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed\n2. The Town (2010) - A crime thriller directed by Ben Affleck\n3. Mystic River (2003) - A crime drama directed by Clint Eastwood\n4. Goodfellas (1990) - A classic crime film directed by Martin Scorsese\n5. The Irishman (2019) - Another crime film directed by Martin Scorsese, starring Robert De Niro and Al Pacino\n6. The Godfather (1972) - A classic crime film directed by Francis Ford Coppola\n7. Heat (1995) - A crime thriller directed by Michael Mann, starring Al Pacino and Robert De Niro\n8. 
The Departed (2006) - A crime thriller directed by Martin Scorsese, starring Leonardo DiCaprio and Matt Damon.', role='assistant', function_call=None, tool_calls=None))], 'created': 1713203040, 'model': 'gpt-3.5-turbo-0125', 'object': 'chat.completion', 'system_fingerprint': 'fp_c2295e73ad', 'usage': CompletionUsage(completion_tokens=184, prompt_tokens=15, total_tokens=199)}, delta=None, logprobs=None, additional_kwargs={})})

Try Output Parsing¶

Let's parse the output into a structured Pydantic object.
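Structured output parsing comes down to two steps. First, describe the expected schema in the prompt. Second, extract and validate the JSON the model returns. The verbose log of this example shows the parser receiving text that begins with `assistant: ` before the JSON, so a tolerant parser has to locate the JSON object first. This stdlib-only sketch uses dataclasses instead of Pydantic. `MovieRecord` and `parse_movies` are hypothetical and do not reflect how `PydanticOutputParser` is implemented internally.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class MovieRecord:
    name: str
    year: int


def parse_movies(text: str) -> List[MovieRecord]:
    # Locate the outermost JSON object, ignoring any prefix such as "assistant: ".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    data = json.loads(text[start : end + 1])
    movies = []
    for item in data["movies"]:
        # Validate types explicitly, since dataclasses do not check them.
        if not isinstance(item["name"], str) or not isinstance(item["year"], int):
            raise ValueError(f"bad movie entry: {item!r}")
        movies.append(MovieRecord(name=item["name"], year=item["year"]))
    return movies


raw = 'assistant: {"movies": [{"name": "Finding Nemo", "year": 2003}, {"name": "Cars", "year": 2006}]}'
print(parse_movies(raw))
```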

In [ ]:

from typing import List

from pydantic import BaseModel, Field
from llama_index.core.output_parsers import PydanticOutputParser


class Movie(BaseModel):
    """Object representing a single movie."""

    name: str = Field(..., description="Name of the movie.")
    year: int = Field(..., description="Year of the movie.")


class Movies(BaseModel):
    """Object representing a list of movies."""

    movies: List[Movie] = Field(..., description="List of movies.")


llm = OpenAI(model="gpt-3.5-turbo")
output_parser = PydanticOutputParser(Movies)
json_prompt_str = """\
Please generate related movies to {movie_name}. Output with the following JSON format: 
"""
json_prompt_str = output_parser.format(json_prompt_str)

In [ ]:

# add the JSON spec to the prompt template
json_prompt_tmpl = PromptTemplate(json_prompt_str)

p = QueryPipeline(chain=[json_prompt_tmpl, llm, output_parser], verbose=True)
output = p.run(movie_name="Toy Story")

> Running module 2e4093c5-ae62-420a-be91-9c28c057bada with input: 
movie_name: Toy Story

> Running module 3b41f95c-f54b-41d7-8ef0-8e45b5d7eeb0 with input: 
messages: Please generate related movies to Toy Story. Output with the following JSON format: 



Here's a JSON schema to follow:
{"title": "Movies", "description": "Object representing a list of movies.", "typ...

> Running module 27e79a16-72de-4ce2-8b2e-94932c4069c3 with input: 
input: assistant: {
  "movies": [
    {
      "name": "Finding Nemo",
      "year": 2003
    },
    {
      "name": "Monsters, Inc.",
      "year": 2001
    },
    {
      "name": "Cars",
      "year": 2006
...

In [ ]:

output

Out[ ]:

Movies(movies=[Movie(name='Finding Nemo', year=2003), Movie(name='Monsters, Inc.', year=2001), Movie(name='Cars', year=2006), Movie(name='The Incredibles', year=2004), Movie(name='Ratatouille', year=2007)])

Streaming Support¶

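Query pipelines support LLM streaming (simply use as_query_component(streaming=True)). Intermediate outputs are converted automatically, and the final output can be a stream. Here are some examples.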
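One way to picture automatic conversion of intermediate outputs: when a streaming LLM feeds another module, its stream is drained into a complete string first, while a streaming LLM at the end of the chain hands its generator straight back to the caller. The sketch below models that with plain generators. `fake_stream`, `drain` and `run_chain_streaming` are hypothetical. Also note that the chunks LlamaIndex yields carry a `.delta` attribute (as in the loop further down) rather than being bare strings.

```python
from collections.abc import Iterator
from typing import Any, Callable, List


def fake_stream(prompt: str) -> Iterator[str]:
    # Hypothetical streaming LLM: yields its reply one word at a time.
    for word in f"reply to: {prompt}".split():
        yield word + " "


def drain(value: Any) -> Any:
    # Join a stream of text chunks into one string; pass other values through.
    if isinstance(value, Iterator):
        return "".join(value)
    return value


def run_chain_streaming(steps: List[Callable[[Any], Any]], value: Any) -> Any:
    for step in steps:
        # Every step receives a fully materialized input...
        value = step(drain(value))
    # ...but the last step's output is returned untouched, so it may be a stream.
    return value


steps = [
    lambda movie: f"movies like {movie}",  # prompt 1
    fake_stream,                            # streaming LLM
    lambda text: f"summarize: {text}",      # prompt 2 receives a plain string
    fake_stream,                            # streaming LLM: final output stays a stream
]
for chunk in run_chain_streaming(steps, "Heat"):
    print(chunk, end="")
```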

1. Chain multiple prompts with streaming

In [ ]:

prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
# let's add some subsequent prompts for fun
prompt_str2 = """\
Here's some text:

{text}

Can you rewrite this with a summary of each movie?
"""
prompt_tmpl2 = PromptTemplate(prompt_str2)
llm = OpenAI(model="gpt-3.5-turbo")
llm_c = llm.as_query_component(streaming=True)

p = QueryPipeline(
    chain=[prompt_tmpl, llm_c, prompt_tmpl2, llm_c], verbose=True
)
# p = QueryPipeline(chain=[prompt_tmpl, llm_c], verbose=True)

In [ ]:

output = p.run(movie_name="The Dark Knight")
for o in output:
    print(o.delta, end="")

> Running module 213af6d4-3450-46af-9087-b80656ae6951 with input: 
movie_name: The Dark Knight

> Running module 3ff7e987-f5f3-4b36-a3e1-be5a4821d9d9 with input: 
messages: Please generate related movies to The Dark Knight

> Running module a2841bd3-c833-4427-9a7e-83b19872b064 with input: 
text: <generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x298d338b0>

> Running module c7e0a454-213a-460e-b029-f2d42fd7d938 with input: 
messages: Here's some text:

1. Batman Begins (2005)
2. The Dark Knight Rises (2012)
3. Batman v Superman: Dawn of Justice (2016)
4. Man of Steel (2013)
5. The Avengers (2012)
6. Iron Man (2008)
7. Captain Amer...

1. Batman Begins (2005): A young Bruce Wayne becomes Batman to fight crime in Gotham City, facing his fears and training under the guidance of Ra's al Ghul.
2. The Dark Knight Rises (2012): Batman returns to protect Gotham City from the ruthless terrorist Bane, who plans to destroy the city and its symbol of hope.
3. Batman v Superman: Dawn of Justice (2016): Batman and Superman clash as their ideologies collide, leading to an epic battle while a new threat emerges that threatens humanity.
4. Man of Steel (2013): The origin story of Superman, as he embraces his powers and faces General Zod, a fellow Kryptonian seeking to destroy Earth.
5. The Avengers (2012): Earth's mightiest heroes, including Iron Man, Captain America, Thor, and Hulk, join forces to stop Loki and his alien army from conquering the world.
6. Iron Man (2008): Billionaire Tony Stark builds a high-tech suit to escape captivity and becomes the superhero Iron Man, using his technology to fight against evil.
7. Captain America: The Winter Soldier (2014): Captain America teams up with Black Widow and Falcon to uncover a conspiracy within S.H.I.E.L.D. while facing a deadly assassin known as the Winter Soldier.
8. The Amazing Spider-Man (2012): Peter Parker, a high school student bitten by a radioactive spider, becomes Spider-Man and battles the Lizard, a monstrous villain threatening New York City.
9. Watchmen (2009): Set in an alternate reality, a group of retired vigilantes investigates the murder of one of their own, uncovering a conspiracy that could have catastrophic consequences.
10. Sin City (2005): A neo-noir anthology film set in the crime-ridden city of Basin City, following various characters as they navigate through corruption, violence, and redemption.
11. V for Vendetta (2005): In a dystopian future, a masked vigilante known as V fights against a totalitarian government, inspiring the people to rise up and reclaim their freedom.
12. Blade Runner 2049 (2017): A young blade runner uncovers a long-buried secret that leads him to seek out former blade runner Rick Deckard, while unraveling the mysteries of a future society.
13. Inception (2010): A skilled thief enters people's dreams to steal information, but is tasked with planting an idea instead, leading to a mind-bending journey through multiple layers of reality.
14. The Matrix (1999): A computer hacker discovers the truth about reality, joining a group of rebels fighting against sentient machines that have enslaved humanity in a simulated world.
15. The Crow (1994): A musician, resurrected by a supernatural crow, seeks vengeance against the gang that murdered him and his fiancée, unleashing a dark and atmospheric tale of revenge.

2. Feed streaming output into an output parser

In [ ]:

p = QueryPipeline(
    chain=[
        json_prompt_tmpl,
        llm.as_query_component(streaming=True),
        output_parser,
    ],
    verbose=True,
)
output = p.run(movie_name="Toy Story")
print(output)

> Running module fe1dbf6a-56e0-44bf-97d7-a2a1fe9d9b8c with input: 
movie_name: Toy Story

> Running module a8eaaf91-df9d-46c4-bbae-06c15cd15123 with input: 
messages: Please generate related movies to Toy Story. Output with the following JSON format: 



Here's a JSON schema to follow:
{"title": "Movies", "description": "Object representing a list of movies.", "typ...

> Running module fcbc0b09-0ef5-43e0-b007-c4508fd6742f with input: 
input: <generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x298d32dc0>

movies=[Movie(name='Finding Nemo', year=2003), Movie(name='Monsters, Inc.', year=2001), Movie(name='The Incredibles', year=2004), Movie(name='Cars', year=2006), Movie(name='Ratatouille', year=2007)]

Chain Together a Query Rewriting Workflow (Prompts + LLM) with Retrieval¶

Here we try a slightly more complex workflow in which we send the input through two prompts before retrieval:

Generate a question about a given topic.
Hallucinate an answer to that question, for better retrieval.

Since each prompt takes only one input, note that the QueryPipeline automatically chains the LLM output into the next prompt and then into the LLM.

You'll see how to define links more explicitly in the next section.
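The second prompt implements HyDE (Hypothetical Document Embeddings). Instead of retrieving with the short question, it retrieves with a hallucinated answer passage, which tends to share more content with the relevant documents. The toy below illustrates the idea using word-count cosine similarity in place of real embeddings. The corpus, the question, and the "hypothetical passage" are all made up for illustration.

```python
import math
from collections import Counter
from typing import Dict


def cosine(a: str, b: str) -> float:
    # Bag-of-words cosine similarity: a crude stand-in for embedding similarity.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: Dict[str, str]) -> str:
    # Return the id of the most similar document.
    return max(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]))


corpus = {
    "relevant": "at cornell he studied philosophy and computer science",
    "distractor": "what did he do in the summer",
}
question = "what did he do in college"
# In the real pipeline this passage would be written by the LLM (the HyDE step).
hypothetical_passage = "he studied philosophy at cornell and later computer science at harvard"

print(retrieve(question, corpus))              # distractor
print(retrieve(hypothetical_passage, corpus))  # relevant
```

The question matches the distractor on surface wording, while the answer-shaped passage lands on the relevant document. Real embeddings capture meaning beyond shared words, but the intuition behind HyDE is similar.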

In [ ]:

# !pip install llama-index-postprocessor-cohere-rerank

In [ ]:

from llama_index.postprocessor.cohere_rerank import CohereRerank

# generate a question about the topic
prompt_str1 = "Please generate a concise question about Paul Graham's life regarding the following topic {topic}"
prompt_tmpl1 = PromptTemplate(prompt_str1)
# use HyDE to hallucinate an answer
prompt_str2 = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{query_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)
prompt_tmpl2 = PromptTemplate(prompt_str2)

llm = OpenAI(model="gpt-3.5-turbo")
retriever = index.as_retriever(similarity_top_k=5)
p = QueryPipeline(
    chain=[prompt_tmpl1, llm, prompt_tmpl2, llm, retriever], verbose=True
)

In [ ]:

nodes = p.run(topic="college")
len(nodes)

> Running module f5435516-61b6-49e9-9926-220cfb6443bd with input: 
topic: college

> Running module 1dcaa097-cedc-4466-81bb-f8fd8768762b with input: 
messages: Please generate a concise question about Paul Graham's life regarding the following topic college

> Running module 891afa10-5fe0-47ed-bdee-42a59d0e916d with input: 
query_str: assistant: How did Paul Graham's college experience shape his career and entrepreneurial mindset?

> Running module 5bcd9964-b972-41a9-960d-96894c57a372 with input: 
messages: Please write a passage to answer the question
Try to include as many key details as possible.


How did Paul Graham's college experience shape his career and entrepreneurial mindset?


Passage:"""


> Running module 0b81a91a-2c90-4700-8ba1-25ffad5311fd with input: 
input: assistant: Paul Graham's college experience played a pivotal role in shaping his career and entrepreneurial mindset. As a student at Cornell University, Graham immersed himself in the world of compute...

Out[ ]:

Create a Full RAG Pipeline as a DAG¶

Here we chain together a full RAG pipeline consisting of query rewriting, retrieval, reranking, and response synthesis.

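We can't use the chain syntax here because some modules depend on multiple inputs (for example, response synthesis expects both the retrieved nodes and the original question). Instead, we construct a DAG explicitly with add_modules and add_link.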
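The reranker and the summarizer each depend on two upstream modules, so no linear chain can express this graph if every step feeds only the next one. A DAG still has a valid execution order, and Python's standard-library `graphlib` (Python 3.9+) can compute it. The dependency map below mirrors the `add_link` calls used later in this section. This sketch only orders the modules and does not run them.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules it depends on (its upstream links).
dependencies = {
    "llm": {"prompt_tmpl"},
    "retriever": {"llm"},
    "reranker": {"retriever", "llm"},
    "summarizer": {"reranker", "llm"},
}

# A topological order runs every module only after all of its inputs exist.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```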

1. RAG Pipeline with Query Rewriting¶

We first use an LLM to rewrite the query, then pass it to our downstream modules: retrieval, reranking, and synthesis.

In [ ]:

from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.response_synthesizers import TreeSummarize

# define modules
prompt_str = "Please generate a question about Paul Graham's life regarding the following topic {topic}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")
retriever = index.as_retriever(similarity_top_k=3)
# CohereRerank needs a Cohere API key (assumed here to be set in the COHERE_API_KEY environment variable)
reranker = CohereRerank()
summarizer = TreeSummarize(llm=llm)

In [ ]:

# define the query pipeline
p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "llm": llm,
        "prompt_tmpl": prompt_tmpl,
        "retriever": retriever,
        "summarizer": summarizer,
        "reranker": reranker,
    }
)

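Next we draw links between modules with add_link. add_link takes the source and destination module ids, and optionally a source_key and a dest_key. Specify the source_key when the source module has multiple outputs, and the dest_key when the destination module has multiple inputs.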

You can view the set of input/output keys for each module through module.as_query_component().input_keys and module.as_query_component().output_keys.

Here we explicitly specify dest_key for the reranker and summarizer modules because each takes two inputs (query_str and nodes).
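Below is a minimal, library-free model of keyed links. Each link states which upstream module feeds which named input (`dest_key`) of a downstream module, and a module runs once all of its inputs are available. `ToyPipeline` and its methods are hypothetical and only loosely imitate `add_modules`/`add_link`. They omit `source_key` handling, validation, and everything else the real `QueryPipeline` does. The toy modules return strings instead of calling an LLM, retriever, or reranker.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


class ToyPipeline:
    """Hypothetical mini-pipeline: named modules plus keyed links between them."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[..., Any]] = {}
        self.links: List[Tuple[str, str, str]] = []  # (source, dest, dest_key)

    def add_modules(self, modules: Dict[str, Callable[..., Any]]) -> None:
        self.modules.update(modules)

    def add_link(self, src: str, dest: str, dest_key: Optional[str] = None) -> None:
        # Without a dest_key, assume the destination has a single input named "input".
        self.links.append((src, dest, dest_key or "input"))

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        deps: Dict[str, Set[str]] = {name: set() for name in self.modules}
        for src, dest, _ in self.links:
            deps[dest].add(src)
        outputs: Dict[str, Any] = {}
        for name in TopologicalSorter(deps).static_order():
            inputs = {key: outputs[src] for src, dest, key in self.links if dest == name}
            # Modules with no incoming links are entry points and get the run() kwargs.
            outputs[name] = self.modules[name](**(inputs or kwargs))
        return outputs


toy = ToyPipeline()
toy.add_modules(
    {
        "prompt_tmpl": lambda topic: f"question about {topic}",
        "llm": lambda input: f"rewritten: {input}",
        "retriever": lambda input: ["note", f"doc on {input}"],
        # Toy reranking: nodes that contain the query text come first.
        "reranker": lambda query_str, nodes: sorted(nodes, key=lambda n: query_str not in n),
        "summarizer": lambda query_str, nodes: f"answer to '{query_str}' using '{nodes[0]}'",
    }
)
toy.add_link("prompt_tmpl", "llm")
toy.add_link("llm", "retriever")
toy.add_link("retriever", "reranker", dest_key="nodes")
toy.add_link("llm", "reranker", dest_key="query_str")
toy.add_link("reranker", "summarizer", dest_key="nodes")
toy.add_link("llm", "summarizer", dest_key="query_str")

print(toy.run(topic="YC")["summarizer"])
```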

In [ ]:

p.add_link("prompt_tmpl", "llm")
p.add_link("llm", "retriever")
p.add_link("retriever", "reranker", dest_key="nodes")
p.add_link("llm", "reranker", dest_key="query_str")
p.add_link("reranker", "summarizer", dest_key="nodes")
p.add_link("llm", "summarizer", dest_key="query_str")

# look at the summarizer's input keys
print(summarizer.as_query_component().input_keys)

required_keys={'query_str', 'nodes'} optional_keys=set()

We use networkx to store the graph representation. This gives us an easy way to view the DAG!
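If pyvis or pygraphviz is unavailable, a quick fallback is to emit the graph as Graphviz DOT text and render it with any Graphviz viewer. This stdlib-only sketch builds DOT from an explicit edge list that mirrors the `add_link` calls above. `to_dot` is a hypothetical helper. With a real pipeline you would read the edges from `p.dag` instead, which is not shown here.

```python
from typing import Iterable, Optional, Tuple


def to_dot(edges: Iterable[Tuple[str, str, Optional[str]]], name: str = "rag_dag") -> str:
    # Build a Graphviz DOT digraph; a dest_key, if present, becomes the edge label.
    lines = [f"digraph {name} {{"]
    for src, dest, dest_key in edges:
        label = f' [label="{dest_key}"]' if dest_key else ""
        lines.append(f'    "{src}" -> "{dest}"{label};')
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("prompt_tmpl", "llm", None),
    ("llm", "retriever", None),
    ("retriever", "reranker", "nodes"),
    ("llm", "reranker", "query_str"),
    ("reranker", "summarizer", "nodes"),
    ("llm", "summarizer", "query_str"),
]
print(to_dot(edges))
```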

In [ ]:

## create graph
from pyvis.network import Network

net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(p.dag)
net.show("rag_dag.html")

## another option using `pygraphviz`
# from networkx.drawing.nx_agraph import to_agraph
# from IPython.display import Image
# agraph = to_agraph(p.dag)
# agraph.layout(prog="dot")
# agraph.draw('rag_dag.png')
# display(Image('rag_dag.png'))

rag_dag.html

Out[ ]:

In [ ]:

response = p.run(topic="YC")

> Running module prompt_tmpl with input: 
topic: YC

> Running module llm with input: 
messages: Please generate a question about Paul Graham's life regarding the following topic YC

> Running module retriever with input: 
input: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?

> Running module reranker with input: 
query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?
nodes: [NodeWithScore(node=TextNode(id_='ccd39041-5a64-4bd3-aca7-48f804b5a23f', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file...

> Running module summarizer with input: 
query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?
nodes: [NodeWithScore(node=TextNode(id_='120574dd-a5c9-4985-ab3e-37b1070b500a', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file...

In [ ]:

print(str(response))

Paul Graham played a significant role in the founding and development of Y Combinator (YC). He was one of the co-founders of YC and provided the initial funding for the investment firm. Along with his partners, he implemented the ideas they had been discussing and started their own investment firm. Paul Graham also played a key role in shaping the unique batch model of YC, where a group of startups is funded and provided intensive support for a period of three months. He was actively involved in selecting and helping the founders, and he also wrote essays and worked on YC's internal software.

In [ ]:

# you can do async too
response = await p.arun(topic="YC")
print(str(response))

> Running modules and inputs in parallel: 
Module key: prompt_tmpl. Input: 
topic: YC


> Running modules and inputs in parallel: 
Module key: llm. Input: 
messages: Please generate a question about Paul Graham's life regarding the following topic YC


> Running modules and inputs in parallel: 
Module key: retriever. Input: 
input: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?


> Running modules and inputs in parallel: 
Module key: reranker. Input: 
query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?
nodes: [NodeWithScore(node=TextNode(id_='ccd39041-5a64-4bd3-aca7-48f804b5a23f', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file...


> Running modules and inputs in parallel: 
Module key: summarizer. Input: 
query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?
nodes: [NodeWithScore(node=TextNode(id_='120574dd-a5c9-4985-ab3e-37b1070b500a', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file...


Paul Graham played a significant role in the founding and development of Y Combinator (YC). He was one of the co-founders of YC and provided the initial funding for the investment firm. Along with his partners, he implemented the ideas they had been discussing and decided to start their own investment firm. Paul Graham also played a key role in shaping the unique batch model of YC, where a group of startups is funded and provided intensive support for a period of three months. He was actively involved in selecting and helping the founders and worked on various projects related to YC, including writing essays and developing internal software.
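The async run logs "Running modules and inputs in parallel". In this pipeline every step waits on the one before it, so each round in the log contains a single module. In a graph with independent branches, modules whose inputs are all ready could be awaited concurrently. Below is a minimal asyncio sketch of that round-based scheduling on a hypothetical two-branch graph. `run_in_rounds` and the toy coroutines are assumptions for illustration and do not reproduce LlamaIndex's scheduler.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Set, Tuple


async def run_in_rounds(
    modules: Dict[str, Callable[..., Awaitable[Any]]],
    deps: Dict[str, Set[str]],
    **kwargs: Any,
) -> Tuple[Dict[str, Any], List[List[str]]]:
    """Each round concurrently awaits every module whose dependencies have finished."""
    outputs: Dict[str, Any] = {}
    rounds: List[List[str]] = []
    while len(outputs) < len(modules):
        ready = [m for m in modules if m not in outputs and deps[m].issubset(outputs)]
        if not ready:
            raise ValueError("cycle or missing dependency")
        rounds.append(ready)
        # Inputs are keyed by dependency name; entry points get the run kwargs.
        calls = [modules[m](**({d: outputs[d] for d in deps[m]} or kwargs)) for m in ready]
        results = await asyncio.gather(*calls)
        outputs.update(zip(ready, results))
    return outputs, rounds


async def normalize(question: str) -> str:
    return question.lower()


async def vector_search(query: str) -> str:
    await asyncio.sleep(0.01)  # simulated I/O wait
    return f"vector hits for {query}"


async def keyword_search(query: str) -> str:
    await asyncio.sleep(0.01)  # simulated I/O wait
    return f"keyword hits for {query}"


async def merge(vector: str, keyword: str) -> str:
    return f"{vector} + {keyword}"


toy_modules = {
    "query": normalize,
    "vector": vector_search,
    "keyword": keyword_search,
    "merge": merge,
}
toy_deps = {
    "query": set(),
    "vector": {"query"},
    "keyword": {"query"},
    "merge": {"vector", "keyword"},
}

# In a notebook, use `await run_in_rounds(...)` instead of asyncio.run().
toy_outputs, toy_rounds = asyncio.run(run_in_rounds(toy_modules, toy_deps, question="What is YC"))
print(toy_rounds)  # [['query'], ['vector', 'keyword'], ['merge']]
```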

2. RAG Pipeline without Query Rewriting¶

Here we set up a RAG pipeline without the query rewriting step.

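Here we need a way to link the input query to several downstream modules. We can do this by defining a special InputComponent, which lets us link the input to multiple modules at once. In the example below it feeds both the retriever and the summarizer; a reranker is also instantiated, but it is not added to this pipeline.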
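In effect, an input component is an identity module: it returns whatever the pipeline was called with, so its single output can be linked to several destinations under different keys. The sketch below shows that fan-out in plain Python. `input_component` and the toy retriever/summarizer are hypothetical, and the real `InputComponent` may differ in how it names its keys.

```python
from typing import Any, Callable, Dict, List, Tuple


def input_component(**kwargs: Any) -> Any:
    # Identity: hand the single pipeline argument straight through.
    (value,) = kwargs.values()
    return value


def toy_retriever(input: str) -> List[str]:
    return [f"passage about {input}"]


def toy_summarizer(query_str: str, nodes: List[str]) -> str:
    return f"Q: {query_str} | based on {len(nodes)} node(s)"


# (source, destination, dest_key): the input fans out to two modules.
links: List[Tuple[str, str, str]] = [
    ("input", "retriever", "input"),
    ("input", "summarizer", "query_str"),
    ("retriever", "summarizer", "nodes"),
]
modules: Dict[str, Callable[..., Any]] = {
    "retriever": toy_retriever,
    "summarizer": toy_summarizer,
}

outputs: Dict[str, Any] = {"input": input_component(input="what did the author do in YC")}
for name in ["retriever", "summarizer"]:  # already listed in dependency order
    kwargs = {key: outputs[src] for src, dest, key in links if dest == name}
    outputs[name] = modules[name](**kwargs)
print(outputs["summarizer"])
```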

In [ ]:

from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.query_pipeline import InputComponent

retriever = index.as_retriever(similarity_top_k=5)
summarizer = TreeSummarize(llm=OpenAI(model="gpt-3.5-turbo"))
reranker = CohereRerank()

In [ ]:

p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "input": InputComponent(),
        "retriever": retriever,
        "summarizer": summarizer,
    }
)
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")

In [ ]:

output = p.run(input="what did the author do in YC")

> Running module input with input: 
input: what did the author do in YC

> Running module retriever with input: 
input: what did the author do in YC

> Running module summarizer with input: 
query_str: what did the author do in YC
nodes: [NodeWithScore(node=TextNode(id_='86dea730-ca35-4bcb-9f9b-4c99e8eadd08', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file...

In [ ]:

print(str(output))

The author worked on various projects at YC, including writing essays and working on YC's internal software. They also played a key role in the creation and operation of YC by funding the program with their own money and organizing a batch model where they would fund a group of startups twice a year. They provided support and guidance to the startups during a three-month intensive program and used their building in Cambridge as the headquarters for YC. Additionally, they hosted weekly dinners where experts on startups would give talks.

Defining a Custom Component in a Query Pipeline¶

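You can easily define a custom component: subclass CustomQueryComponent (a QueryComponent subclass), implement the input/output key properties, optional input validation, and the run function, then plug it into a pipeline.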

Let's wrap the related-movie prompt + LLM chain from the first example into a custom component.
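The custom-component contract in the next cell has three parts: declare the required input keys, optionally validate inputs, and implement a run method that returns a dict keyed by the output keys. Here is a framework-free sketch of that contract using `abc`. `ToyComponent`, its `run_component` driver, and `RelatedMovieSketch` are hypothetical. They only mirror the method names used in the cell (`_input_keys`, `_output_keys`, `_validate_component_inputs`, `_run_component`), not LlamaIndex's actual base-class logic.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Set


class ToyComponent(ABC):
    @property
    @abstractmethod
    def _input_keys(self) -> Set[str]: ...

    @property
    @abstractmethod
    def _output_keys(self) -> Set[str]: ...

    def _validate_component_inputs(self, input: Dict[str, Any]) -> Dict[str, Any]:
        # Optional hook; by default inputs pass through unchanged.
        return input

    @abstractmethod
    def _run_component(self, **kwargs: Any) -> Dict[str, Any]: ...

    def run_component(self, **kwargs: Any) -> Dict[str, Any]:
        # Driver: check required keys, validate, run, then check the output keys.
        missing = self._input_keys - set(kwargs)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        result = self._run_component(**self._validate_component_inputs(kwargs))
        if set(result) != self._output_keys:
            raise ValueError(f"unexpected outputs: {sorted(result)}")
        return result


class RelatedMovieSketch(ToyComponent):
    @property
    def _input_keys(self) -> Set[str]:
        return {"movie"}

    @property
    def _output_keys(self) -> Set[str]:
        return {"output"}

    def _validate_component_inputs(self, input: Dict[str, Any]) -> Dict[str, Any]:
        if not isinstance(input["movie"], str):
            raise TypeError("movie must be a string")
        return input

    def _run_component(self, **kwargs: Any) -> Dict[str, Any]:
        # A real component would call an LLM here; this one only builds the prompt.
        return {"output": f"Please generate related movies to {kwargs['movie']}"}


print(RelatedMovieSketch().run_component(movie="Love Actually"))
```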

In [ ]:

from llama_index.core.query_pipeline import (
    CustomQueryComponent,
    InputKeys,
    OutputKeys,
)
from typing import Dict, Any
from llama_index.core.llms.llm import LLM

# import Field via LlamaIndex's pydantic bridge so it matches the pydantic version LlamaIndex uses
from llama_index.core.bridge.pydantic import Field


class RelatedMovieComponent(CustomQueryComponent):
    """Related movie component."""

    llm: LLM = Field(..., description="OpenAI LLM")

    def _validate_component_inputs(
        self, input: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Validate component inputs during run_component."""
        # NOTE: this is OPTIONAL, but we show how to do validation here as an example
        return input

    @property
    def _input_keys(self) -> set:
        """Input keys dict."""
        # NOTE: these are required inputs. If you have optional inputs, override `optional_input_keys_dict`
        return {"movie"}

    @property
    def _output_keys(self) -> set:
        return {"output"}

    def _run_component(self, **kwargs) -> Dict[str, Any]:
        """Run the component."""
        # use QueryPipeline itself here for convenience
        prompt_str = "Please generate related movies to {movie_name}"
        prompt_tmpl = PromptTemplate(prompt_str)
        # use the component's own LLM (self.llm), not a global variable
        p = QueryPipeline(chain=[prompt_tmpl, self.llm])
        return {"output": p.run(movie_name=kwargs["movie"])}

Let's try the custom component out! We'll also add a step that rewrites the output in the voice of Shakespeare.

In [ ]:

llm = OpenAI(model="gpt-3.5-turbo")
component = RelatedMovieComponent(llm=llm)

# let's add some subsequent prompts for fun
prompt_str = """\
Here's some text:

{text}

Can you rewrite this in the voice of Shakespeare?
"""
prompt_tmpl = PromptTemplate(prompt_str)

p = QueryPipeline(chain=[component, prompt_tmpl, llm], verbose=True)

In [ ]:

output = p.run(movie="Love Actually")

> Running module 31ca224a-f226-4956-882b-73878843d869 with input: 
movie: Love Actually

> Running module febb41b5-2528-416a-bde7-6accdb0f9c51 with input: 
text: assistant: 1. "Valentine's Day" (2010)
2. "New Year's Eve" (2011)
3. "The Holiday" (2006)
4. "Crazy, Stupid, Love" (2011)
5. "Notting Hill" (1999)
6. "Four Weddings and a Funeral" (1994)
7. "Bridget J...

> Running module e834ffbe-e97c-4ab0-9726-24f1534745b2 with input: 
messages: Here's some text:

1. "Valentine's Day" (2010)
2. "New Year's Eve" (2011)
3. "The Holiday" (2006)
4. "Crazy, Stupid, Love" (2011)
5. "Notting Hill" (1999)
6. "Four Weddings and a Funeral" (1994)
7. "B...

In [ ]:

print(str(output))

assistant: 1. "Valentine's Day" (2010) - "A day of love, where hearts entwine, 
   And Cupid's arrow finds its mark divine."

2. "New Year's Eve" (2011) - "When old year fades, and new year dawns,
   We gather 'round, to celebrate the morns."

3. "The Holiday" (2006) - "Two souls, adrift in search of cheer,
   Find solace in a holiday so dear."

4. "Crazy, Stupid, Love" (2011) - "A tale of love, both wild and mad,
   Where hearts are lost, then found, and glad."

5. "Notting Hill" (1999) - "In London town, where love may bloom,
   A humble man finds love, and breaks the gloom."

6. "Four Weddings and a Funeral" (1994) - "Four times the vows, and one time mourn,
   Love's journey, with laughter and tears adorned."

7. "Bridget Jones's Diary" (2001) - "A maiden fair, with wit and charm,
   Records her life, and love's alarm."

8. "About Time" (2013) - "A tale of time, where love transcends,
   And moments cherished, never truly ends."

9. "The Best Exotic Marigold Hotel" (2011) - "In India's land, where dreams unfold,
   A hotel blooms, where hearts find gold."

10. "The Notebook" (2004) - "A love that spans both time and space,
    Where words and memories find their place."

11. "Serendipity" (2001) - "By chance or fate, two souls collide,
    In search of love, they cannot hide."

12. "P.S. I Love You" (2007) - "In letters penned, from love's embrace,
    A departed soul, still finds its trace."

13. "500 Days of Summer" (2009) - "A tale of love, both sweet and sour,
    Where seasons change, and hearts devour."

14. "The Fault in Our Stars" (2014) - "Two hearts, aflame, in starlit skies,
    Love's tragedy, where hope never dies."

15. "La La Land" (2016) - "In dreams and songs, two hearts entwine,
    A city's magic, where love's stars align."