Vectara Managed Index
In this notebook we show how to use Vectara with LlamaIndex.
Vectara provides an end-to-end managed service for Retrieval Augmented Generation (RAG), which includes:
- A way to extract text from document files and chunk them into sentences.
- The state-of-the-art Boomerang embeddings model. Each text chunk is encoded into a vector embedding using Boomerang and stored in the Vectara internal vector store. Consequently, when using Vectara with LlamaIndex you do not need to call a separate embedding model; this happens automatically within the Vectara backend.
- A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments (including support for hybrid search and MMR).
- An option to create a generative summary based on the retrieved documents, including citations.
See the Vectara API documentation for more information on how to use the API.
Getting Started
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
!pip install llama-index llama-index-indices-managed-vectara
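Before creating a VectaraIndex, the integration needs the credentials for your Vectara account and corpus. A minimal setup sketch, assuming the client reads the commonly used VECTARA_CUSTOMER_ID, VECTARA_CORPUS_KEY, and VECTARA_API_KEY environment variables (the exact variable names depend on your library version, so check the Vectara quickstart and the integration docs):

import os

# Placeholder values; replace with the customer ID, corpus key, and API key
# from your Vectara account (see the Vectara quickstart guide).
os.environ["VECTARA_CUSTOMER_ID"] = "<YOUR_CUSTOMER_ID>"
os.environ["VECTARA_CORPUS_KEY"] = "<YOUR_CORPUS_KEY>"
os.environ["VECTARA_API_KEY"] = "<YOUR_API_KEY>"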
RAG with LlamaIndex and Vectara
There are a few ways to index your data into Vectara, including:
- Using the from_documents() or insert_file() methods of VectaraIndex
- Uploading files directly in the Vectara console
- Using Vectara's FILE_UPLOAD or standard indexing APIs
- Using vectara-ingest, an open source crawler/indexer project
- Using one of our ingest integration partners, such as Airbyte, Unstructured, or DataVolo
For this purpose, we will use a simple set of small documents, so using VectaraIndex directly for the ingestion is good enough.
Let's ingest the "AI Bill of Rights" document into our new corpus.
In [ ]:
from llama_index.indices.managed.vectara import VectaraIndex
import requests

url = "https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf"
response = requests.get(url)
local_path = "ai-bill-of-rights.pdf"
with open(local_path, "wb") as file:
    file.write(response.content)

index = VectaraIndex()
index.insert_file(
    local_path, metadata={"name": "AI bill of rights", "year": 2022}
)
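The ingestion options listed above also mention the from_documents() method. As a quick reference, here is a minimal sketch of that path, assuming your documents are plain files in a local ./data folder loaded with LlamaIndex's SimpleDirectoryReader (the folder name and the reader choice are illustrative, not part of this example):

from llama_index.core import SimpleDirectoryReader
from llama_index.indices.managed.vectara import VectaraIndex

# Load local files into LlamaIndex Document objects, then index them in Vectara.
documents = SimpleDirectoryReader("./data").load_data()
index = VectaraIndex.from_documents(documents)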
Running single queries with the Vectara Query Engine
Now that we've uploaded the document (or if documents have been uploaded previously), we can ask questions directly in LlamaIndex. This activates Vectara's RAG pipeline.
To use Vectara's internal LLM for summarization, make sure you specify summary_enabled=True when you generate the query engine. Here's an example:
In [ ]:
questions = [
    "What are the risks of AI?",
    "What should we do to prevent bad actors from using AI?",
    "What are the benefits?",
]
In [ ]:
qe = index.as_query_engine(summary_enabled=True)
qe.query(questions[0]).response
Out[ ]:
"The risks associated with AI include potential biases leading to discriminatory outcomes, lack of transparency in decision-making processes, and challenges in establishing public trust and understanding of algorithmic systems [1]. Safety and efficacy concerns arise in the context of complex technologies like AI, necessitating strong regulations and proactive risk mitigation strategies [2]. The process of identifying and addressing risks before and during the deployment of automated systems is crucial to prevent harm to individuals' rights, opportunities, and access [5]. Furthermore, the impact of AI risks can be most visible at the community level, emphasizing the importance of considering and mitigating harms to various communities [6]. Efforts are being made to translate principles into practice through laws, policies, and technical approaches to ensure AI systems are lawful, respectful, accurate, safe, understandable, responsible, and accountable [7]."
If you want the response to be returned in streaming mode, simply set streaming=True:
In [ ]:
qe = index.as_query_engine(summary_enabled=True, streaming=True)
response = qe.query(questions[0])

for chunk in response.response_gen:
    print(chunk.delta or "", end="", flush=True)
The risks of AI include biased data leading to discriminatory outcomes, opaque decision-making processes, and lack of public trust and understanding in algorithmic systems [1]. Organizations are implementing innovative solutions like risk assessments, auditing mechanisms, and ongoing monitoring to mitigate safety and efficacy risks of AI systems [2]. Stakeholder engagement and a risk management framework by institutions like NIST aim to address risks to individuals, organizations, and society posed by AI technology [3]. Risk identification, mitigation, and focusing on safety and effectiveness of AI systems are crucial before and during deployment to protect people’s rights, opportunities, and access [5]. The concept of communities is integral in understanding the impact of AI and automated systems, as the potential harm may be most visible at the community level [6]. Practical implementation of principles such as lawful, purposeful, accurate, safe, and accountable AI is essential to address risks, with federal agencies adhering to guidelines promoting trustworthy AI [7].
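The query engine also exposes the retrieval options mentioned in the introduction, such as MMR and the summarization settings. Here is a minimal sketch using the same parameters that appear in the agent example later in this notebook; the specific values are illustrative, and lambda_val as the hybrid-search keyword weight is an assumption, so check the integration docs for your version:

qe = index.as_query_engine(
    summary_enabled=True,
    summary_num_results=5,       # how many passages feed the summary
    summary_response_lang="en",  # language of the generated summary
    reranker="mmr",              # diversify retrieved results with MMR
    rerank_k=50,
    mmr_diversity_bias=0.2,
    lambda_val=0.025,            # assumed: hybrid-search keyword weight
)
print(qe.query(questions[1]).response)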
Using Vectara Chat
Vectara also supports a simple chat mode. In this mode the chat history is maintained by Vectara, so you don't have to worry about it. To use it, simply call as_chat_engine.
(Chat mode always uses Vectara's summarization, so you don't have to explicitly specify summary_enabled=True as before.)
In [ ]:
ce = index.as_chat_engine()
In [ ]:
for q in questions:
    print(f"Question: {q}\n")
    response = ce.chat(q).response
    print(f"Response: {response}\n")
Question: What are the risks of AI?

Response: The risks of AI involve potential biases, opaque decision-making processes, and lack of public trust due to discriminatory outcomes and biased data [1]. To mitigate these risks, industry is implementing innovative solutions like risk assessments and monitoring mechanisms [2]. Stakeholder engagement and the development of a risk management framework by organizations like the National Institute of Standards and Technology aim to manage risks posed by AI to individuals, organizations, and society [3]. Identification and mitigation of potential risks, impact assessments, and balancing high impact risks with appropriate mitigation are crucial before and during the deployment of AI systems [5]. The Blueprint for an AI Bill of Rights emphasizes the protection of individuals from unsafe or ineffective AI systems [7].

Question: What should we do to prevent bad actors from using AI?

Response: To prevent the misuse of AI by malicious entities, several key measures can be implemented. Firstly, it is crucial to ensure that automated systems are designed with safety and effectiveness in mind, following principles such as being lawful, purposeful, accurate, secure, and transparent [2]. Entities should proactively identify and manage risks associated with sensitive data, conducting regular audits and limiting access to prevent misuse [3], [4], [5]. Additionally, ongoing monitoring of automated systems is essential to detect and address algorithmic discrimination and unforeseen interactions that could lead to misuse [6], [7]. By incorporating these practices into the design, development, and deployment of AI technologies, the potential for misuse by malicious entities can be significantly reduced.

Question: What are the benefits?

Response: Artificial Intelligence (AI) offers various advantages, such as promoting the use of trustworthy AI systems with principles focusing on legality, performance, safety, transparency, and accountability [1]. Organizations are incorporating protections and ethical principles in AI development, aligning with global recommendations for responsible AI stewardship [2]. Furthermore, research is ongoing to enhance explainable AI systems for better human understanding and trust in AI outcomes [5]. The U.S. government is establishing councils and frameworks to advance AI technologies, ensuring responsible AI implementation across sectors [4], . AI can streamline processes, improve decision-making, and enhance efficiency, although challenges like bias, flaws, and accessibility issues need to be addressed to maximize its benefits [5].
Of course, streaming also works with chat:
In [ ]:
ce = index.as_chat_engine(streaming=True)
In [ ]:
response = ce.stream_chat("Will robots kill us all?")

for chunk in response.chat_stream:
    print(chunk.delta or "", end="", flush=True)
The search results indicate a focus on the relationship between humans and robots, emphasizing the need for co-intelligence and the best use of automated systems [2]. The discussions revolve around ensuring that automated systems are designed, tested, and protected to prevent potential harmful outcomes [1]. While there are concerns about the use of surveillance technology by companies like Amazon and Walmart, the emphasis is on balancing equities and maintaining oversight in law enforcement activities [5]. The search results do not directly answer whether robots will kill us all, but they highlight the importance of proactive protections, context-specific guidance, and existing policies to govern the use of automated systems in various settings [6].
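Because Vectara keeps the conversation history on its side, follow-up turns on the same chat engine can refer back to earlier questions with no extra bookkeeping in your code. A small illustrative follow-up (the question text here is made up):

# Reuses the same chat engine; the history from the previous turn is kept by Vectara.
response = ce.stream_chat("What protections exist against that?")
for chunk in response.chat_stream:
    print(chunk.delta or "", end="", flush=True)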
Agentic RAG
Let's create a ReAct agent with LlamaIndex that utilizes Vectara as its RAG tool.
For this you would need to use another LLM as the driver of the agent reasoning; here we use OpenAI's GPT-4o as an example.
(For this to work, make sure OPENAI_API_KEY is defined in your environment.)
In [ ]:
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import QueryEngineTool, ToolMetadata

llm = OpenAI(model="gpt-4o", temperature=0)

vectara_tool = QueryEngineTool(
    query_engine=index.as_query_engine(
        summary_enabled=True,
        summary_num_results=5,
        summary_response_lang="en",
        summary_prompt_name="vectara-summary-ext-24-05-large",
        reranker="mmr",
        rerank_k=50,
        mmr_diversity_bias=0.2,
    ),
    metadata=ToolMetadata(
        name="Vectara",
        description="Vectara Query Engine that is able to answer Questions about AI regulation.",
    ),
)

agent = ReActAgent.from_tools(
    tools=[vectara_tool],
    llm=llm,
    context="""
    You are a helpful chatbot that answers any user questions about AI regulation using the Vectara tool.
    You break down complex questions into simpler ones.
    You use the Vectara query engine to help provide answers to the simpler questions.
    """,
    verbose=True,
)
In [ ]:
question = """
What are the risks of AI? What are the benefits?
Compare and contrast and provide a summary with arguments for and against from the experts' point of view.
"""

print(agent.chat(question).response)
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: Vectara
Action Input: {'input': 'What are the risks of AI?'}
Observation: The risks of AI include biased data leading to discriminatory outcomes, opaque decision-making processes, and a lack of public trust and understanding in algorithmic systems. Mitigation strategies discussed involve ongoing transparency, participatory design, and engaging with impacted communities to understand potential harms and integrate protections into the design of AI systems [1]. Additionally, there's a focus on identifying and mitigating risks before deployment, particularly those impacting people's rights, opportunities, or safety, with a strong emphasis on avoiding systems that inherently violate safety norms [5].
Thought: I have obtained information about the risks of AI. Now, I need to gather information about the benefits of AI to provide a comprehensive comparison.
Action: Vectara
Action Input: {'input': 'What are the benefits of AI?'}
Observation: The benefits of AI include its transformative potential to improve people's lives by building better and more innovative infrastructure. It also offers the possibility to enhance community health, safety, and welfare by ensuring better representation of all voices, particularly those traditionally marginalized by technological advances [1]. AI can also prevent harms and improve opportunities, rights, and access for Americans, playing a central role in shaping important policies like the Blueprint for an AI Bill of Rights [2].
Thought: I have gathered information about both the risks and benefits of AI. Now, I need to compare and contrast these points and provide a summary with arguments for and against from experts.
Answer: ### Comparison of Risks and Benefits of AI

#### Risks of AI:
1. **Biased Data and Discriminatory Outcomes**: AI systems can perpetuate and even exacerbate biases present in the data they are trained on, leading to unfair and discriminatory outcomes.
2. **Opaque Decision-Making**: The decision-making processes of AI systems can be complex and not easily understandable, leading to a lack of transparency.
3. **Lack of Public Trust**: The opacity and potential biases in AI systems can result in a lack of trust and understanding from the public.
4. **Safety and Rights Violations**: There is a risk of AI systems violating safety norms and impacting people's rights, opportunities, or safety.

#### Benefits of AI:
1. **Improved Infrastructure**: AI has the potential to transform and improve infrastructure, making it more innovative and efficient.
2. **Enhanced Community Health and Safety**: AI can play a significant role in improving community health, safety, and welfare by ensuring better representation and inclusivity.
3. **Prevention of Harms**: AI can help prevent harms and improve opportunities, rights, and access, particularly for marginalized communities.
4. **Policy Shaping**: AI is central to shaping important policies, such as the Blueprint for an AI Bill of Rights, which aims to protect and enhance the rights of individuals.

### Summary with Arguments For and Against AI

#### Arguments For AI:
- **Innovation and Efficiency**: AI can drive significant advancements in technology and infrastructure, leading to more efficient and innovative solutions.
- **Inclusivity and Representation**: AI can ensure better representation of marginalized voices, leading to more equitable outcomes.
- **Health and Safety**: AI can enhance community health and safety by providing better tools and systems for monitoring and intervention.
- **Policy and Rights**: AI can play a crucial role in shaping policies that protect and enhance individual rights and opportunities.

#### Arguments Against AI:
- **Bias and Discrimination**: The risk of biased data leading to discriminatory outcomes is a significant concern.
- **Transparency and Trust**: The opaque nature of AI decision-making processes can erode public trust and understanding.
- **Safety Risks**: There is a potential for AI systems to violate safety norms and impact people's rights and safety negatively.
- **Complexity of Mitigation**: Mitigating the risks associated with AI requires ongoing transparency, participatory design, and engagement with impacted communities, which can be complex and resource-intensive.

In conclusion, while AI offers numerous benefits, including innovation, improved infrastructure, and enhanced community welfare, it also poses significant risks related to bias, transparency, and safety. Experts argue that a balanced approach, involving robust mitigation strategies and inclusive design, is essential to harness the benefits of AI while minimizing its risks.

### Comparison of Risks and Benefits of AI

#### Risks of AI:
1. **Biased Data and Discriminatory Outcomes**: AI systems can perpetuate and even exacerbate biases present in the data they are trained on, leading to unfair and discriminatory outcomes.
2. **Opaque Decision-Making**: The decision-making processes of AI systems can be complex and not easily understandable, leading to a lack of transparency.
3. **Lack of Public Trust**: The opacity and potential biases in AI systems can result in a lack of trust and understanding from the public.
4. **Safety and Rights Violations**: There is a risk of AI systems violating safety norms and impacting people's rights, opportunities, or safety.

#### Benefits of AI:
1. **Improved Infrastructure**: AI has the potential to transform and improve infrastructure, making it more innovative and efficient.
2. **Enhanced Community Health and Safety**: AI can play a significant role in improving community health, safety, and welfare by ensuring better representation and inclusivity.
3. **Prevention of Harms**: AI can help prevent harms and improve opportunities, rights, and access, particularly for marginalized communities.
4. **Policy Shaping**: AI is central to shaping important policies, such as the Blueprint for an AI Bill of Rights, which aims to protect and enhance the rights of individuals.

### Summary with Arguments For and Against AI

#### Arguments For AI:
- **Innovation and Efficiency**: AI can drive significant advancements in technology and infrastructure, leading to more efficient and innovative solutions.
- **Inclusivity and Representation**: AI can ensure better representation of marginalized voices, leading to more equitable outcomes.
- **Health and Safety**: AI can enhance community health and safety by providing better tools and systems for monitoring and intervention.
- **Policy and Rights**: AI can play a crucial role in shaping policies that protect and enhance individual rights and opportunities.

#### Arguments Against AI:
- **Bias and Discrimination**: The risk of biased data leading to discriminatory outcomes is a significant concern.
- **Transparency and Trust**: The opaque nature of AI decision-making processes can erode public trust and understanding.
- **Safety Risks**: There is a potential for AI systems to violate safety norms and impact people's rights and safety negatively.
- **Complexity of Mitigation**: Mitigating the risks associated with AI requires ongoing transparency, participatory design, and engagement with impacted communities, which can be complex and resource-intensive.

In conclusion, while AI offers numerous benefits, including innovation, improved infrastructure, and enhanced community welfare, it also poses significant risks related to bias, transparency, and safety. Experts argue that a balanced approach, involving robust mitigation strategies and inclusive design, is essential to harness the benefits of AI while minimizing its risks.