跳到主要内容

使用Pinecone对GPT-4进行检索增强

nbviewer

修复产生幻觉的LLMs

在这个笔记本中,我们将学习如何从Pinecone查询与我们的查询相关的上下文,并将这些上下文传递给GPT-4模型,以生成一个由真实数据源支持的答案。

GPT-4相比于之前的OpenAI完成模型有了很大的进步。它还专门使用ChatCompletion端点,因此我们必须以稍微不同的方式使用它。然而,这个模型的强大之处使得这种改变是值得的,特别是当与Pinecone向量数据库等外部知识库相结合时。

本笔记本所需的安装包括:

!pip install -qU bs4 tiktoken openai langchain pinecone-client[grpc]

     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.7 MB ? eta -:--:--     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.7/1.7 MB 71.4 MB/s eta 0:00:01     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 41.5 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.1/70.1 KB 6.5 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 396.0/396.0 KB 28.4 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.2/177.2 KB 12.1 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 KB 4.8 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 4.8 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 KB 8.0 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 43.0 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 77.1 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 158.8/158.8 KB 19.6 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.2/199.2 KB 26.0 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 264.6/264.6 KB 35.1 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 114.2/114.2 KB 15.6 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.1/49.1 KB 7.7 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 218.0/218.0 KB 27.4 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 218.0/218.0 KB 28.7 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 211.7/211.7 KB 12.0 MB/s eta 0:00:00
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-cloud-translate 3.8.4 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.
google-cloud-language 2.6.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.
google-cloud-firestore 2.7.3 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.
google-cloud-datastore 2.11.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.
google-cloud-bigquery 3.4.2 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.
google-cloud-bigquery-storage 2.19.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.
google-api-core 2.11.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.

准备数据

在这个例子中,我们将从langchain.readthedocs.io/下载LangChain文档。我们可以这样获取该网站上所有.html文件:

!wget -r -A.html -P rtdocs https://python.langchain.com/en/latest/

<Response [200]>

这将所有HTML下载到rtdocs目录中。现在我们可以使用LangChain本身来处理这些文档。我们可以这样使用ReadTheDocsLoader

from langchain.document_loaders import ReadTheDocsLoader

loader = ReadTheDocsLoader('rtdocs')
docs = loader.load()
len(docs)

.rst .pdf Welcome to LangChain Contents Getting Started Modules Use Cases Reference Docs LangChain Ecosystem Additional Resources Welcome to LangChain# Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you are able to combine them with other sources of computation or knowledge. This library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include: ❓ Question Answering over specific documents Documentation End-to-end Example: Question Answering over Notion Database 💬 Chatbots Documentation End-to-end Example: Chat-LangChain 🤖 Agents Documentation End-to-end Example: GPT+WolframAlpha Getting Started# Checkout the below guide for a walkthrough of how to get started using LangChain to create an Language Model application. Getting Started Documentation Modules# There are several main modules that LangChain provides support for. For each module we provide some examples to get started, how-to guides, reference docs, and conceptual guides. These modules are, in increasing order of complexity: Prompts: This includes prompt management, prompt optimization, and prompt serialization. LLMs: This includes a generic interface for all LLMs, and common utilities for working with LLMs. Document Loaders: This includes a standard interface for loading documents, as well as specific integrations to all types of text data sources. Utils: Language models are often more powerful when interacting with other sources of knowledge or computation. This can include Python REPLs, embeddings, search engines, and more. LangChain provides a large collection of common utils to use in your application. Chains: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. Indexes: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that. Agents: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents. Memory: Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory. Chat: Chat models are a variation on Language Models that expose a different API - rather than working with raw text, they work with messages. LangChain provides a standard interface for working with them and doing all the same things as above. Use Cases# The above modules can be used in a variety of ways. LangChain also provides guidance and assistance in this. Below are some of the common use cases LangChain supports. Agents: Agents are systems that use a language model to interact with other tools. These can be used to do more grounded question/answering, interact with APIs, or even take actions. Chatbots: Since language models are good at producing text, that makes them ideal for creating chatbots. Data Augmented Generation: Data Augmented Generation involves specific types of chains that first interact with an external datasource to fetch data to use in the generation step. Examples of this include summarization of long pieces of text and question/answering over specific data sources. Question Answering: Answering questions over specific documents, only utilizing the information in those documents to construct an answer. A type of Data Augmented Generation. Summarization: Summarizing longer documents into shorter, more condensed chunks of information. A type of Data Augmented Generation. Evaluation: Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this. Generate similar examples: Generating similar examples to a given input. This is a common use case for many applications, and LangChain provides some prompts/chains for assisting in this. Compare models: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so. Reference Docs# All of LangChain’s reference documentation, in one place. Full documentation on all methods, classes, installation methods, and integration setups for LangChain. Reference Documentation LangChain Ecosystem# Guides for how other companies/products can be used with LangChain LangChain Ecosystem Additional Resources# Additional collection of resources we think may be useful as you develop your application! LangChainHub: The LangChainHub is a place to share and explore other prompts, chains, and agents. Glossary: A glossary of all related terms, papers, methods, etc. Whether implemented in LangChain or not! Gallery: A collection of our favorite projects that use LangChain. Useful for finding inspiration or seeing how things were done in other applications. Deployments: A collection of instructions, code snippets, and template repositories for deploying LangChain apps. Discord: Join us on our Discord to discuss all things LangChain! Tracing: A guide on using tracing in LangChain to visualize the execution of chains and agents. Production Support: As you move your LangChains into production, we’d love to offer more comprehensive support. Please fill out this form and we’ll set up a dedicated support Slack channel. next Quickstart Guide Contents Getting Started Modules Use Cases Reference Docs LangChain Ecosystem Additional Resources By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.

这样我们就得到了数百个处理过的文档页面。让我们来看看每个页面包含的格式:

docs[0]

我们可以这样访问纯文本页面内容:

print(docs[0].page_content)

print(docs[5].page_content)

我们还可以找到每个文档的来源:

docs[5].metadata['source'].replace('rtdocs/', 'https://')

我们可以使用这些来创建我们的data列表:

data = []

for doc in docs:
data.append({
'url': doc.metadata['source'].replace('rtdocs/', 'https://'),
'text': doc.page_content
})

data[3]

{'url': 'https://langchain.readthedocs.io/en/latest/modules/memory/types/entity_summary_memory.html',
'text': '.ipynb .pdf Entity Memory Contents Using in a chain Inspecting the memory store Entity Memory# This notebook shows how to work with a memory module that remembers things about specific entities. It extracts information on entities (using LLMs) and builds up its knowledge about that entity over time (also using LLMs). Let’s first walk through using this functionality. from langchain.llms import OpenAI from langchain.memory import ConversationEntityMemory llm = OpenAI(temperature=0) memory = ConversationEntityMemory(llm=llm) _input = {"input": "Deven & Sam are working on a hackathon project"} memory.load_memory_variables(_input) memory.save_context( _input, {"ouput": " That sounds like a great project! What kind of project are they working on?"} ) memory.load_memory_variables({"input": \'who is Sam\'}) {\'history\': \'Human: Deven & Sam are working on a hackathon project\\nAI: That sounds like a great project! What kind of project are they working on?\', \'entities\': {\'Sam\': \'Sam is working on a hackathon project with Deven.\'}} memory = ConversationEntityMemory(llm=llm, return_messages=True) _input = {"input": "Deven & Sam are working on a hackathon project"} memory.load_memory_variables(_input) memory.save_context( _input, {"ouput": " That sounds like a great project! What kind of project are they working on?"} ) memory.load_memory_variables({"input": \'who is Sam\'}) {\'history\': [HumanMessage(content=\'Deven & Sam are working on a hackathon project\', additional_kwargs={}), AIMessage(content=\' That sounds like a great project! What kind of project are they working on?\', additional_kwargs={})], \'entities\': {\'Sam\': \'Sam is working on a hackathon project with Deven.\'}} Using in a chain# Let’s now use it in a chain! from langchain.chains import ConversationChain from langchain.memory import ConversationEntityMemory from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE from pydantic import BaseModel from typing import List, Dict, Any conversation = ConversationChain( llm=llm, verbose=True, prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE, memory=ConversationEntityMemory(llm=llm) ) conversation.predict(input="Deven & Sam are working on a hackathon project") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\'Deven\': \'\', \'Sam\': \'\'} Current conversation: Last line: Human: Deven & Sam are working on a hackathon project You: > Finished chain. \' That sounds like a great project! What kind of project are they working on?\' conversation.memory.store {\'Deven\': \'Deven is working on a hackathon project with Sam.\', \'Sam\': \'Sam is working on a hackathon project with Deven.\'} conversation.predict(input="They are trying to add more complex memory structures to Langchain") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\'Deven\': \'Deven is working on a hackathon project with Sam.\', \'Sam\': \'Sam is working on a hackathon project with Deven.\', \'Langchain\': \'\'} Current conversation: Human: Deven & Sam are working on a hackathon project AI: That sounds like a great project! What kind of project are they working on? Last line: Human: They are trying to add more complex memory structures to Langchain You: > Finished chain. \' That sounds like an interesting project! What kind of memory structures are they trying to add?\' conversation.predict(input="They are adding in a key-value store for entities mentioned so far in the conversation.") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\'Deven\': \'Deven is working on a hackathon project with Sam, attempting to add more complex memory structures to Langchain.\', \'Sam\': \'Sam is working on a hackathon project with Deven, trying to add more complex memory structures to Langchain.\', \'Langchain\': \'Langchain is a project that is trying to add more complex memory structures.\', \'Key-Value Store\': \'\'} Current conversation: Human: Deven & Sam are working on a hackathon project AI: That sounds like a great project! What kind of project are they working on? Human: They are trying to add more complex memory structures to Langchain AI: That sounds like an interesting project! What kind of memory structures are they trying to add? Last line: Human: They are adding in a key-value store for entities mentioned so far in the conversation. You: > Finished chain. \' That sounds like a great idea! How will the key-value store work?\' conversation.predict(input="What do you know about Deven & Sam?") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\'Deven\': \'Deven is working on a hackathon project with Sam, attempting to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\', \'Sam\': \'Sam is working on a hackathon project with Deven, trying to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\'} Current conversation: Human: Deven & Sam are working on a hackathon project AI: That sounds like a great project! What kind of project are they working on? Human: They are trying to add more complex memory structures to Langchain AI: That sounds like an interesting project! What kind of memory structures are they trying to add? Human: They are adding in a key-value store for entities mentioned so far in the conversation. AI: That sounds like a great idea! How will the key-value store work? Last line: Human: What do you know about Deven & Sam? You: > Finished chain. \' Deven and Sam are working on a hackathon project together, attempting to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\' Inspecting the memory store# We can also inspect the memory store directly. In the following examaples, we look at it directly, and then go through some examples of adding information and watch how it changes. from pprint import pprint pprint(conversation.memory.store) {\'Deven\': \'Deven is working on a hackathon project with Sam, attempting to add \' \'more complex memory structures to Langchain, including a key-value \' \'store for entities mentioned so far in the conversation.\', \'Key-Value Store\': \'A key-value store that stores entities mentioned in the \' \'conversation.\', \'Langchain\': \'Langchain is a project that is trying to add more complex \' \'memory structures, including a key-value store for entities \' \'mentioned so far in the conversation.\', \'Sam\': \'Sam is working on a hackathon project with Deven, attempting to add \' \'more complex memory structures to Langchain, including a key-value \' \'store for entities mentioned so far in the conversation.\'} conversation.predict(input="Sam is the founder of a company called Daimon.") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\'Daimon\': \'\', \'Sam\': \'Sam is working on a hackathon project with Deven to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\'} Current conversation: Human: They are trying to add more complex memory structures to Langchain AI: That sounds like an interesting project! What kind of memory structures are they trying to add? Human: They are adding in a key-value store for entities mentioned so far in the conversation. AI: That sounds like a great idea! How will the key-value store work? Human: What do you know about Deven & Sam? AI: Deven and Sam are working on a hackathon project to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. They seem to be very motivated and passionate about their project, and are working hard to make it a success. Last line: Human: Sam is the founder of a company called Daimon. You: > Finished chain. "\\nThat\'s impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?" from pprint import pprint pprint(conversation.memory.store) {\'Daimon\': \'Daimon is a company founded by Sam.\', \'Deven\': \'Deven is working on a hackathon project with Sam to add more \' \'complex memory structures to Langchain, including a key-value store \' \'for entities mentioned so far in the conversation.\', \'Key-Value Store\': \'Key-Value Store: A data structure that stores values \' \'associated with a unique key, allowing for efficient \' \'retrieval of values. Deven and Sam are adding a key-value \' \'store for entities mentioned so far in the conversation.\', \'Langchain\': \'Langchain is a project that seeks to add more complex memory \' \'structures, including a key-value store for entities mentioned \' \'so far in the conversation.\', \'Sam\': \'Sam is working on a hackathon project with Deven to add more complex \' \'memory structures to Langchain, including a key-value store for \' \'entities mentioned so far in the conversation. He is also the founder \' \'of a company called Daimon.\'} conversation.predict(input="What do you know about Sam?") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\'Sam\': \'Sam is working on a hackathon project with Deven to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. He is also the founder of a company called Daimon.\', \'Daimon\': \'Daimon is a company founded by Sam.\'} Current conversation: Human: They are adding in a key-value store for entities mentioned so far in the conversation. AI: That sounds like a great idea! How will the key-value store work? Human: What do you know about Deven & Sam? AI: Deven and Sam are working on a hackathon project to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. They seem to be very motivated and passionate about their project, and are working hard to make it a success. Human: Sam is the founder of a company called Daimon. AI: That\'s impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon? Last line: Human: What do you know about Sam? You: > Finished chain. \' Sam is the founder of a company called Daimon. He is also working on a hackathon project with Deven to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. He seems to be very motivated and passionate about his project, and is working hard to make it a success.\' previous ConversationBufferWindowMemory next Conversation Knowledge Graph Memory Contents Using in a chain Inspecting the memory store By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.'}

虽然看起来有点丑,但现在已经足够好了。让我们看看如何处理所有这些内容。我们将把所有内容分成大约400个标记的块,我们可以很容易地使用langchaintiktoken来实现这一点:

import tiktoken

tokenizer = tiktoken.get_encoding('p50k_base')

# 创建长度函数
def tiktoken_len(text):
tokens = tokenizer.encode(
text,
disallowed_special=()
)
return len(tokens)

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
chunk_size=400,
chunk_overlap=20,
length_function=tiktoken_len,
separators=["\n\n", "\n", " ", ""]
)

使用这种方法将data处理成更多的块。

from uuid import uuid4
from tqdm.auto import tqdm

chunks = []

for idx, record in enumerate(tqdm(data)):
texts = text_splitter.split_text(record['text'])
chunks.extend([{
'id': str(uuid4()),
'text': texts[i],
'chunk': i,
'url': record['url']
} for i in range(len(texts))])

  0%|          | 0/231 [00:00<?, ?it/s]

我们的块已经准备好了,现在我们开始嵌入和索引所有内容。

初始化嵌入模型

我们使用text-embedding-3-small作为嵌入模型。我们可以这样嵌入文本:

import openai

# 初始化OpenAI API密钥
openai.api_key = "sk-..."

embed_model = "text-embedding-3-small"

res = openai.Embedding.create(
input=[
"Sample document text goes here",
"there will be several phrases in each batch"
], engine=embed_model
)

在响应res中,我们将在'data'字段中找到一个包含我们新嵌入向量的类似JSON的对象。

res.keys()

dict_keys(['object', 'data', 'model', 'usage'])

'data'中,我们会找到两条记录,分别对应我们刚刚嵌入的两个句子。每个向量嵌入包含1536个维度(即text-embedding-3-small模型的输出维度)。

len(res['data'])

2
len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])

(1536, 1536)

我们将把相同的嵌入逻辑应用到我们刚刚抓取的langchain文档数据集中。但在此之前,我们必须创建一个存储嵌入的位置。

初始化索引

现在我们需要一个地方来存储这些嵌入向量,并通过它们进行高效的向量搜索。为此,我们使用Pinecone,可以在此处获取免费的API密钥,并在下面输入该密钥,我们将初始化与Pinecone的连接并创建一个新的索引。

import pinecone

index_name = 'gpt-4-langchain-docs'

# 初始化与 Pinecone 的连接
pinecone.init(
api_key="PINECONE_API_KEY", # app.pinecone.io(控制台)
environment="PINECONE_ENVIRONMENT" # 在控制台中,API密钥的旁边
)

# check if index already exists (it shouldn't if this is first time)
if index_name not in pinecone.list_indexes():
# if does not exist, create index
pinecone.create_index(
index_name,
dimension=len(res['data'][0]['embedding']),
metric='dotproduct'
)
# 连接到索引
index = pinecone.GRPCIndex(index_name)
# 查看索引统计信息
index.describe_index_stats()

{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {},
'total_vector_count': 0}

我们可以看到索引目前是空的,total_vector_count0。我们可以开始使用OpenAI text-embedding-3-small构建的嵌入来填充它,就像这样:

from tqdm.auto import tqdm
import datetime
from time import sleep

batch_size = 100 # 我们一次创建和插入多少个嵌入?

for i in tqdm(range(0, len(chunks), batch_size)):
# 查找批处理结束位置
i_end = min(len(chunks), i+batch_size)
meta_batch = chunks[i:i_end]
# 获取ID
ids_batch = [x['id'] for x in meta_batch]
# 获取待编码文本
texts = [x['text'] for x in meta_batch]
# 创建嵌入(添加了try-except以避免RateLimitError)
try:
res = openai.Embedding.create(input=texts, engine=embed_model)
except:
done = False
while not done:
sleep(5)
try:
res = openai.Embedding.create(input=texts, engine=embed_model)
done = True
except:
pass
embeds = [record['embedding'] for record in res['data']]
# 清理元数据
meta_batch = [{
'text': x['text'],
'chunk': x['chunk'],
'url': x['url']
} for x in meta_batch]
to_upsert = list(zip(ids_batch, embeds, meta_batch))
# 插入或更新到 Pinecone
index.upsert(vectors=to_upsert)

  0%|          | 0/12 [00:00<?, ?it/s]

现在我们已经将所有的langchain文档添加到索引中了。有了这个,我们可以继续进行检索,然后使用GPT-4进行答案生成。

检索

要搜索我们的文档,首先需要创建一个查询向量 xq。使用 xq,我们将从LangChain文档中检索出最相关的片段,就像这样:

query = "how do I use the LLMChain in LangChain?"

res = openai.Embedding.create(
input=[query],
engine=embed_model
)

# 从Pinecone检索
xq = res['data'][0]['embedding']

# 获取相关上下文(包括问题)
res = index.query(xq, top_k=5, include_metadata=True)

res

{'matches': [{'id': '1fec660b-9937-4f7e-9692-280c8cc7ce0d',
'metadata': {'chunk': 0.0,
'text': '.rst .pdf Chains Chains# Using an LLM in '
'isolation is fine for some simple '
'applications, but many more complex ones '
'require chaining LLMs - either with each '
'other or with other experts. LangChain '
'provides a standard interface for Chains, '
'as well as some common implementations of '
'chains for ease of use. The following '
'sections of documentation are provided: '
'Getting Started: A getting started guide '
'for chains, to get you up and running '
'quickly. Key Concepts: A conceptual guide '
'going over the various concepts related to '
'chains. How-To Guides: A collection of '
'how-to guides. These highlight how to use '
'various types of chains. Reference: API '
'reference documentation for all Chain '
'classes. previous Vector DB Text '
'Generation next Getting Started By '
'Harrison Chase © Copyright 2022, Harrison '
'Chase. Last updated on Mar 15, 2023.',
'url': 'https://langchain.readthedocs.io/en/latest/modules/chains.html'},
'score': 0.8848499,
'sparse_values': {'indices': [], 'values': []},
'values': []},
{'id': 'fe48438d-228a-4e0e-b41e-5cb5c6ba1482',
'metadata': {'chunk': 0.0,
'text': '.rst .pdf LLMs LLMs# Large Language Models '
'(LLMs) are a core component of LangChain. '
'LangChain is not a provider of LLMs, but '
'rather provides a standard interface '
'through which you can interact with a '
'variety of LLMs. The following sections of '
'documentation are provided: Getting '
'Started: An overview of all the '
'functionality the LangChain LLM class '
'provides. Key Concepts: A conceptual guide '
'going over the various concepts related to '
'LLMs. How-To Guides: A collection of '
'how-to guides. These highlight how to '
'accomplish various objectives with our LLM '
'class, as well as how to integrate with '
'various LLM providers. Reference: API '
'reference documentation for all LLM '
'classes. previous Example Selector next '
'Getting Started By Harrison Chase © '
'Copyright 2022, Harrison Chase. Last '
'updated on Mar 15, 2023.',
'url': 'https://langchain.readthedocs.io/en/latest/modules/llms.html'},
'score': 0.8595519,
'sparse_values': {'indices': [], 'values': []},
'values': []},
{'id': '60df5bff-5f79-46ee-9456-534d42f6a94e',
'metadata': {'chunk': 0.0,
'text': '.ipynb .pdf Getting Started Contents Why '
'do we need chains? Query an LLM with the '
'LLMChain Combine chains with the '
'SequentialChain Create a custom chain with '
'the Chain class Getting Started# In this '
'tutorial, we will learn about creating '
'simple chains in LangChain. We will learn '
'how to create a chain, add components to '
'it, and run it. In this tutorial, we will '
'cover: Using a simple LLM chain Creating '
'sequential chains Creating a custom chain '
'Why do we need chains?# Chains allow us to '
'combine multiple components together to '
'create a single, coherent application. For '
'example, we can create a chain that takes '
'user input, formats it with a '
'PromptTemplate, and then passes the '
'formatted response to an LLM. We can build '
'more complex chains by combining multiple '
'chains together, or by combining chains '
'with other components. Query an LLM with '
'the LLMChain# The LLMChain is a simple '
'chain that takes in a prompt template, '
'formats it with the user input and returns '
'the response from an LLM. To use the '
'LLMChain, first create a prompt template. '
'from langchain.prompts import '
'PromptTemplate from langchain.llms import '
'OpenAI llm = OpenAI(temperature=0.9) '
'prompt = PromptTemplate( '
'input_variables=["product"], '
'template="What is a good',
'url': 'https://langchain.readthedocs.io/en/latest/modules/chains/getting_started.html'},
'score': 0.8462403,
'sparse_values': {'indices': [], 'values': []},
'values': []},
{'id': '2f11beb1-3935-447e-b565-b20383dc4544',
'metadata': {'chunk': 1.0,
'text': 'chain first uses a LLM to construct the '
'url to hit, then makes that request with '
'the Requests wrapper, and finally runs '
'that result through the language model '
'again in order to product a natural '
'language response. Example Notebook '
'LLMBash Chain Links Used: BashProcess, '
'LLMChain Notes: This chain takes user '
'input (a question), uses an LLM chain to '
'convert it to a bash command to run in the '
'terminal, and then returns that as the '
'result. Example Notebook LLMChecker Chain '
'Links Used: LLMChain Notes: This chain '
'takes user input (a question), uses an LLM '
'chain to answer that question, and then '
'uses other LLMChains to self-check that '
'answer. Example Notebook LLMRequests Chain '
'Links Used: Requests, LLMChain Notes: This '
'chain takes a URL and other inputs, uses '
'Requests to get the data at that URL, and '
'then passes that along with the other '
'inputs into an LLMChain to generate a '
'response. The example included shows how '
'to ask a question to Google - it firsts '
'constructs a Google url, then fetches the '
'data there, then passes that data + the '
'original question into an LLMChain to get '
'an answer. Example Notebook Moderation '
'Chain Links Used: LLMChain, '
'ModerationChain Notes: This chain shows '
'how to use OpenAI’s content',
'url': 'https://langchain.readthedocs.io/en/latest/modules/chains/utility_how_to.html'},
'score': 0.8451743,
'sparse_values': {'indices': [], 'values': []},
'values': []},
{'id': 'f3ed41eb-063c-407f-bdaa-706a8c6a2091',
'metadata': {'chunk': 1.0,
'text': 'Prompts: This includes prompt management, '
'prompt optimization, and prompt '
'serialization. LLMs: This includes a '
'generic interface for all LLMs, and common '
'utilities for working with LLMs. Document '
'Loaders: This includes a standard '
'interface for loading documents, as well '
'as specific integrations to all types of '
'text data sources. Utils: Language models '
'are often more powerful when interacting '
'with other sources of knowledge or '
'computation. This can include Python '
'REPLs, embeddings, search engines, and '
'more. LangChain provides a large '
'collection of common utils to use in your '
'application. Chains: Chains go beyond just '
'a single LLM call, and are sequences of '
'calls (whether to an LLM or a different '
'utility). LangChain provides a standard '
'interface for chains, lots of integrations '
'with other tools, and end-to-end chains '
'for common applications. Indexes: Language '
'models are often more powerful when '
'combined with your own text data - this '
'module covers best practices for doing '
'exactly that. Agents: Agents involve an '
'LLM making decisions about which Actions '
'to take, taking that Action, seeing an '
'Observation, and repeating that until '
'done. LangChain provides a standard '
'interface for agents, a selection of '
'agents to choose from, and examples of end '
'to end agents. Memory: Memory is the',
'url': 'https://langchain.readthedocs.io/en/latest/'},
'score': 0.84271824,
'sparse_values': {'indices': [], 'values': []},
'values': []}],
'namespace': ''}

检索完成后,我们将继续将这些内容输入到GPT-4中,以生成答案。

检索增强生成

GPT-4目前通过OpenAI的ChatCompletions端点进行访问。为了将我们检索到的信息添加到模型中,我们需要将其传递到我们的用户提示中,并与我们的原始查询一起。我们可以这样做:

# 获取检索到的文本列表
contexts = [item['metadata']['text'] for item in res['matches']]

augmented_query = "\n\n---\n\n".join(contexts)+"\n\n-----\n\n"+query

print(augmented_query)

.rst .pdf Chains Chains# Using an LLM in isolation is fine for some simple applications, but many more complex ones require chaining LLMs - either with each other or with other experts. LangChain provides a standard interface for Chains, as well as some common implementations of chains for ease of use. The following sections of documentation are provided: Getting Started: A getting started guide for chains, to get you up and running quickly. Key Concepts: A conceptual guide going over the various concepts related to chains. How-To Guides: A collection of how-to guides. These highlight how to use various types of chains. Reference: API reference documentation for all Chain classes. previous Vector DB Text Generation next Getting Started By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.

---

.rst .pdf LLMs LLMs# Large Language Models (LLMs) are a core component of LangChain. LangChain is not a provider of LLMs, but rather provides a standard interface through which you can interact with a variety of LLMs. The following sections of documentation are provided: Getting Started: An overview of all the functionality the LangChain LLM class provides. Key Concepts: A conceptual guide going over the various concepts related to LLMs. How-To Guides: A collection of how-to guides. These highlight how to accomplish various objectives with our LLM class, as well as how to integrate with various LLM providers. Reference: API reference documentation for all LLM classes. previous Example Selector next Getting Started By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.

---

.ipynb .pdf Getting Started Contents Why do we need chains? Query an LLM with the LLMChain Combine chains with the SequentialChain Create a custom chain with the Chain class Getting Started# In this tutorial, we will learn about creating simple chains in LangChain. We will learn how to create a chain, add components to it, and run it. In this tutorial, we will cover: Using a simple LLM chain Creating sequential chains Creating a custom chain Why do we need chains?# Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components. Query an LLM with the LLMChain# The LLMChain is a simple chain that takes in a prompt template, formats it with the user input and returns the response from an LLM. To use the LLMChain, first create a prompt template. from langchain.prompts import PromptTemplate from langchain.llms import OpenAI llm = OpenAI(temperature=0.9) prompt = PromptTemplate( input_variables=["product"], template="What is a good

---

chain first uses a LLM to construct the url to hit, then makes that request with the Requests wrapper, and finally runs that result through the language model again in order to product a natural language response. Example Notebook LLMBash Chain Links Used: BashProcess, LLMChain Notes: This chain takes user input (a question), uses an LLM chain to convert it to a bash command to run in the terminal, and then returns that as the result. Example Notebook LLMChecker Chain Links Used: LLMChain Notes: This chain takes user input (a question), uses an LLM chain to answer that question, and then uses other LLMChains to self-check that answer. Example Notebook LLMRequests Chain Links Used: Requests, LLMChain Notes: This chain takes a URL and other inputs, uses Requests to get the data at that URL, and then passes that along with the other inputs into an LLMChain to generate a response. The example included shows how to ask a question to Google - it firsts constructs a Google url, then fetches the data there, then passes that data + the original question into an LLMChain to get an answer. Example Notebook Moderation Chain Links Used: LLMChain, ModerationChain Notes: This chain shows how to use OpenAI’s content

---

Prompts: This includes prompt management, prompt optimization, and prompt serialization. LLMs: This includes a generic interface for all LLMs, and common utilities for working with LLMs. Document Loaders: This includes a standard interface for loading documents, as well as specific integrations to all types of text data sources. Utils: Language models are often more powerful when interacting with other sources of knowledge or computation. This can include Python REPLs, embeddings, search engines, and more. LangChain provides a large collection of common utils to use in your application. Chains: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. Indexes: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that. Agents: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents. Memory: Memory is the

-----

how do I use the LLMChain in LangChain?

现在我们提出问题:

# system message to 'prime' the model
primer = f"""You are Q&A bot. A highly intelligent system that answers
user questions based on the information provided by the user above
each question. If the information can not be found in the information
provided by the user you truthfully say "I don't know".
"""

res = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": primer},
{"role": "user", "content": augmented_query}
]
)

为了漂亮地显示这个回复,我们将以markdown格式显示它。

from IPython.display import Markdown

display(Markdown(res['choices'][0]['message']['content']))

To use the LLMChain in LangChain, follow these steps:

  1. Import the necessary classes:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain
  1. Create an instance of the LLM and set the configuration options:
llm = OpenAI(temperature=0.9)
  1. Create a PromptTemplate instance with the input variables and the template:
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good product for {product}?",
)
  1. Create an LLMChain instance by passing the LLM and PromptTemplate instances:
llm_chain = LLMChain(llm=llm, prompt_template=prompt)
  1. Run the LLMChain with user input:
response = llm_chain.run({"product": "software development"})
  1. Access the generated response:
generated_text = response["generated_text"]

In this example, the LLMChain is used to generate a response by passing through the user input and formatting it using the prompt template. The response is then obtained from the LLM instance (in this case, OpenAI), and the generated text can be accessed from the response dictionary.

让我们将其与未增强的查询进行比较…

res = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": primer},
{"role": "user", "content": query}
]
)
display(Markdown(res['choices'][0]['message']['content']))

I don’t know.

如果我们去掉primer中的"I don't know"部分呢?

res = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are Q&A bot. A highly intelligent system that answers user questions"},
{"role": "user", "content": query}
]
)
display(Markdown(res['choices'][0]['message']['content']))

LangChain hasn’t provided any public documentation on LLMChain, nor is there a known technology called LLMChain in their library. To better assist you, please provide more information or context about LLMChain and LangChain.

Meanwhile, if you are referring to LangChain, a blockchain-based decentralized AI language model, you can start by visiting their official website (if they have one), exploring their available resources, such as documentation and tutorials, and following any instructions on setting up their technology.

If you are looking for help with a specific language chain or model in natural language processing, consider rephrasing your question to provide more accurate information or visit relevant resources like GPT-3 or other NLP-related documentation.