ChatGPT Plugin Integrations

Note: This is a work in progress; stay tuned for more updates on this front.

ChatGPT Retrieval Plugin Integrations

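The OpenAI ChatGPT Retrieval Plugin offers a centralized API specification that lets any document storage system interact with ChatGPT. Because the plugin can be deployed on any service, more and more document retrieval services are likely to implement this specification. Doing so lets them interact not only with ChatGPT but also with any LLM toolkit that uses a retrieval service.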

LlamaIndex provides a variety of integrations with the ChatGPT Retrieval Plugin.

Loading Data from LlamaHub into the ChatGPT Retrieval Plugin

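The ChatGPT Retrieval Plugin defines an /upsert endpoint that users call to load documents. This is a natural integration point for LlamaHub, which offers over 65 data loaders for various APIs and document formats.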

Here is a sample code snippet showing how to load a document from LlamaHub and convert it into the JSON format that /upsert expects:

from llama_index.core import Document
from llama_index.readers.web import SimpleWebPageReader
from typing import List
import json

# Load documents with a LlamaHub loader
loader = SimpleWebPageReader(html_to_text=True)
url = "http://www.paulgraham.com/worked.html"
documents = loader.load_data(urls=[url])


# Convert LlamaIndex Documents to JSON format
def dump_docs_to_json(documents: List[Document], out_path: str) -> None:
    """Convert LlamaIndex Documents to JSON format and save them to out_path."""
    result_json = []
    for doc in documents:
        cur_dict = {
            "text": doc.get_text(),
            "id": doc.get_doc_id(),
            # NOTE: feel free to customize the other fields as needed
            # Fields taken from https://github.com/openai/chatgpt-retrieval-plugin/tree/main/scripts/process_json#usage
            # "source": ...,
            # "source_id": ...,
            # "url": url,
            # "created_at": ...,
            # "author": "Paul Graham",
        }
        result_json.append(cur_dict)

    # Use a context manager so the file is flushed and closed
    with open(out_path, "w") as f:
        json.dump(result_json, f)

For more details, check out the full example notebook.
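Once the records are in the shape produced by dump_docs_to_json, they still have to be sent to the plugin. The following is a minimal sketch, not the plugin's official client. It assumes that /upsert accepts a JSON body of the form {"documents": [...]} and authenticates with a bearer token. The helper name build_upsert_request, the sample record, and the token value are hypothetical. The request is only built, not sent, so the snippet runs without a server.

```python
import json
import urllib.request
from typing import Dict, List


def build_upsert_request(
    docs: List[Dict[str, str]], endpoint_url: str, bearer_token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the plugin's /upsert endpoint."""
    # Assumption: the endpoint expects {"documents": [{"id": ..., "text": ...}, ...]}
    payload = {"documents": docs}
    return urllib.request.Request(
        url=endpoint_url.rstrip("/") + "/upsert",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical record in the same shape that dump_docs_to_json writes
docs = [{"id": "doc-1", "text": "example essay text"}]
request = build_upsert_request(docs, "http://localhost:8000", "my-token")

# To actually send it (requires a running plugin server):
# with urllib.request.urlopen(request) as resp:
#     print(resp.status)
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the document store.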

ChatGPT Retrieval Plugin Data Loader

The ChatGPT Retrieval Plugin data loader is available on LlamaHub.

It lets you easily load data from any document store that implements the plugin API into a LlamaIndex data structure.

Example code:

from llama_index.readers.chatgpt_plugin import ChatGPTRetrievalPluginReader
import os

# Load documents
bearer_token = os.getenv("BEARER_TOKEN")
reader = ChatGPTRetrievalPluginReader(
    endpoint_url="http://localhost:8000", bearer_token=bearer_token
)
documents = reader.load_data("What did the author do growing up?")

# Build and query the index
from llama_index.core import SummaryIndex

index = SummaryIndex.from_documents(documents)
# Set the logging level to DEBUG for more detailed output
query_engine = index.as_query_engine(response_mode="compact")
response = query_engine.query(
    "Summarize the retrieved content and describe what the author did growing up",
)

For more details, check out the full example notebook.
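To give a rough idea of the kind of data a loader like this works with, here is a small sketch that flattens a query response into (id, text, score) tuples. It is not the reader's actual implementation. It assumes the plugin's /query endpoint returns a body shaped like {"results": [{"query": ..., "results": [{"id", "text", "score"}]}]}. The function name flatten_query_results and the sample response are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def flatten_query_results(response: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Flatten an assumed /query response into (id, text, score) tuples."""
    flat: List[Tuple[str, str, float]] = []
    # Outer list: one entry per submitted query
    for query_result in response.get("results", []):
        # Inner list: the chunks retrieved for that query
        for chunk in query_result.get("results", []):
            flat.append((chunk["id"], chunk["text"], chunk.get("score", 0.0)))
    # Put the highest-scoring chunks first
    flat.sort(key=lambda item: item[2], reverse=True)
    return flat


# Hypothetical response body, used only for illustration
sample_response = {
    "results": [
        {
            "query": "What did the author do growing up?",
            "results": [
                {"id": "a", "text": "first chunk", "score": 0.42},
                {"id": "b", "text": "second chunk", "score": 0.91},
            ],
        }
    ]
}

chunks = flatten_query_results(sample_response)
```

Each tuple could then be wrapped in a LlamaIndex Document before building an index over the results.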

ChatGPT Retrieval Plugin Index

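The ChatGPT Retrieval Plugin Index lets you easily build a vector index over any documents, with storage backed by a document store that implements the ChatGPT Retrieval Plugin endpoints.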

Note: this index is a vector index, which allows top-k retrieval.

Example code:

from llama_index.core.indices.vector_store import ChatGPTRetrievalPluginIndex
from llama_index.core import SimpleDirectoryReader
import os

# Load documents
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()

# Build the index
bearer_token = os.getenv("BEARER_TOKEN")
# Initialize without a metadata filter
index = ChatGPTRetrievalPluginIndex(
    documents,
    endpoint_url="http://localhost:8000",
    bearer_token=bearer_token,
)

# Query the index
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
)
response = query_engine.query("What did the author do growing up?")

For more details, check out the full example notebook.
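The top-k retrieval mentioned in the note above can be illustrated with a toy example. In practice the document store behind the plugin performs the similarity search, so this sketch only shows the concept. It assumes cosine similarity as the ranking metric, and the vectors and the names cosine_similarity and top_k are made up for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    # nlargest returns the results sorted from highest to lowest score
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Made-up 2-D embeddings, for illustration only
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = top_k([1.0, 0.0], doc_vecs, k=2)
```

Here k plays the same role as similarity_top_k in the query engine call: it caps how many of the closest chunks are passed on for response synthesis.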