PostgresML Managed Index

In this notebook we show how to use PostgresML with LlamaIndex.

If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [ ]:

!pip install llama-index-indices-managed-postgresml

In [ ]:

!pip install llama-index

In [ ]:

from llama_index.indices.managed.postgresml import PostgresMLIndex
from llama_index.core import SimpleDirectoryReader

# asyncio can cause problems inside a notebook, so apply nest_asyncio
# to prevent event loop errors
import nest_asyncio

nest_asyncio.apply()

Load documents

Load the paul_graham_essay.txt document.

In [ ]:

!mkdir data
!curl -o data/paul_graham_essay.txt https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt

In [ ]:

documents = SimpleDirectoryReader("data").load_data()
print(f"documents loaded into {len(documents)} document objects")
print(f"Document ID of first doc is {documents[0].doc_id}")

Insert the documents into your PostgresML database

First, set the URL of your PostgresML database. If you do not have one yet, you can create one for free here: https://postgresml.org/signup

In [ ]:

# Set the secrets we need
from google.colab import userdata

PGML_DATABASE_URL = userdata.get("PGML_DATABASE_URL")

# If you have not set these secrets, uncomment the line below and run it
# Make sure to replace {REPLACE_ME} with your own key
# PGML_DATABASE_URL = "{REPLACE_ME}"
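
The cell above relies on `google.colab.userdata`, which is only available on Colab. Outside Colab, one option is to read the same URL from an environment variable. The sketch below assumes you have exported `PGML_DATABASE_URL` yourself; the helper name `get_database_url` is hypothetical and not part of LlamaIndex or PostgresML.

```python
import os


def get_database_url(env_var: str = "PGML_DATABASE_URL") -> str:
    """Return the PostgresML connection URL from the environment.

    Hypothetical helper for running this notebook outside Colab.
    Raises RuntimeError when the variable is missing or empty, so a
    misconfiguration surfaces before any index is created.
    """
    url = os.environ.get(env_var, "").strip()
    if not url:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your PostgresML database URL."
        )
    return url
```

With this in place, `PGML_DATABASE_URL = get_database_url()` could stand in for the `userdata.get(...)` call.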

In [ ]:

index = PostgresMLIndex.from_documents(
    documents,
    collection_name="llama-index-example-demo",
    pgml_database_url=PGML_DATABASE_URL,
)

Query the PostgresML index

We can now ask questions using the PostgresMLIndex retriever.

In [ ]:

query = "What did the author write about?"

First, we use the retriever to list the returned documents:

In [ ]:

retriever = index.as_retriever()
response = retriever.retrieve(query)
texts = [t.node.text for t in response]

print("The Nodes:")
print(response)
print("\nThe Texts")
print(texts)
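
Printing the raw list of nodes can be hard to scan. As a rough sketch, the hypothetical helper below (`summarize_results` is not a LlamaIndex function) formats each result as a rank, a score, and a short text preview. It only assumes that every item exposes `.score` and `.node.text`, as the retrieved items above do; the stand-in objects are made-up data so the block runs without a database.

```python
from types import SimpleNamespace


def summarize_results(results, preview_chars=80):
    """Return one line per retrieved item: rank, score, and a text preview."""
    lines = []
    for rank, item in enumerate(results, start=1):
        # The score may be missing, so fall back to a placeholder.
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        # Collapse whitespace so each preview stays on one line.
        preview = " ".join(item.node.text.split())[:preview_chars]
        lines.append(f"{rank}. score={score} | {preview}")
    return lines


# Stand-in objects that mimic the shape of retrieved items (illustrative data only).
fake_results = [
    SimpleNamespace(score=0.8123, node=SimpleNamespace(text="I wrote short stories\nand programs.")),
    SimpleNamespace(score=None, node=SimpleNamespace(text="Later I studied painting.")),
]

for line in summarize_results(fake_results):
    print(line)
```

In the notebook, `summarize_results(response)` would be called on the list returned by `retriever.retrieve(query)`.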

With as_query_engine(), we can ask a question and get a response in a single query:

In [ ]:

query_engine = index.as_query_engine()
response = query_engine.query(query)

print("The Response:")
print(response)
print("\nThe Source Nodes:")
print(response.get_formatted_sources())

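Note that the "response" object above includes both the summary text and the source documents (citations) used to produce it. Note also that all of the source nodes come from the same document. That is because we uploaded only one document, which PostgresML automatically split into chunks before embedding. All of these parameters can be controlled; see the documentation for more information.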
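
To make the idea of splitting before embedding concrete, here is a minimal, purely illustrative sketch of fixed-size chunking with overlap. It is not PostgresML's splitter, and the function name and the `chunk_size` and `overlap` parameters are assumptions chosen for this example. It only shows why one uploaded document can yield several source nodes.

```python
from __future__ import annotations


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks that overlap by `overlap` characters.

    Illustrative only: real splitters usually work on tokens or sentences.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk would be embedded and stored separately, which is why every retrieved node can point back to the same original document.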

We can enable streaming by passing streaming=True when we create the query engine.

Note: streaming is very slow on Google Colab because of its slow internet connection.

In [ ]:

query_engine = index.as_query_engine(streaming=True)
results = query_engine.query(query)
for text in results.response_gen:
    print(text, end="", flush=True)
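
The loop above prints tokens as they arrive but does not keep them. If you also want the complete answer afterwards, a small helper along the following lines could collect the pieces while echoing them. This is a sketch: `collect_stream` is a hypothetical name, and it assumes only that the generator yields strings, as `results.response_gen` does above. The demo tokens are made up so the block runs on its own.

```python
def collect_stream(token_gen, echo=True):
    """Consume a generator of text pieces and return the concatenated text."""
    parts = []
    for token in token_gen:
        if echo:
            # Print each piece as soon as it arrives, without a newline.
            print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts)


# Stand-in for results.response_gen (illustrative data only).
demo_tokens = iter(["The author ", "wrote about ", "essays."])
full_text = collect_stream(demo_tokens, echo=False)
```

In the notebook this would be called as `collect_stream(results.response_gen)`. A generator can only be consumed once, so use either this helper or the loop above on a given response, not both.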