Lantern Vector Store
In this notebook we will show how to use PostgreSQL and Lantern to perform vector searches in LlamaIndex.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-vector-stores-lantern
%pip install llama-index-embeddings-openai
In [ ]:
!pip install psycopg2-binary llama-index asyncpg
In [ ]:
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.lantern import LanternVectorStore
import textwrap
import openai
Setup OpenAI
The first step is to configure the OpenAI key. It will be used to create embeddings for the documents loaded into the index.
In [ ]:
import os
os.environ["OPENAI_API_KEY"] = "<your_key>"
openai.api_key = "<your_key>"
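Rather than hardcoding the key in the notebook, you can prompt for it at runtime; a minimal sketch using Python's standard getpass module:

In [ ]:
from getpass import getpass

# Prompt for the key so it never appears in the notebook source
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
openai.api_key = os.environ["OPENAI_API_KEY"]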
Download Data
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load Documents
Load the documents stored in data/paul_graham/ using SimpleDirectoryReader.
In [ ]:
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)
Create the Database
Using an already-running local Postgres instance, create the database we will be using.
In [ ]:
import psycopg2

connection_string = "postgresql://postgres:postgres@localhost:5432"
# Use a dedicated database: dropping the database you are currently
# connected to (the default "postgres") would fail.
db_name = "lantern_db"
conn = psycopg2.connect(connection_string)
conn.autocommit = True
with conn.cursor() as c:
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")
In [ ]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

# Set the embedding model in the global settings so that query strings
# are converted to embeddings and the HNSW index is used
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
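A quick optional check that the model's output dimension matches the embed_dim we will pass to LanternVectorStore below:

In [ ]:
# Sanity check: text-embedding-3-small produces 1536-dimensional vectors,
# which must match embed_dim in LanternVectorStore.from_params below
sample_embedding = Settings.embed_model.get_text_embedding("hello world")
print(len(sample_embedding))  # expected: 1536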
Create the Index
Here we create an index backed by Postgres, using the documents loaded previously. LanternVectorStore takes a few arguments.
In [ ]:
from sqlalchemy import make_url

url = make_url(connection_string)

vector_store = LanternVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="paul_graham_essay",
    embed_dim=1536,  # openai embedding dimension
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
query_engine = index.as_query_engine()
Query the Index
We can now use our index to ask questions.
In [ ]:
response = query_engine.query("What did the author do?")
response = query_engine.query("What did the author do?")
In [ ]:
print(textwrap.fill(str(response), 100))
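To see which chunks were retrieved to produce the answer, you can inspect the response's source nodes (standard on LlamaIndex Response objects):

In [ ]:
# Show the retrieved chunks with their similarity scores
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:100])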
In [ ]:
response = query_engine.query("What happened in the mid 1980s?")
response = query_engine.query("What happened in the mid 1980s?")
In [ ]:
print(textwrap.fill(str(response), 100))
Querying an Existing Index
In [ ]:
vector_store = LanternVectorStore.from_params(
    database=db_name,  # database name
    host=url.host,  # host address
    password=url.password,  # password
    port=url.port,  # port
    user=url.username,  # username
    table_name="paul_graham_essay",  # table name
    embed_dim=1536,  # openai embedding dimension
    m=16,  # HNSW M parameter
    ef_construction=128,  # HNSW ef_construction parameter
    ef=64,  # HNSW ef search parameter
)

# Read more about HNSW parameters here:
# https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()
In [ ]:
response = query_engine.query("What did the author do?")
In [ ]:
print(textwrap.fill(str(response), 100))
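If you only need the retrieved chunks rather than a synthesized answer, a retriever can be used directly; a small sketch using the standard as_retriever API, where similarity_top_k controls how many nodes are returned:

In [ ]:
# Retrieve raw nodes without LLM synthesis
retriever = index.as_retriever(similarity_top_k=3)
for node_with_score in retriever.retrieve("What did the author do?"):
    print(node_with_score.score, node_with_score.node.get_content()[:100])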
Hybrid Search
To enable hybrid search, you need to:

- pass hybrid_search=True when constructing the LanternVectorStore (and optionally configure text_search_config with the desired language)
- pass vector_store_query_mode="hybrid" when constructing the query engine (this config is passed to the retriever under the hood). You can also optionally set sparse_top_k to configure how many results should come from the sparse text search (it defaults to the same value as similarity_top_k).
In [ ]:
from sqlalchemy import make_url

url = make_url(connection_string)

hybrid_vector_store = LanternVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="paul_graham_essay_hybrid_search",
    embed_dim=1536,  # openai embedding dimension
    hybrid_search=True,
    text_search_config="english",
)

storage_context = StorageContext.from_defaults(
    vector_store=hybrid_vector_store
)
hybrid_index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
In [ ]:
hybrid_query_engine = hybrid_index.as_query_engine(
    vector_store_query_mode="hybrid", sparse_top_k=2
)
hybrid_response = hybrid_query_engine.query(
    "Who does Paul Graham think of with the word schtick"
)
In [ ]:
print(hybrid_response)
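Note that sparse_top_k only controls the text-search side of the retrieval; similarity_top_k still controls the dense side, and the retriever combines both result sets. A sketch widening both:

In [ ]:
# A second hybrid engine that pulls more candidates from each side
hybrid_query_engine_wide = hybrid_index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=4,
    sparse_top_k=4,
)
print(
    hybrid_query_engine_wide.query(
        "Who does Paul Graham think of with the word schtick"
    )
)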