Lantern Vector Store (auto-retriever)

This guide shows how to perform auto-retrieval in LlamaIndex.

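Many popular vector databases support a set of metadata filters in addition to a query string for semantic search. Given a natural language query, we first use an LLM to infer a set of metadata filters as well as the right query string to pass to the vector database (the query string can also be empty). The resulting query bundle is then executed against the vector database.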
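
To make the two stages concrete, here is a minimal, library-free sketch. `QuerySpec`, `infer_query_spec`, and `apply_filters` are hypothetical names used only for illustration, and the inference step is hard-coded where a real system would call an LLM. It is not how LlamaIndex implements auto-retrieval internally.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySpec:
    """Hypothetical container for what the LLM is asked to infer."""

    query: str  # semantic query string; may be empty
    filters: dict[str, str] = field(default_factory=dict)  # exact-match metadata filters


def infer_query_spec(question: str) -> QuerySpec:
    """Stand-in for the LLM call: a hard-coded rule, for illustration only."""
    if "United States" in question:
        return QuerySpec(query="celebrities", filters={"country": "United States"})
    return QuerySpec(query=question)


def apply_filters(records: list[dict], filters: dict[str, str]) -> list[dict]:
    """Keep only the records whose metadata matches every filter."""
    return [
        r
        for r in records
        if all(r["metadata"].get(k) == v for k, v in filters.items())
    ]


# Toy records, made up for this example.
records = [
    {"text": "Michael Jordan bio", "metadata": {"country": "United States"}},
    {"text": "Rihanna bio", "metadata": {"country": "Barbados"}},
]

spec = infer_query_spec("Tell me about two celebrities from United States")
candidates = apply_filters(records, spec.filters)
print(spec)
print([r["text"] for r in candidates])
```

In a real pipeline the surviving candidates would then be ranked by semantic similarity to `spec.query`, unless that string is empty.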

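This allows for more dynamic, expressive forms of retrieval than top-k semantic search. The relevant context for a given query may require only filtering on a metadata tag, a combination of filtering and semantic search within the filtered set, or just raw semantic search.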
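
The three cases can be illustrated with a small, self-contained sketch. The `retrieve` helper, the toy two-dimensional "embeddings", and the record layout are all assumptions made up for this example. A real vector store would use an index such as HNSW rather than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(records, query_vec=None, filters=None, top_k=2):
    """Filter on metadata first, then rank survivors by similarity if a query is given."""
    filters = filters or {}
    pool = [
        r
        for r in records
        if all(r["metadata"].get(k) == v for k, v in filters.items())
    ]
    if query_vec is not None:
        pool = sorted(pool, key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return [r["name"] for r in pool[:top_k]]


# Toy 2-d "embeddings", invented for illustration.
records = [
    {"name": "Michael Jordan", "vec": [1.0, 0.0], "metadata": {"category": "Sports"}},
    {"name": "Cristiano Ronaldo", "vec": [0.9, 0.1], "metadata": {"category": "Sports"}},
    {"name": "Rihanna", "vec": [0.0, 1.0], "metadata": {"category": "Music"}},
]

# 1. Metadata filter only (no query vector).
filter_only = retrieve(records, filters={"category": "Music"})
# 2. Filter, then semantic search within the filtered set.
combined = retrieve(records, query_vec=[0.8, 0.2], filters={"category": "Sports"}, top_k=1)
# 3. Raw semantic search over everything.
semantic_only = retrieve(records, query_vec=[0.1, 0.9], top_k=1)

print(filter_only, combined, semantic_only)
```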

We demonstrate an example with Lantern, but auto-retrieval is also implemented for many other vector databases (e.g. Pinecone, Chroma, Weaviate, and more).

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [ ]:

%pip install llama-index-vector-stores-lantern

In [ ]:

!pip install llama-index psycopg2-binary asyncpg

In [ ]:

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [ ]:

# Set up OpenAI
import os

os.environ["OPENAI_API_KEY"] = "<your-api-key>"

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

In [ ]:

import psycopg2
from sqlalchemy import make_url

connection_string = "postgresql://postgres:postgres@localhost:5432"

url = make_url(connection_string)

db_name = "postgres"
conn = psycopg2.connect(connection_string)
conn.autocommit = True

In [ ]:

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.lantern import LanternVectorStore

In [ ]:

from llama_index.core.schema import TextNode

nodes = [
    TextNode(
        text=(
            "Michael Jordan is a retired professional basketball player,"
            " widely regarded as one of the greatest basketball players of all"
            " time."
        ),
        metadata={
            "category": "Sports",
            "country": "United States",
        },
    ),
    TextNode(
        text=(
            "Angelina Jolie is an American actress, filmmaker, and"
            " humanitarian. She has received numerous awards for her acting"
            " and is known for her philanthropic work."
        ),
        metadata={
            "category": "Entertainment",
            "country": "United States",
        },
    ),
    TextNode(
        text=(
            "Elon Musk is a business magnate, industrial designer, and"
            " engineer. He is the founder, CEO, and lead designer of SpaceX,"
            " Tesla, Inc., Neuralink, and The Boring Company."
        ),
        metadata={
            "category": "Business",
            "country": "United States",
        },
    ),
    TextNode(
        text=(
            "Rihanna is a Barbadian singer, actress, and businesswoman. She"
            " has achieved significant success in the music industry and is"
            " known for her versatile musical style."
        ),
        metadata={
            "category": "Music",
            "country": "Barbados",
        },
    ),
    TextNode(
        text=(
            "Cristiano Ronaldo is a Portuguese professional footballer who is"
            " considered one of the greatest football players of all time. He"
            " has won numerous awards and set multiple records during his"
            " career."
        ),
        metadata={
            "category": "Sports",
            "country": "Portugal",
        },
    ),
]

Build a Vector Index with the Lantern Vector Store

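Here we load the data into the vector store. Each node's text and metadata are converted into their corresponding representations in Lantern. We can then run semantic queries, with or without metadata filtering, against this data in Lantern.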

In [ ]:

vector_store = LanternVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="famous_people",
    embed_dim=1536,  # openai embedding dimension
    m=16,  # HNSW M parameter
    ef_construction=128,  # HNSW ef construction parameter
    ef=64,  # HNSW ef search parameter
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [ ]:

index = VectorStoreIndex(nodes, storage_context=storage_context)

Define `VectorIndexAutoRetriever`

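We define our core `VectorIndexAutoRetriever` module. The module takes in a `VectorStoreInfo`, which contains a structured description of the vector store collection and the metadata filters it supports. This information is then used in the auto-retrieval prompt, where the LLM infers the metadata filters.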

In [ ]:

from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo


vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(
            name="category",
            type="str",
            description=(
                "Category of the celebrity, one of [Sports, Entertainment,"
                " Business, Music]"
            ),
        ),
        MetadataInfo(
            name="country",
            type="str",
            description=(
                "Country of the celebrity, one of [United States, Barbados,"
                " Portugal]"
            ),
        ),
    ],
)
retriever = VectorIndexAutoRetriever(
    index, vector_store_info=vector_store_info
)
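
The exact prompt that LlamaIndex builds from `VectorStoreInfo` is not shown in this guide. As a rough illustration of the general idea only, the sketch below serializes a collection description to JSON text that could be embedded in a prompt. `FieldInfo` and `render_schema` are hypothetical names, not part of the LlamaIndex API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FieldInfo:
    """Hypothetical stand-in for one metadata field description."""

    name: str
    type: str
    description: str


def render_schema(content_info: str, fields: list[FieldInfo]) -> str:
    """Serialize the collection description so it can be pasted into a prompt."""
    payload = {
        "content_info": content_info,
        "metadata_info": [asdict(f) for f in fields],
    }
    return json.dumps(payload, indent=2)


schema_text = render_schema(
    "brief biography of celebrities",
    [
        FieldInfo("category", "str", "One of [Sports, Entertainment, Business, Music]"),
        FieldInfo("country", "str", "One of [United States, Barbados, Portugal]"),
    ],
)
print(schema_text)
```

Listing the allowed values in each description, as the guide's `MetadataInfo` entries do, gives the LLM a closed set to choose from when it infers a filter.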

Running over some sample data

We try running over some sample data. Note how the metadata filters are inferred; this helps with more precise retrieval!

In [ ]:

retriever.retrieve("Tell me about two celebrities from United States")
