Rockset Vector Store

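As a real-time search and analytics database, Rockset uses indexing to deliver scalable, performant personalization, product search, semantic search, chatbot applications, and more. Because Rockset is purpose-built for real-time workloads, you can build these responsive applications on constantly updating, streaming data. By integrating Rockset with LlamaIndex, you can use LLMs on your own real-time data to build production-ready vector search applications.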

We'll walk through how to use Rockset as a vector store in LlamaIndex.

Tutorial

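In this example, we'll use OpenAI's text-embedding-ada-002 model to generate embeddings and Rockset as the vector store to store them. We'll ingest text from a file and ask questions about its content.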

Setting Up Your Environment

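From the Rockset console, create a collection with the Write API as your data source. Name your collection llamaindex_demo. Configure the following ingest transformation, which uses VECTOR_ENFORCE to define your embeddings field and take advantage of performance and storage optimizations: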

SELECT 
    _input.* EXCEPT(_meta), 
    VECTOR_ENFORCE(
        _input.embedding,
        1536,
        'float'
    ) as embedding
FROM _input
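The transformation above declares that the embedding field holds 1536-element float vectors, which is the output size of text-embedding-ada-002. If you write documents through the Write API yourself, a client-side check can catch malformed embeddings before they reach the collection. The sketch below is a hypothetical helper (`check_embedding` is not part of Rockset or LlamaIndex); it only mirrors the length and type constraints declared in the SQL and does not reproduce Rockset's exact VECTOR_ENFORCE behavior.

```python
import math

# Dimension declared in VECTOR_ENFORCE(_input.embedding, 1536, 'float').
EXPECTED_DIM = 1536


def check_embedding(embedding, dim=EXPECTED_DIM):
    """Hypothetical client-side check: True if `embedding` is a list or tuple
    of exactly `dim` finite numbers, False otherwise."""
    if not isinstance(embedding, (list, tuple)) or len(embedding) != dim:
        return False
    for value in embedding:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        # Reject NaN and infinity.
        if not math.isfinite(value):
            return False
    return True


if __name__ == "__main__":
    print(check_embedding([0.1] * 1536))    # True
    print(check_embedding([0.1] * 1535))    # False: wrong length
    print(check_embedding(["0.1"] * 1536))  # False: strings are not numbers
```

Accepting integers as well as floats is a choice made for this sketch; tighten it if your pipeline should only ever emit floats.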

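Create an API key from the Rockset console and set the ROCKSET_API_KEY environment variable. Find your API server here and set the ROCKSET_API_SERVER environment variable. Set the OPENAI_API_KEY environment variable.
Install the dependencies.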

pip3 install llama_index rockset
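Before running the tutorial, it can help to confirm that the three environment variables from the previous step are present. This is a minimal sketch: `missing_env_vars` is a hypothetical helper, not a library function, and it only checks that each variable is set and non-empty, not that its value is valid.

```python
import os

# Environment variables this tutorial relies on.
REQUIRED_VARS = ("ROCKSET_API_KEY", "ROCKSET_API_SERVER", "OPENAI_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names from `required` that are unset or empty in `env`
    (defaults to os.environ), preserving their original order."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```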

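LlamaIndex lets you ingest data from a variety of sources. In this example, we'll read from a text file named constitution.txt, a transcript of the U.S. Constitution, which can be found here.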

Data Ingestion

Use LlamaIndex's SimpleDirectoryReader class to convert the text file into a list of Document objects.

%pip install llama-index-llms-openai
%pip install llama-index-vector-stores-rocksetdb

from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader(
    input_files=["{path to}/constitution.txt"]
).load_data()

Configure the LLM through LlamaIndex's global Settings object.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(temperature=0.8, model="gpt-3.5-turbo")

Instantiate the vector store and storage context.

from llama_index.core import StorageContext
from llama_index.vector_stores.rocksetdb import RocksetVectorStore

vector_store = RocksetVectorStore(collection="llamaindex_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Add the documents to the llamaindex_demo collection and create an index.

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
)
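`VectorStoreIndex.from_documents` splits each Document into smaller nodes, embeds them, and writes them to the vector store. The sketch below illustrates only the splitting idea with a naive word-window chunker. It is a hypothetical stand-in, not LlamaIndex's splitter (which is sentence- and token-aware), and the `chunk_size` and `overlap` defaults are arbitrary values chosen for illustration.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split `text` into windows of at most `chunk_size` words, where each
    window shares `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
    # ['a b c', 'c d e', 'e f g']
```

Overlap keeps text that straddles a boundary visible in two neighboring chunks, at the cost of storing and embedding some words twice.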

Querying

Ask a question about your document and generate a response.

response = index.as_query_engine().query("What is the duty of the president?")

print(str(response))
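Conceptually, the query engine embeds the question, asks the vector store for the most similar nodes, and passes them to the LLM to synthesize an answer. With RocksetVectorStore the similarity search runs inside Rockset; the in-memory sketch below only illustrates the ranking step using cosine similarity. The functions and the 3-dimensional toy vectors are hypothetical (real ada-002 embeddings have 1536 dimensions), and `k=2` is chosen to mirror LlamaIndex's default `similarity_top_k`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=2):
    """Return the k (text, score) pairs from `corpus` (text -> vector) that
    are most similar to `query_vec`, highest score first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "embeddings" for three passages.
    corpus = {
        "The President shall be Commander in Chief": [0.9, 0.1, 0.0],
        "Congress shall have power to lay taxes": [0.1, 0.9, 0.0],
        "The President may grant pardons": [0.8, 0.2, 0.1],
    }
    for text, score in top_k([1.0, 0.0, 0.0], corpus):
        print(f"{score:.3f}  {text}")
```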

Run your program.

$ python3 main.py
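The duty of the president is to faithfully execute the Office of President of the United States, preserve, protect and defend the Constitution of the United States, serve as the Commander in Chief of the Army and Navy, grant reprieves and pardons for offenses against the United States (except in cases of impeachment), make treaties and appoint ambassadors and other public ministers, take care that the laws are faithfully executed, and commission all the officers of the United States.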

Metadata Filtering

Metadata filtering lets you retrieve relevant documents that match specific filters.

Add nodes to your vector store and create an index.

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.vector_stores.rocksetdb import RocksetVectorStore

# Nodes created without an embedding are embedded by the index using the
# configured embedding model, so no placeholder vector is needed.
nodes = [
    TextNode(
        text="Apples are blue",
        metadata={"type": "fruit"},
    )
]
index = VectorStoreIndex(
    nodes,
    storage_context=StorageContext.from_defaults(
        vector_store=RocksetVectorStore(collection="llamaindex_demo")
    ),
)

Define metadata filters.

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(
    filters=[ExactMatchFilter(key="type", value="fruit")]
)

Retrieve relevant documents that satisfy the filters.

retriever = index.as_retriever(filters=filters)
retriever.retrieve("What colors are apples?")
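An exact-match filter restricts the candidate set to nodes whose metadata contains the given key with exactly the given value; similarity ranking then happens only among those candidates. With RocksetVectorStore the filter is applied in the Rockset query, so the plain-Python sketch below only models the semantics. It assumes multiple filters are combined with AND (the default condition of MetadataFilters), and `exact_match` is a hypothetical helper.

```python
def exact_match(nodes, filters):
    """Keep nodes whose metadata contains every key in `filters` with an
    equal value (all pairs must match, i.e. AND semantics).

    `nodes` is a list of dicts with "text" and "metadata" keys; `filters`
    is a dict such as {"type": "fruit"}."""
    return [
        node
        for node in nodes
        if all(
            key in node["metadata"] and node["metadata"][key] == value
            for key, value in filters.items()
        )
    ]


if __name__ == "__main__":
    nodes = [
        {"text": "Apples are blue", "metadata": {"type": "fruit"}},
        {"text": "Carrots are orange", "metadata": {"type": "vegetable"}},
    ]
    print([n["text"] for n in exact_match(nodes, {"type": "fruit"})])
    # ['Apples are blue']
```

Because filtering happens before ranking, a query only ever sees matching nodes, which is why "Apples are blue" comes back even though it is factually odd.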

Creating an Index from an Existing Collection

You can create an index from the data in an existing collection.

from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.rocksetdb import RocksetVectorStore

vector_store = RocksetVectorStore(collection="llamaindex_demo")

index = VectorStoreIndex.from_vector_store(vector_store)

Creating an Index from a New Collection

You can also create a new Rockset collection to use as a vector store.

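from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.rocksetdb import RocksetVectorStore

vector_store = RocksetVectorStore.with_new_collection(
    collection="llamaindex_demo",  # name of the new collection
    dimensions=1536,  # vector length enforced in the ingest transformation (optional)
    # other RocksetVectorStore arguments
)

index = VectorStoreIndex(
    nodes,  # e.g. the nodes from the metadata filtering example
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
)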

Configuration

collection: Name of the collection to query (required).

RocksetVectorStore(collection="my_collection")

workspace: Name of the workspace containing the collection. Defaults to "commons".

RocksetVectorStore(workspace="my_workspace")

api_key: The API key used to authenticate Rockset requests. Ignored if client is passed in. Defaults to the ROCKSET_API_KEY environment variable.

RocksetVectorStore(api_key="<my key>")

api_server: The API server used for Rockset requests. Ignored if client is passed in. Defaults to the ROCKSET_API_SERVER environment variable, or to "https://api.use1a1.rockset.com" if ROCKSET_API_SERVER is not set.

from rockset import Regions
RocksetVectorStore(api_server=Regions.euc1a1)
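The fallback order described for api_server (explicit argument, then the ROCKSET_API_SERVER environment variable, then the default URL) can be summarized as a small function. This is a hypothetical sketch of the documented precedence, not the code inside RocksetVectorStore, and it leaves out the case where a client is passed in.

```python
import os

# Default documented above for when ROCKSET_API_SERVER is not set.
DEFAULT_API_SERVER = "https://api.use1a1.rockset.com"


def resolve_api_server(api_server=None, env=None):
    """Return the explicit `api_server` if given, else ROCKSET_API_SERVER
    from `env` (defaults to os.environ), else the documented default."""
    if api_server:
        return api_server
    env = os.environ if env is None else env
    return env.get("ROCKSET_API_SERVER") or DEFAULT_API_SERVER


if __name__ == "__main__":
    print(resolve_api_server(env={}))  # https://api.use1a1.rockset.com
```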

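client: The Rockset client object used to execute Rockset requests. If not specified, a client object is constructed internally from the api_key parameter (or the ROCKSET_API_KEY environment variable) and the api_server parameter (or the ROCKSET_API_SERVER environment variable).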

from rockset import RocksetClient
RocksetVectorStore(client=RocksetClient(api_key="<my key>"))

embedding_col: Name of the database field containing the embeddings. Defaults to "embedding".

RocksetVectorStore(embedding_col="my_embedding")

metadata_col: Name of the database field containing node data. Defaults to "metadata".

RocksetVectorStore(metadata_col="node")

distance_func: The metric used to measure the relationship between vectors. Defaults to cosine similarity.

RocksetVectorStore(distance_func=RocksetVectorStore.DistanceFunc.DOT_PRODUCT)
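To see how the choice of distance_func can change rankings, the sketch below computes cosine similarity, dot product, and Euclidean distance for the same vectors in plain Python. These functions are illustrative only and are not RocksetVectorStore's implementation. For the two similarity metrics a higher score means closer; for Euclidean distance a lower value means closer.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = {"short, same direction": [1.0, 0.0], "long, 45 degrees off": [3.0, 3.0]}
    for name, vec in docs.items():
        print(
            f"{name}: cosine={cosine_similarity(query, vec):.3f} "
            f"dot={dot_product(query, vec):.3f} "
            f"euclidean={euclidean_distance(query, vec):.3f}"
        )
    # short, same direction: cosine=1.000 dot=1.000 euclidean=0.000
    # long, 45 degrees off: cosine=0.707 dot=3.000 euclidean=3.606
```

Cosine similarity and Euclidean distance rank the short, aligned vector first, while the dot product prefers the longer vector because it rewards magnitude. For unit-length vectors (OpenAI states its embeddings are normalized to length 1) cosine similarity and dot product produce the same ranking.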