Upstash Vector Store

We're going to look at how to use LlamaIndex to interact with Upstash Vector!

In [ ]:

! pip install -q llama-index llama-index-vector-stores-upstash upstash-vector

In [ ]:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.upstash import UpstashVectorStore
from llama_index.core import StorageContext
import textwrap
import openai

In [ ]:

# Set the OpenAI API key
openai.api_key = "sk-..."

In [ ]:

# Download the data
! mkdir -p 'data/paul_graham/'
! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-02-03 20:04:25--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.01s   

2024-02-03 20:04:25 (5.96 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]

Now we can load the documents with the LlamaIndex SimpleDirectoryReader.

In [ ]:

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print("# Documents:", len(documents))

# Documents: 1

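To create an index on Upstash, visit https://console.upstash.com/vector and create an index with 1536 dimensions and the `Cosine` distance metric. Copy the index's URL and token; they are passed to `UpstashVectorStore` in the next code cell.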
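As a rough illustration of what the `Cosine` metric measures, here is a minimal pure-Python sketch. It is not how Upstash computes scores internally, and the score a hosted store reports may be scaled differently. The function name and the toy 3-dimensional vectors are assumptions made for illustration; 1536 is the output size of OpenAI's `text-embedding-ada-002`, which is presumably the embedding model in use here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings here would have 1536 entries.
query = [1.0, 0.0, 1.0]
doc_same_direction = [2.0, 0.0, 2.0]  # same direction, different length
doc_orthogonal = [0.0, 3.0, 0.0]      # shares no component with the query

print(cosine_similarity(query, doc_same_direction))  # close to 1
print(cosine_similarity(query, doc_orthogonal))      # 0.0
```

Because the metric depends only on direction, a vector and a scaled copy of it score as near-identical, which is why it suits embeddings whose magnitude carries little meaning.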

In [ ]:

vector_store = UpstashVectorStore(url="https://...", token="...")

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

We have now created an index and populated it with vectors from the essay! The data takes a moment to index, after which it is ready to query.
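If a script needs to wait out that indexing delay rather than rely on a pause between notebook cells, one option is a small polling loop. The sketch below is a generic helper, not part of LlamaIndex or the Upstash client; `wait_until` and `fake_index_is_ready` are hypothetical names, and the readiness check is a stand-in that you would replace with a real check against your index.

```python
import time


def wait_until(condition, timeout=30.0, interval=1.0):
    """Poll `condition()` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in readiness check (hypothetical): reports "ready" on the third call.
# A real check might ask the index whether any vectors are still pending;
# consult the Upstash client documentation for the actual call.
calls = {"n": 0}


def fake_index_is_ready():
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(fake_index_is_ready, timeout=5.0, interval=0.01)
print("ready:", ready)
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.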

In [ ]:

query_engine = index.as_query_engine()
res1 = query_engine.query("What did the author learn?")
print(textwrap.fill(str(res1), 100))

print("\n")

res2 = query_engine.query("What is the author's opinion on startups?")
print(textwrap.fill(str(res2), 100))

The author learned that the study of philosophy in college did not live up to their expectations.
They found that other fields took up most of the space of ideas, leaving little room for what they
perceived as the ultimate truths that philosophy was supposed to explore. As a result, they decided
to switch to studying AI.


The author's opinion on startups is that they are in need of help and support, especially in the
beginning stages. The author believes that founders of startups are often helpless and face various
challenges, such as getting incorporated and understanding the intricacies of running a company. The
author's investment firm, Y Combinator, aims to provide seed funding and comprehensive support to
startups, offering them the guidance and resources they need to succeed.

Metadata Filtering

You can pass MetadataFilters in a VectorStoreQuery to filter the nodes returned from the Upstash vector store.

In [ ]:

import os

from llama_index.vector_stores.upstash import UpstashVectorStore
from llama_index.core.vector_stores.types import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

vector_store = UpstashVectorStore(
    url=os.environ.get("UPSTASH_VECTOR_URL") or "",
    token=os.environ.get("UPSTASH_VECTOR_TOKEN") or "",
)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="author", value="Marie Curie", operator=FilterOperator.EQ
        )
    ],
)

retriever = index.as_retriever(filters=filters)

retriever.retrieve("What is inception about?")

We can also combine multiple MetadataFilters with an AND or OR condition.

In [ ]:

from llama_index.core.vector_stores import FilterOperator, FilterCondition

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="theme",
            value=["Fiction", "Horror"],
            operator=FilterOperator.IN,
        ),
        MetadataFilter(key="year", value=1997, operator=FilterOperator.GT),
    ],
    condition=FilterCondition.AND,
)

retriever = index.as_retriever(filters=filters)
retriever.retrieve("Harry Potter?")
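To make the AND/OR semantics concrete, the following sketch evaluates the same kind of filter list against plain dictionaries. It only mirrors the logic conceptually: the `OPERATORS` table, the `matches` function, and the sample records are hypothetical stand-ins, not LlamaIndex classes, and in the real setup the filtering is carried out by the vector store.

```python
from typing import Any, Callable, Dict, List, Tuple

# Local stand-ins for the EQ, GT and IN operators; these are NOT LlamaIndex objects.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda field, value: field == value,
    ">": lambda field, value: field > value,
    "in": lambda field, value: field in value,
}


def matches(
    metadata: Dict[str, Any],
    filters: List[Tuple[str, str, Any]],
    condition: str = "and",
) -> bool:
    """Return True if `metadata` satisfies the (key, operator, value) filters."""
    if condition not in ("and", "or"):
        raise ValueError("condition must be 'and' or 'or'")
    results = [
        key in metadata and OPERATORS[op](metadata[key], value)
        for key, op, value in filters
    ]
    return all(results) if condition == "and" else any(results)


# Hypothetical records standing in for node metadata.
books = [
    {"title": "A", "theme": "Fiction", "year": 2001},
    {"title": "B", "theme": "Horror", "year": 1990},
    {"title": "C", "theme": "Biography", "year": 2010},
]

filters = [("theme", "in", ["Fiction", "Horror"]), ("year", ">", 1997)]

and_titles = [b["title"] for b in books if matches(b, filters, "and")]
or_titles = [b["title"] for b in books if matches(b, filters, "or")]

print(and_titles)  # ['A']
print(or_titles)   # ['A', 'B', 'C']
```

With AND, only the record that passes both filters survives; with OR, a record passing either one is kept, so the result set can only grow.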