Firestore Vector Store

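A Firestore vector store is a way to store and query vector data. It is useful for many applications, such as similarity search, recommendation systems, and natural language processing. It uses a vector index and a similarity search algorithm to quickly find the vectors most similar to a query vector.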
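To make "most similar" concrete: Firestore vector search ranks results with a distance measure you choose, one of EUCLIDEAN, COSINE, or DOT_PRODUCT. The minimal sketch below computes each measure in plain Python for two small vectors. It illustrates the math only and makes no claim about how Firestore computes these values internally; the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Unnormalized similarity; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
doc = [3.0, 4.0]
print(euclidean_distance(query, doc))  # sqrt((1-3)^2 + (0-4)^2) = sqrt(20)
print(cosine_distance(query, doc))     # 1 - 3 / (1 * 5)
print(dot_product(query, doc))         # 1*3 + 0*4
```

Note that the measures disagree on direction: for the two distances a smaller value is a better match, while for the dot product a larger value is.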

Advantages of the Firestore vector store

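Fast queries: the Firestore vector store uses an efficient vector index and similarity search algorithm to quickly return the vectors most similar to a query.
Flexibility: dense vector embeddings (for example, the output of a text embedding model) are stored as ordinary document fields, next to the rest of each document's data.
Scalability: the Firestore vector store scales easily to large vector datasets.
Integration: it works well with other Firestore features and tools, so it fits easily into an existing Firestore database.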

How to use the Firestore vector store

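To use the Firestore vector store, you create a collection, store your vector data in its documents, and create a vector index on the field that holds the vectors. You can then run a similarity search to find the vectors most similar to a query vector. The Firestore vector store exposes a simple API for storing, querying, and analyzing vector data.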
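The store-then-query workflow described above can be sketched with an in-memory stand-in for a collection. `InMemoryVectorCollection` and `query_nearest` are hypothetical names, not part of the Firestore or LlamaIndex API, and the brute-force scan stands in for Firestore's vector index; the point is only to show what a nearest-neighbor query returns.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class InMemoryVectorCollection:
    """Hypothetical stand-in for a collection whose documents each hold one vector."""

    dimension: int
    docs: Dict[str, List[float]] = field(default_factory=dict)

    def add(self, doc_id: str, vector: List[float]) -> None:
        # Every stored vector must match the dimension the "index" was built for.
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dims, got {len(vector)}")
        self.docs[doc_id] = vector

    def query_nearest(self, query: List[float], limit: int) -> List[Tuple[str, float]]:
        # Brute-force scan: score every stored vector, best (highest similarity) first.
        scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in self.docs.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:limit]


collection = InMemoryVectorCollection(dimension=3)
collection.add("cats", [0.9, 0.1, 0.0])
collection.add("dogs", [0.8, 0.3, 0.1])
collection.add("stocks", [0.0, 0.2, 0.9])
print(collection.query_nearest([1.0, 0.2, 0.0], limit=2))
```

The brute-force scan is the simplest correct nearest-neighbor search and is used here only for clarity.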

Conclusion

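The Firestore vector store offers a powerful, flexible solution for storing and querying vector data. It helps developers build a wide range of vector-based applications with fast, flexible, and scalable queries.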

Google Firestore (Native Mode)

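Firestore is a serverless, document-oriented database that scales to meet demand. With Firestore's LlamaIndex integration, you can extend your database application to build AI-powered experiences.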

This notebook shows how to use Firestore to store vectors and query them with the FirestoreVectorStore class.

Before you begin

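To run this notebook, you need to complete the following steps:

Create a Google Cloud project.
Enable the Firestore API.
Create a Firestore database in Native mode.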

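Once you have confirmed access to the database in this notebook's runtime environment, fill in the following values and run the cell before running the example scripts.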

Library installation

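If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙. For this notebook, we also install llama-index-vector-stores-firestore for the Firestore integration and llama-index-embeddings-huggingface to generate embeddings with a local HuggingFace model.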

%pip install --quiet llama-index
%pip install --quiet llama-index-vector-stores-firestore llama-index-embeddings-huggingface

☁ Set up your Google Cloud project

Set your Google Cloud project so that you can use Google Cloud resources in this notebook.

If you don't know your project ID, try the following:

Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID.

# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "YOUR_PROJECT_ID"  # @param {type:"string"}

# Set the project ID
!gcloud config set project {PROJECT_ID}

🔐 Authentication

Authenticate to Google Cloud as the IAM user logged into this notebook so that you can access your Google Cloud project.

If you are using Colab to run this notebook, use the cell below and continue.
If you are using Vertex AI Workbench, check out the setup instructions here.

from google.colab import auth

auth.authenticate_user()

Basic usage

Initialize FirestoreVectorStore

FirestoreVectorStore lets you load data into Firestore and query it.

# @markdown Please specify a collection name for demo purposes.
COLLECTION_NAME = "test_collection"

from llama_index.core import SimpleDirectoryReader

# Load the documents
documents = SimpleDirectoryReader(
    "../../examples/data/paul_graham"
).load_data()

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# Set up the embedding model; this is a local model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
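Which distance measure to pick depends partly on whether your embeddings are normalized. When every vector has unit length, cosine similarity and dot product give identical scores, so they rank results the same way. As far as we know, `HuggingFaceEmbedding` normalizes its output by default (its `normalize` argument), but check the version you have installed. The plain-Python check below illustrates the identity with hand-picked vectors; it does not call the real model, and the helper names are hypothetical.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = l2_normalize([3.0, 4.0, 0.0])  # -> [0.6, 0.8, 0.0]
b = l2_normalize([1.0, 2.0, 2.0])  # -> [1/3, 2/3, 2/3]
# For unit-length vectors the cosine denominator is 1, so both scores agree.
print(dot(a, b), cosine_similarity(a, b))
```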

from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.vector_stores.firestore import FirestoreVectorStore

# Configure models globally (Settings replaces the deprecated ServiceContext):
# use the local embedding model and disable the LLM
Settings.embed_model = embed_model
Settings.llm = None

# Create a Firestore vector store backed by COLLECTION_NAME
store = FirestoreVectorStore(collection_name=COLLECTION_NAME)
storage_context = StorageContext.from_defaults(vector_store=store)

# Chunk, embed, and write the documents to Firestore
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

LLM is explicitly disabled. Using MockLLM.
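Conceptually, `VectorStoreIndex.from_documents` splits each document into chunks (nodes), embeds every chunk with the configured embedding model, and writes the chunk text, vector, and metadata to the vector store, here a Firestore collection. The sketch below mimics those three steps in plain Python. The word-based `chunk_words` splitter and the hashing `toy_embed` function are hypothetical stand-ins (LlamaIndex's real node parser is more sophisticated, and the real vectors come from `BAAI/bge-small-en-v1.5`), and a list of dicts plays the role of the Firestore collection.

```python
import hashlib
import math
from typing import Dict, List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hashing 'embedding' for illustration only: count hashed words, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(doc_text: str, metadata: Dict[str, str], store: List[Dict]) -> None:
    # 1) chunk, 2) embed each chunk, 3) write text + vector + metadata to the store.
    for chunk in chunk_words(doc_text, chunk_size=5, overlap=2):
        store.append({"text": chunk, "embedding": toy_embed(chunk), "metadata": dict(metadata)})


store: List[Dict] = []
ingest("one two three four five six seven eight nine", {"file_name": "essay.txt"}, store)
for record in store:
    print(record["text"])
```

Overlapping chunks keep a sentence that straddles a boundary retrievable from either side.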

Perform a search

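You can use FirestoreVectorStore to run a similarity search over the vectors you have stored. This is useful for finding similar documents or passages of text.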
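Conceptually, the query engine embeds the question with the same model used at indexing time, asks the vector store for the most similar stored chunks, and exposes them as `res.source_nodes`, which is why the cell below prints `res.source_nodes[0].text`. The sketch shows that retrieval step in plain Python over hand-made 3-dimensional vectors (real `bge-small-en-v1.5` embeddings have 384 dimensions). `retrieve` and the sample records are hypothetical; `top_k=2` mirrors LlamaIndex's default `similarity_top_k`.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec: List[float], records: List[Dict], top_k: int = 2) -> List[Dict]:
    """Return copies of the `top_k` records most similar to the query, with a score."""
    scored = [dict(rec, score=cosine_similarity(query_vec, rec["embedding"])) for rec in records]
    return sorted(scored, key=lambda rec: rec["score"], reverse=True)[:top_k]


records = [
    {"text": "Before college I wrote short stories and programs.", "embedding": [0.9, 0.2, 0.1]},
    {"text": "My first programs ran on an IBM 1401 with punch cards.", "embedding": [0.6, 0.7, 0.1]},
    {"text": "An unrelated sentence about cooking pasta.", "embedding": [0.0, 0.1, 0.95]},
]
query_vec = [1.0, 0.3, 0.0]  # pretend embedding of "What did the author do growing up?"
for node in retrieve(query_vec, records):
    print(round(node["score"], 3), node["text"])
```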

query_engine = index.as_query_engine()
res = query_engine.query("What did the author do growing up?")
print(str(res.source_nodes[0].text))

None
What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.

I was puzzled by the 1401. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn't have any data stored on punched cards. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear.

With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]

The first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.

Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.

Though I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn't much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.

I couldn't have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to switch to AI.

AI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. I haven't tried rereading The Moon is a Harsh Mistress, so I don't know how well it has aged, but when I read it I was drawn entirely into its world.

You can pre-filter the search results by passing the filters parameter.

from llama_index.core.vector_stores.types import (
    MetadataFilters,
    MetadataFilter,
)

filters = MetadataFilters(
    filters=[MetadataFilter(key="author", value="Paul Graham")]
)
query_engine = index.as_query_engine(filters=filters)
res = query_engine.query("What did the author do growing up?")
print(str(res.source_nodes[0].text))
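Pre-filtering means the metadata condition narrows the candidate set before similarity ranking, so only chunks whose metadata matches can be returned. The sketch below shows that order of operations with an exact-match filter in plain Python; `filtered_search` and the sample records are hypothetical, and this is not the FirestoreVectorStore implementation. Note that the filter above can only match if the stored nodes actually carry an `author` field in their metadata. `SimpleDirectoryReader` does not add one by default, so you would need to attach it yourself (for example through its `file_metadata` argument); otherwise the result list is empty and `res.source_nodes[0]` fails.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query_vec: List[float], records: List[Dict],
                    filters: Dict[str, str], top_k: int = 2) -> List[Dict]:
    # 1) Pre-filter: keep only records whose metadata matches every key/value pair.
    candidates = [
        rec for rec in records
        if all(rec["metadata"].get(key) == value for key, value in filters.items())
    ]
    # 2) Rank only the surviving candidates by similarity, best first.
    candidates.sort(key=lambda rec: cosine_similarity(query_vec, rec["embedding"]), reverse=True)
    return candidates[:top_k]


records = [
    {"text": "essay chunk A", "embedding": [0.9, 0.1], "metadata": {"author": "Paul Graham"}},
    {"text": "other chunk", "embedding": [1.0, 0.0], "metadata": {"author": "Someone Else"}},
    {"text": "essay chunk B", "embedding": [0.2, 0.9], "metadata": {"author": "Paul Graham"}},
]
hits = filtered_search([1.0, 0.0], records, {"author": "Paul Graham"})
print([rec["text"] for rec in hits])
```

Without the filter, "other chunk" would rank first because it points exactly along the query vector; the pre-filter removes it before ranking.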