Amazon MemoryDB
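An introduction to Vector Search and a guide to its LangChain integration.
What is Amazon MemoryDB?
MemoryDB is compatible with Redis OSS, a popular open source data store, so you can quickly build applications using the same flexible and friendly Redis OSS data structures, APIs, and commands that you already use today. With MemoryDB, all of your data is stored in memory, which gives you microsecond read latency, single-digit millisecond write latency, and high throughput. MemoryDB also stores data durably across multiple Availability Zones (AZs) using a Multi-AZ transactional log, enabling fast failover, database recovery, and node restarts.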
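Vector search for MemoryDB
Vector search extends the functionality of MemoryDB and can be used together with existing MemoryDB features. Applications that do not use vector search are unaffected by its presence. Vector search is available in all Regions where MemoryDB is offered. You can use your existing MemoryDB data or the Redis OSS API to build machine learning and generative AI use cases such as retrieval-augmented generation, anomaly detection, document retrieval, and real-time recommendations.
- Indexing of multiple fields in Redis hashes and JSON
- Vector similarity search (with HNSW (approximate nearest neighbor) or FLAT (K-nearest neighbor))
- Vector range search (for example, find all vectors within a radius of a query vector)
- Incremental indexing without performance loss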
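To make the search modes in the list above concrete, here is a minimal pure-Python sketch of what an exact (FLAT-style) K-nearest-neighbor search and a range search compute, using cosine distance. The function names, document keys, and toy vectors are illustrative assumptions and not part of MemoryDB. The service runs these searches server-side, and HNSW returns approximate rather than exact neighbors.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(query, vectors, k):
    """Exact KNN: score every stored vector and keep the k closest."""
    scored = [(cosine_distance(query, v), key) for key, v in vectors.items()]
    return sorted(scored)[:k]


def range_search(query, vectors, radius):
    """Range search: keep every vector whose distance is within radius."""
    scored = [(cosine_distance(query, v), key) for key, v in vectors.items()]
    return sorted(item for item in scored if item[0] <= radius)


# Toy 2-dimensional "embeddings" keyed by hypothetical document ids.
vectors = {
    "doc:a": [1.0, 0.0],
    "doc:b": [0.9, 0.1],
    "doc:c": [0.0, 1.0],
}
query = [1.0, 0.0]

nearest = knn_search(query, vectors, k=2)
within = range_search(query, vectors, radius=0.05)
print([key for _, key in nearest])
print([key for _, key in within])
```

KNN always returns exactly k results (if that many exist), while a range search returns however many vectors fall inside the radius, which may be none.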
Setting up
Install the Redis Python client
Redis-py is a Python client that can be used to connect to MemoryDB.
%pip install --upgrade --quiet redis langchain-aws
from langchain_aws.embeddings import BedrockEmbeddings
embeddings = BedrockEmbeddings()
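MemoryDB connection
Valid Redis URL schemes are:
- redis:// - Connection to a Redis cluster, unencrypted
- rediss:// - Connection to a Redis cluster, with TLS encryption
More information about additional connection parameters can be found in the redis-py documentation.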
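A small check on the URL scheme can catch an unintended plaintext connection before a client is created. The sketch below uses only the standard library. The helper name `uses_tls` and the endpoints are hypothetical and chosen for illustration.

```python
from urllib.parse import urlparse


def uses_tls(redis_url):
    """Return True for rediss:// URLs, False for redis://, else raise."""
    scheme = urlparse(redis_url).scheme
    if scheme == "rediss":
        return True
    if scheme == "redis":
        return False
    raise ValueError(f"Unsupported Redis URL scheme: {scheme!r}")


# Hypothetical endpoints used only for illustration.
print(uses_tls("rediss://cluster_endpoint:6379"))
print(uses_tls("redis://localhost:6379"))
```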
Sample data
First, we describe some sample data so that the various attributes of the Redis vector store can be demonstrated.
metadata = [
{
"user": "john",
"age": 18,
"job": "engineer",
"credit_score": "high",
},
{
"user": "derrick",
"age": 45,
"job": "doctor",
"credit_score": "low",
},
{
"user": "nancy",
"age": 94,
"job": "doctor",
"credit_score": "high",
},
{
"user": "tyler",
"age": 100,
"job": "engineer",
"credit_score": "high",
},
{
"user": "joe",
"age": 35,
"job": "dentist",
"credit_score": "medium",
},
]
texts = ["foo", "foo", "foo", "bar", "bar"]
index_name = "users"
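The sample data relies on `texts` and `metadata` having the same length. The sketch below assumes, as is typical for LangChain vector stores, that each text is associated with the metadata entry at the same position. The helper `pair_texts_with_metadata` is hypothetical and simply validates that assumption before any data is sent to the store.

```python
def pair_texts_with_metadata(texts, metadata):
    """Pair each text with the metadata dict at the same position."""
    if len(texts) != len(metadata):
        raise ValueError(
            f"Got {len(texts)} texts but {len(metadata)} metadata entries"
        )
    return list(zip(texts, metadata))


# Abbreviated copy of the sample data, for illustration only.
sample_texts = ["foo", "bar"]
sample_metadata = [
    {"user": "john", "job": "engineer"},
    {"user": "tyler", "job": "engineer"},
]

for text, meta in pair_texts_with_metadata(sample_texts, sample_metadata):
    print(text, meta["user"])
```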
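Create the MemoryDB vector store
An InMemoryVectorStore instance can be initialized using the following methods:
- InMemoryVectorStore.__init__ - Initialize directly
- InMemoryVectorStore.from_documents - Initialize from a list of Langchain.docstore.Document objects
- InMemoryVectorStore.from_texts - Initialize from a list of texts (optionally with metadata)
- InMemoryVectorStore.from_existing_index - Initialize from an existing MemoryDB index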
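from langchain_aws.vectorstores.inmemorydb import InMemoryVectorStore
# Pass the sample texts, metadata, and index name defined earlier;
# cluster_endpoint is a placeholder for your MemoryDB cluster endpoint.
vds = InMemoryVectorStore.from_texts(
    texts,
    embeddings,
    metadatas=metadata,
    index_name=index_name,
    redis_url="rediss://cluster_endpoint:6379/ssl=True ssl_cert_reqs=none",
)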
vds.index_name
'users'
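Querying
Depending on your use case, there are several ways to query the InMemoryVectorStore implementation:
- similarity_search: Find the vectors most similar to a given vector.
- similarity_search_with_score: Find the vectors most similar to a given vector and return the vector distance.
- similarity_search_limit_score: Find the vectors most similar to a given vector and limit the results to those within the score_threshold.
- similarity_search_with_relevance_scores: Find the vectors most similar to a given vector and return the vector similarities.
- max_marginal_relevance_search: Find the vectors most similar to a given vector while also optimizing for diversity.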
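Of the methods listed above, max marginal relevance is the least self-explanatory, so here is a minimal pure-Python sketch of the general greedy idea: at each step, pick the candidate that is relevant to the query but not redundant with what has already been selected. This is a generic illustration and not the library's implementation. The function names, the `lambda_mult` weighting parameter, and the toy vectors are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices balancing relevance and diversity."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in remaining:
            relevance = cosine_similarity(query, candidates[idx])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine_similarity(candidates[idx], candidates[s]) for s in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


candidates = [
    [1.0, 0.0],  # identical to the query
    [1.0, 0.0],  # exact duplicate of the first candidate
    [0.6, 0.8],  # less similar to the query, but different from the others
]
query = [1.0, 0.0]

diverse = mmr_select(query, candidates, k=2, lambda_mult=0.3)
relevant_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
print(diverse)
print(relevant_only)
```

With a low `lambda_mult` the duplicate is skipped in favor of the distinct third vector. With `lambda_mult=1.0` the selection reduces to plain similarity ranking and the duplicate is returned.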
results = vds.similarity_search("foo")
print(results[0].page_content)
foo
# with scores (distances)
results = vds.similarity_search_with_score("foo", k=5)
for result in results:
    print(f"Content: {result[0].page_content} --- Score: {result[1]}")
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
Content: bar --- Score: 0.1566
Content: bar --- Score: 0.1566
# limit the vector distance that can be returned
results = vds.similarity_search_with_score("foo", k=5, distance_threshold=0.1)
for result in results:
    print(f"Content: {result[0].page_content} --- Score: {result[1]}")
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
# with relevance scores (similarities)
results = vds.similarity_search_with_relevance_scores("foo", k=5)
for result in results:
    print(f"Content: {result[0].page_content} --- Similarity: {result[1]}")
Content: foo --- Similarity: 1.0
Content: foo --- Similarity: 1.0
Content: foo --- Similarity: 1.0
Content: bar --- Similarity: 0.8434
Content: bar --- Similarity: 0.8434
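The distances and similarities printed above are consistent with a relevance score of 1 minus the cosine distance (1 - 0.1566 = 0.8434). The sketch below assumes that relationship. It matches the outputs shown here but may not hold for other distance metrics, and the helper name is hypothetical.

```python
def cosine_distance_to_relevance(distance):
    """Map a cosine distance to a relevance score, assuming score = 1 - distance."""
    return 1.0 - distance


# Distances copied from the similarity_search_with_score output above.
distances = [0.0, 0.0, 0.0, 0.1566, 0.1566]
relevance = [round(cosine_distance_to_relevance(d), 4) for d in distances]
print(relevance)
```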
# you can also add new documents as follows
new_document = ["baz"]
new_metadata = [{"user": "sam", "age": 50, "job": "janitor", "credit_score": "high"}]
# both the document and metadata must be lists
vds.add_texts(new_document, new_metadata)
['doc:users:b9c71d62a0a34241a37950b448dafd38']
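MemoryDB as a retriever
Here we go over the different options for using the vector store as a retriever.
There are three different search methods we can use for retrieval. By default, the retriever uses semantic similarity.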
query = "foo"
results = vds.similarity_search_with_score(query, k=3, return_metadata=True)
for result in results:
    print("Content:", result[0].page_content, " --- Score: ", result[1])
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
retriever = vds.as_retriever(search_type="similarity", search_kwargs={"k": 4})
docs = retriever.invoke(query)
docs
[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),
Document(page_content='bar', metadata={'id': 'doc:users_modified:01ef6caac12b42c28ad870aefe574253', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'})]
There is also the similarity_distance_threshold retriever, which lets the user specify the maximum vector distance.
retriever = vds.as_retriever(
search_type="similarity_distance_threshold",
search_kwargs={"k": 4, "distance_threshold": 0.1},
)
docs = retriever.invoke(query)
docs
[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]
Finally, similarity_score_threshold lets the user define the minimum score for similar documents.
retriever = vds.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={"score_threshold": 0.9, "k": 10},
)
retriever.invoke("foo")
[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]
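Both threshold retrievers return the same three documents for this data. The sketch below illustrates why, using the (content, distance) pairs printed in the querying section. It assumes that the relevance score equals 1 minus the cosine distance and that both thresholds are inclusive. Neither assumption is confirmed here, and the helper names are hypothetical.

```python
def filter_by_distance(results, distance_threshold):
    """Keep (content, distance) pairs no farther than the threshold."""
    return [(c, d) for c, d in results if d <= distance_threshold]


def filter_by_score(results, score_threshold):
    """Keep pairs whose assumed relevance (1 - distance) meets the threshold."""
    return [(c, d) for c, d in results if 1.0 - d >= score_threshold]


# (content, cosine distance) pairs taken from the querying output earlier.
results = [
    ("foo", 0.0),
    ("foo", 0.0),
    ("foo", 0.0),
    ("bar", 0.1566),
    ("bar", 0.1566),
]

print(len(filter_by_distance(results, 0.1)))
print(len(filter_by_score(results, 0.9)))
```

A distance threshold is an upper bound and a score threshold is a lower bound, so the two are easy to confuse when switching between search types.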
retriever.invoke("foo")
[Document(page_content='foo', metadata={'id': 'doc:users:8f6b673b390647809d510112cde01a27', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='bar', metadata={'id': 'doc:users:93521560735d42328b48c9c6f6418d6a', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'}),
Document(page_content='foo', metadata={'id': 'doc:users:125ecd39d07845eabf1a699d44134a5b', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),
Document(page_content='foo', metadata={'id': 'doc:users:d6200ab3764c466082fde3eaab972a2a', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'})]
Delete the index
To delete your entries, you have to address them by their keys.
# delete the indices too
InMemoryVectorStore.drop_index(
index_name="users", delete_documents=True, redis_url="redis://localhost:6379"
)
InMemoryVectorStore.drop_index(
index_name="users_modified",
delete_documents=True,
redis_url="redis://localhost:6379",
)
True