
Supabase (Postgres)

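Supabase is an open-source Firebase alternative. It is built on top of PostgreSQL, which provides strong SQL querying capabilities and enables a simple interface with existing tools and frameworks.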

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) that emphasizes extensibility and SQL compliance.

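Supabase provides an open-source toolkit for developing AI applications using Postgres and pgvector. Use the Supabase client libraries to store, index, and query your vector embeddings at scale.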

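In this notebook, we'll demo the SelfQueryRetriever wrapped around a Supabase vector store.

Specifically, we will:

  1. Create a Supabase database
  2. Enable the pgvector extension
  3. Create a documents table and a match_documents function that will be used by SupabaseVectorStore
  4. Load sample documents into the vector store (database table)
  5. Build and test a self-querying retriever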

Setting up the Supabase database

  1. Head over to https://database.new to provision your Supabase database.
  2. In the studio, jump to the SQL editor and run the following script to enable pgvector and set up your database as a vector store:
    -- Enable the pgvector extension to work with embedding vectors
    create extension if not exists vector;

    -- Create a table to store your documents
    create table
      documents (
        id uuid primary key,
        content text, -- corresponds to Document.page_content
        metadata jsonb, -- corresponds to Document.metadata
        embedding vector (1536) -- 1536 works for OpenAI embeddings, change if needed
      );

    -- Create a function to search for documents
    create function match_documents (
      query_embedding vector (1536),
      filter jsonb default '{}'
    ) returns table (
      id uuid,
      content text,
      metadata jsonb,
      similarity float
    ) language plpgsql as $$
    #variable_conflict use_column
    begin
      return query
      select
        id,
        content,
        metadata,
        1 - (documents.embedding <=> query_embedding) as similarity
      from documents
      where metadata @> filter
      order by documents.embedding <=> query_embedding;
    end;
    $$;
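
To make the logic of match_documents concrete, here is a minimal pure-Python sketch of the same steps: keep rows whose metadata contains the filter, rank them by cosine distance (what pgvector's `<=>` operator computes), and report `1 - distance` as the similarity. The helper names and the 2-dimensional vectors are hypothetical, and `contains` only approximates jsonb `@>` for flat, top-level keys.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors (the quantity pgvector's <=> returns)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def contains(metadata, flt):
    """Simplified stand-in for jsonb `@>`: every top-level key/value in flt must match."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


def match_documents(rows, query_embedding, flt=None):
    """Filter rows by metadata, then rank them by ascending cosine distance."""
    flt = flt or {}
    kept = [row for row in rows if contains(row["metadata"], flt)]
    kept.sort(key=lambda row: cosine_distance(row["embedding"], query_embedding))
    return [
        {
            "content": row["content"],
            "metadata": row["metadata"],
            "similarity": 1.0 - cosine_distance(row["embedding"], query_embedding),
        }
        for row in kept
    ]


# Hypothetical 2-dimensional embeddings, for illustration only.
rows = [
    {"content": "dinosaurs", "metadata": {"genre": "science fiction"}, "embedding": [1.0, 0.0]},
    {"content": "toys", "metadata": {"genre": "animated"}, "embedding": [0.0, 1.0]},
    {"content": "the Zone", "metadata": {"genre": "science fiction"}, "embedding": [1.0, 1.0]},
]

results = match_documents(rows, [1.0, 0.0], {"genre": "science fiction"})
print([r["content"] for r in results])  # ['dinosaurs', 'the Zone']
```

An empty filter (the SQL default `'{}'`) matches every row, so the function then reduces to a plain nearest-neighbor ranking.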

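Creating a Supabase vector store

Next, let's create a Supabase vector store and seed it with some data. We've created a small demo set of documents containing movie summaries.

Be sure to install the latest version of langchain with openai support: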

%pip install --upgrade --quiet  langchain langchain-openai tiktoken

The self-query retriever requires you to have lark installed:

%pip install --upgrade --quiet  lark

We also need the supabase package:

%pip install --upgrade --quiet  supabase

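Since we are using SupabaseVectorStore and OpenAIEmbeddings, we have to load their API keys.

  • To find your SUPABASE_URL and SUPABASE_SERVICE_KEY, head to your Supabase project's API settings.

    • SUPABASE_URL corresponds to the Project URL
    • SUPABASE_SERVICE_KEY corresponds to the service_role API key
  • To get your OPENAI_API_KEY, navigate to API keys on your OpenAI account and create a new secret key.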

import getpass
import os

if "SUPABASE_URL" not in os.environ:
    os.environ["SUPABASE_URL"] = getpass.getpass("Supabase URL:")
if "SUPABASE_SERVICE_KEY" not in os.environ:
    os.environ["SUPABASE_SERVICE_KEY"] = getpass.getpass("Supabase Service Key:")
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Optional: If you're storing your Supabase and OpenAI API keys in a .env file, you can load them with dotenv.

%pip install --upgrade --quiet  python-dotenv
from dotenv import load_dotenv

load_dotenv()

First, we'll create a Supabase client and instantiate an OpenAI embeddings class.

import os

from langchain_community.vectorstores import SupabaseVectorStore
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from supabase.client import Client, create_client

supabase_url = os.environ.get("SUPABASE_URL")
supabase_key = os.environ.get("SUPABASE_SERVICE_KEY")
supabase: Client = create_client(supabase_url, supabase_key)

embeddings = OpenAIEmbeddings()
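
The embedding column in the documents table is declared as `vector (1536)`, so whichever embedding model is used has to produce vectors of that length. Below is a minimal sketch of a pre-insert check. `check_dimension` is a hypothetical helper, not part of LangChain or Supabase, and the placeholder vector stands in for a real embedding.

```python
EXPECTED_DIM = 1536  # must match `embedding vector (1536)` in the table definition


def check_dimension(vector, expected=EXPECTED_DIM):
    """Raise ValueError if the vector length differs from the column dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector


# Placeholder standing in for a real embedding such as embeddings.embed_query("some text").
placeholder = [0.0] * 1536
check_dimension(placeholder)
```

If you switch to a model with a different output size, the table column and the match_documents signature would need the same change.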

Next, let's create our documents.

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
            "rating": 9.9,
        },
    ),
]

# Embed the documents and insert them into the "documents" table
vectorstore = SupabaseVectorStore.from_documents(
    docs,
    embeddings,
    client=supabase,
    table_name="documents",
    query_name="match_documents",
)
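
Conceptually, seeding the vector store means embedding each document's text and writing one row per document into the documents table. The sketch below shows that row shape, assuming the column names from the SQL script (id, content, metadata, embedding). `Doc`, `fake_embed`, and `build_rows` are hypothetical stand-ins for illustration, not LangChain or OpenAI APIs, and the real insert logic may differ in detail.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_embed(text, dim=4):
    """Hypothetical deterministic 'embedding'; the table above expects 1536 floats."""
    return [float((len(text) + i) % 7) for i in range(dim)]


def build_rows(docs, embed):
    """Build one row per document, matching the columns of the documents table."""
    return [
        {
            "id": str(uuid.uuid4()),
            "content": doc.page_content,
            "metadata": doc.metadata,
            "embedding": embed(doc.page_content),
        }
        for doc in docs
    ]


sample = [Doc("Toys come alive and have a blast doing so", {"year": 1995, "genre": "animated"})]
rows = build_rows(sample, fake_embed)
print(sorted(rows[0].keys()))  # ['content', 'embedding', 'id', 'metadata']
```

Keeping metadata as a plain dictionary is what later allows the `metadata @> filter` clause to select rows by field.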

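Creating our self-querying retriever

Now we can instantiate our retriever. To do this, we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents.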

from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import OpenAI

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
document_content_description = "Brief summary of a movie"
llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
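
The field descriptions matter because the LLM only knows about the metadata it is told about. As a rough illustration, the sketch below renders a content description and a list of fields into a JSON schema-like string of the kind that could be placed in a prompt. `Attribute` and `describe_schema` are hypothetical names, and the exact prompt format LangChain builds internally is not shown here and may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical stand-in mirroring the three AttributeInfo fields used above."""
    name: str
    description: str
    type: str


def describe_schema(content_description, attributes):
    """Render the document description and metadata fields as a JSON string."""
    schema = {
        "content": content_description,
        "attributes": {
            attr.name: {"description": attr.description, "type": attr.type}
            for attr in attributes
        },
    }
    return json.dumps(schema, indent=2)


fields = [
    Attribute("year", "The year the movie was released", "integer"),
    Attribute("rating", "A 1-10 rating for the movie", "float"),
]
schema_text = describe_schema("Brief summary of a movie", fields)
print(schema_text)
```

Precise descriptions and types (for example "integer" for year) give the model a better chance of producing comparisons against the right field with the right value type.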

Testing it out

And now we can try actually using our retriever!

# This example only specifies a relevant query
retriever.invoke("What are some movies about dinosaurs")
query='dinosaur' filter=None limit=None
[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'genre': 'science fiction', 'rating': 7.7}),
Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),
Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'genre': 'science fiction', 'rating': 9.9, 'director': 'Andrei Tarkovsky'}),
Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'rating': 8.6, 'director': 'Satoshi Kon'})]
# This example only specifies a filter
retriever.invoke("I want to watch a movie rated higher than 8.5")
query=' ' filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5) limit=None
[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'genre': 'science fiction', 'rating': 9.9, 'director': 'Andrei Tarkovsky'}),
Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'rating': 8.6, 'director': 'Satoshi Kon'})]
# This example specifies a query and a filter
retriever.invoke("Has Greta Gerwig directed any movies about women?")
query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig') limit=None
[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'rating': 8.3, 'director': 'Greta Gerwig'})]
# This example specifies a composite filter
retriever.invoke("What's a highly rated (above 8.5) science fiction film?")
query=' ' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GTE: 'gte'>, attribute='rating', value=8.5), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='science fiction')]) limit=None
[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'genre': 'science fiction', 'rating': 9.9, 'director': 'Andrei Tarkovsky'})]
# This example specifies a query and composite filter
retriever.invoke(
    "What's a movie after 1990 but before (or on) 2005 that's all about toys, and preferably is animated"
)
query='toys' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='year', value=1990), Comparison(comparator=<Comparator.LTE: 'lte'>, attribute='year', value=2005), Comparison(comparator=<Comparator.LIKE: 'like'>, attribute='genre', value='animated')]) limit=None
[Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]
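
The Comparison and Operation objects printed above can be read as predicates over each document's metadata. The following is a minimal sketch of that reading using hypothetical dataclasses that only mimic the printed structure. It is not LangChain's implementation: in the real retriever the filter is translated into a query against the vector store rather than evaluated in Python, and the `like` comparator is omitted here.

```python
import operator as op
from dataclasses import dataclass

# Comparator names as they appear in the printed structured queries.
COMPARATORS = {"eq": op.eq, "gt": op.gt, "gte": op.ge, "lt": op.lt, "lte": op.le}


@dataclass
class Comparison:
    comparator: str
    attribute: str
    value: object


@dataclass
class Operation:
    operator: str  # "and" or "or"
    arguments: list


def matches(node, metadata):
    """Return True if the metadata dict satisfies the filter tree."""
    if isinstance(node, Operation):
        results = (matches(arg, metadata) for arg in node.arguments)
        return all(results) if node.operator == "and" else any(results)
    if node.attribute not in metadata:
        return False  # a missing field never satisfies a comparison
    return COMPARATORS[node.comparator](metadata[node.attribute], node.value)


movies = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 1979, "director": "Andrei Tarkovsky", "genre": "science fiction", "rating": 9.9},
]

composite = Operation(
    "and",
    [Comparison("gte", "rating", 8.5), Comparison("eq", "genre", "science fiction")],
)
selected = [m["year"] for m in movies if matches(composite, m)]
print(selected)  # [1979]
```

On this subset of the demo metadata, the composite filter selects only the 1979 film, which agrees with the retriever output for the science fiction query above.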

Filter k

We can also use the self-query retriever to specify k: the number of documents to fetch.

We can do this by passing enable_limit=True to the constructor.

retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    enable_limit=True,
    verbose=True,
)
# This example only specifies a relevant query
retriever.invoke("what are two movies about dinosaurs")
query='dinosaur' filter=None limit=2
[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'genre': 'science fiction', 'rating': 7.7}),
Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]
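
The effect of the extracted limit can be sketched as a simple cut-off on the ranked results. `top_k` is a hypothetical helper, and the default of 4 is an assumption chosen to match the four documents returned by the first query in this notebook, where no limit was extracted.

```python
def top_k(ranked_docs, limit=None, default_k=4):
    """Return the first `limit` documents, or `default_k` when no limit was extracted."""
    k = default_k if limit is None else limit
    return ranked_docs[:k]


# Hypothetical ranking, most similar first.
ranked = ["dinosaurs", "toys", "the Zone", "dreams", "women"]
print(top_k(ranked, limit=2))  # ['dinosaurs', 'toys']
print(top_k(ranked))           # ['dinosaurs', 'toys', 'the Zone', 'dreams']
```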
