pgvectordb

Collection

class Collection()

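A Collection object for PGVector.

Attributes:

client - The PGVector client.
collection_name str - The name of the collection. Defaults to "autogen-docs".
embedding_function Callable - The embedding function used to generate vector representations. Defaults to None. When None, SentenceTransformer("all-MiniLM-L6-v2").encode is used. Models can be chosen from: https://huggingface.co/models?library=sentence-transformers
metadata Optional[dict] - The metadata of the collection.
get_or_create Optional - A flag indicating whether to get or create the collection.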

__init__

def __init__(client=None,
             collection_name: str = "autogen-docs",
             embedding_function: Callable = None,
             metadata=None,
             get_or_create=None)

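Initialize the Collection object.

Arguments:

client - The PostgreSQL client.
collection_name - The name of the collection. Defaults to "autogen-docs".
embedding_function - The embedding function used to generate vector representations.
metadata - The metadata of the collection.
get_or_create - A flag indicating whether to get or create the collection.

Returns:

None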

add

def add(ids: List[ItemID],
        documents: List,
        embeddings: List = None,
        metadatas: List = None) -> None

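Add documents to the collection.

Arguments:

ids List[ItemID] - A list of document IDs.
documents List - A list of documents.
embeddings List - A list of document embeddings. Optional.
metadatas List - A list of document metadata. Optional.

Returns:

None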

upsert

def upsert(ids: List[ItemID],
           documents: List,
           embeddings: List = None,
           metadatas: List = None) -> None

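Update existing documents in the collection, or insert them if they do not exist.

Arguments:

ids List[ItemID] - A list of document IDs.
documents List - A list of documents.
embeddings List - A list of document embeddings. Optional.
metadatas List - A list of document metadata. Optional.

Returns:

None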

count

def count() -> int

Get the total number of documents in the collection.

Returns:

int - The total number of documents.

table_exists

def table_exists(table_name: str) -> bool

Check whether a table exists in the PostgreSQL database.

Arguments:

table_name str - The name of the table to check.

Returns:

bool - True if the table exists, False otherwise.

get

def get(ids: Optional[str] = None,
        include: Optional[str] = None,
        where: Optional[str] = None,
        limit: Optional[Union[int, str]] = None,
        offset: Optional[Union[int, str]] = None) -> List[Document]

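Retrieve documents from the collection.

Arguments:

ids Optional[List] - A list of document IDs.
include Optional - The fields to include.
where Optional - Additional filtering criteria.
limit Optional - The maximum number of documents to retrieve.
offset Optional - The offset for pagination.

Returns:

List - The retrieved documents.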
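The limit and offset arguments follow the usual pagination pattern. The sketch below illustrates that pattern on an in-memory list; it does not touch a database, and the helper name `paginate` is hypothetical rather than part of the library. Because the signature accepts either int or str, the sketch coerces both values with `int()`, which is an assumption about how string values are meant to be handled.

```python
from typing import List, Optional, Union


def paginate(
    rows: List[dict],
    limit: Optional[Union[int, str]] = None,
    offset: Optional[Union[int, str]] = None,
) -> List[dict]:
    """Return at most `limit` rows, skipping the first `offset` rows."""
    start = int(offset) if offset is not None else 0
    # A limit of None means "no upper bound".
    end = (start + int(limit)) if limit is not None else None
    return rows[start:end]


rows = [{"id": i} for i in range(5)]
page = paginate(rows, limit=2, offset=1)  # rows with id 1 and 2
```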

update

def update(ids: List, embeddings: List, metadatas: List,
           documents: List) -> None

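Update documents in the collection.

Arguments:

ids List - A list of document IDs.
embeddings List - A list of document embeddings.
metadatas List - A list of document metadata.
documents List - A list of documents.

Returns:

None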

euclidean_distance

@staticmethod
def euclidean_distance(arr1: List[float], arr2: List[float]) -> float

Calculate the Euclidean distance between two vectors.

Arguments:

arr1 (List[float]): The first vector.
arr2 (List[float]): The second vector.

Returns:

float: The Euclidean distance between arr1 and arr2.
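For reference, the textbook Euclidean (L2) distance can be sketched in plain Python. This illustrates the formula only and is not taken from the library's implementation; the helper name `l2_distance` is hypothetical.

```python
import math
from typing import List


def l2_distance(arr1: List[float], arr2: List[float]) -> float:
    """Textbook Euclidean distance: sqrt(sum((a - b) ** 2))."""
    if len(arr1) != len(arr2):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(arr1, arr2)))


print(l2_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```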

cosine_distance

@staticmethod
def cosine_distance(arr1: List[float], arr2: List[float]) -> float

Calculate the cosine distance between two vectors.

Arguments:

arr1 (List[float]): The first vector.
arr2 (List[float]): The second vector.

Returns:

float: The cosine distance between arr1 and arr2.
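Cosine distance is commonly defined as one minus the cosine similarity, which is also the convention of pgvector's `<=>` operator. The sketch below follows that definition; whether this method returns exactly the same quantity is not stated here, so treat it as an illustration. The helper name `cosine_dist` is hypothetical.

```python
import math
from typing import List


def cosine_dist(arr1: List[float], arr2: List[float]) -> float:
    """Cosine distance under the common definition: 1 - cos(theta)."""
    dot = sum(a * b for a, b in zip(arr1, arr2))
    norm1 = math.sqrt(sum(a * a for a in arr1))
    norm2 = math.sqrt(sum(b * b for b in arr2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm1 * norm2)


same_direction = cosine_dist([1.0, 0.0], [2.0, 0.0])  # close to 0
orthogonal = cosine_dist([1.0, 0.0], [0.0, 1.0])      # close to 1
```

Under this definition the result ranges from 0 (same direction) to 2 (opposite direction).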

inner_product_distance

@staticmethod
def inner_product_distance(arr1: List[float], arr2: List[float]) -> float

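Calculate the inner product distance between two vectors.

Arguments:

arr1 (List[float]): The first vector.
arr2 (List[float]): The second vector.

Returns:

float: The inner product distance between arr1 and arr2.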
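An inner product is a similarity rather than a distance, so it is usually negated to make smaller values mean "closer". pgvector's `<#>` operator returns the negative inner product for this reason. The sketch below shows that convention; it is an assumption that this method follows the same one, and the helper name `negative_inner_product` is hypothetical.

```python
from typing import List


def negative_inner_product(arr1: List[float], arr2: List[float]) -> float:
    """Negated dot product, so that a larger overlap gives a smaller value."""
    if len(arr1) != len(arr2):
        raise ValueError("vectors must have the same length")
    return -sum(a * b for a, b in zip(arr1, arr2))


print(negative_inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # -32.0
```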

query

def query(query_texts: List[str],
          collection_name: Optional[str] = None,
          n_results: Optional[int] = 10,
          distance_type: Optional[str] = "euclidean",
          distance_threshold: Optional[float] = -1,
          include_embedding: Optional[bool] = False) -> QueryResults

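Query documents in the collection.

Arguments:

query_texts List[str] - A list of query texts.
collection_name Optional[str] - The name of the collection.
n_results int - The maximum number of results to return.
distance_type Optional[str] - The distance search type: euclidean or cosine.
distance_threshold Optional[float] - A distance threshold used to limit the search.
include_embedding Optional[bool] - Whether to include embedding values in the query results.

Returns:

QueryResults - The query results.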
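To make the roles of distance_type, n_results, and distance_threshold concrete, the sketch below builds a pgvector similarity query as a string. The operators `<->` (Euclidean) and `<=>` (cosine) are pgvector's own. Everything else is an assumption for illustration: the helper name `build_search_sql`, the column names `id`, `documents`, and `embedding`, and the placeholder names. It is not the SQL the library actually issues.

```python
# pgvector operators for the two distance types named above.
_OPERATORS = {"euclidean": "<->", "cosine": "<=>"}


def build_search_sql(
    table: str,
    distance_type: str = "euclidean",
    n_results: int = 10,
    distance_threshold: float = -1,
) -> str:
    """Build a parameterized nearest-neighbor query (hypothetical schema)."""
    if distance_type not in _OPERATORS:
        raise ValueError(f"unsupported distance_type: {distance_type!r}")
    op = _OPERATORS[distance_type]
    # Quote the table name as an SQL identifier, doubling embedded quotes.
    quoted = '"' + table.replace('"', '""') + '"'
    distance = f"embedding {op} %(query)s::vector"
    # A negative threshold means "do not filter by distance".
    where = f" WHERE {distance} < %(threshold)s" if distance_threshold >= 0 else ""
    return (
        f"SELECT id, documents, {distance} AS distance "
        f"FROM {quoted}{where} ORDER BY distance LIMIT {int(n_results)}"
    )


sql = build_search_sql("autogen-docs", "cosine", n_results=5, distance_threshold=0.3)
```

The query vector and threshold stay as named placeholders so that a driver such as psycopg can bind them, rather than being interpolated into the string.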

convert_string_to_array

@staticmethod
def convert_string_to_array(array_string: str) -> List[float]

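Convert a string representation of an array into a list of floats.

Arguments:

array_string (str): The string representation of the array.

Returns:

list: A list of floats parsed from the input string. If the input is empty, an empty list is returned. If the input is not a string, the input itself is returned.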
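The behavior described above can be sketched as follows, assuming the input looks like pgvector's text form of a vector, for example "[1,2,3]". The helper name `parse_vector_text` is hypothetical and the parsing details are an assumption, not the library's code.

```python
from typing import Any, List, Union


def parse_vector_text(array_string: Any) -> Union[List[float], Any]:
    """Parse text such as "[1,2,3]" into [1.0, 2.0, 3.0]."""
    # Non-string input is returned unchanged.
    if not isinstance(array_string, str):
        return array_string
    inner = array_string.strip().strip("[]")
    # Empty input (or an empty pair of brackets) gives an empty list.
    if not inner.strip():
        return []
    return [float(item) for item in inner.split(",")]


print(parse_vector_text("[1, 2.5,3]"))  # [1.0, 2.5, 3.0]
```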

embedding_function - Defaults to None. When None, SentenceTransformer("all-MiniLM-L6-v2").encode is used. Models can be chosen from: https://huggingface.co/models?library=sentence-transformers
metadata - dict | The metadata of the vector database. Defaults to None. If None, the following setting is used: {"hnsw:space": "ip", "hnsw:construction_ef": 30, "hnsw:M": 16}. An index is created on the table using hnsw (embedding vector_l2_ops) WITH (m = hnsw:M, ef_construction = hnsw:construction_ef). For more information, see https://github.com/pgvector/pgvector?tab=readme-ov-file#hnsw

Returns:

None

establish_connection

def establish_connection(
        conn: Optional[psycopg.Connection] = None,
        connection_string: Optional[str] = None,
        host: Optional[str] = None,
        port: Optional[Union[int, str]] = None,
        dbname: Optional[str] = None,
        username: Optional[str] = None,
        password: Optional[str] = None,
        connect_timeout: Optional[int] = 10) -> psycopg.Connection

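Establish a connection to a PostgreSQL database using psycopg.

Arguments:

conn - An existing psycopg connection object. If provided, this connection is used.
connection_string - A string containing the connection information. If provided, a new connection is established using this string.
host - The hostname of the PostgreSQL server. Used if connection_string is not provided.
port - The port number to connect to on the server host. Used if connection_string is not provided.
dbname - The database name. Used if connection_string is not provided.
username - The username to connect as. Used if connection_string is not provided.
password - The user's password. Used if connection_string is not provided.
connect_timeout - The maximum time to wait for the connection, in seconds. Defaults to 10 seconds.

Returns:

A psycopg.Connection object representing the established connection.

Raises:

PermissionError - If no credentials are supplied.
psycopg.Error - If an error occurs while attempting to connect to the database.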
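The precedence among the arguments can be sketched without opening a real connection. The helper below, `resolve_conninfo`, is hypothetical: it returns the connection string that would be used, in libpq's keyword/value format, and it assumes that "no credentials" means neither connection_string nor host was given. An existing conn object would take priority over all of this and is left out of the sketch.

```python
from typing import Optional, Union


def resolve_conninfo(
    connection_string: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[Union[int, str]] = None,
    dbname: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    connect_timeout: Optional[int] = 10,
) -> str:
    """Pick the connection string, preferring an explicit one."""
    if connection_string:
        return connection_string
    if not host:
        # Assumption: this is the "no credentials" case.
        raise PermissionError("no connection credentials were supplied")
    parts = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": username,
        "password": password,
        "connect_timeout": connect_timeout,
    }
    # Values containing spaces or quotes would need libpq quoting,
    # which this sketch omits.
    return " ".join(f"{k}={v}" for k, v in parts.items() if v is not None)


conninfo = resolve_conninfo(
    host="localhost", port=5432, dbname="postgres", username="user", password="secret"
)
```

A string like this can then be passed to `psycopg.connect`. In real code, `psycopg.conninfo.make_conninfo` is a safer way to assemble it because it handles quoting.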

create_collection

def create_collection(collection_name: str,
                      overwrite: bool = False,
                      get_or_create: bool = True) -> Collection

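Create a collection in the vector database.
Case 1. If the collection does not exist, create it.
Case 2. If the collection exists and overwrite is True, overwrite it.
Case 3. If the collection exists and overwrite is False: if get_or_create is True, get the collection; otherwise raise a ValueError.

Arguments:

collection_name - str | The name of the collection.
overwrite - bool | Whether to overwrite the collection if it exists. Defaults to False.
get_or_create - bool | Whether to get the collection if it exists. Defaults to True.

Returns:

Collection | The collection object.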
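The three cases can be sketched with an in-memory stand-in for the database. The class `InMemoryStore` is hypothetical and models a collection as a plain list; it only demonstrates the branching logic, not how the library stores data.

```python
from typing import Dict, List


class InMemoryStore:
    """Toy stand-in for a vector database: collection name -> list of docs."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[str]] = {}

    def create_collection(
        self, collection_name: str, overwrite: bool = False, get_or_create: bool = True
    ) -> List[str]:
        if collection_name not in self.collections:
            # Case 1: the collection does not exist, so create it.
            self.collections[collection_name] = []
        elif overwrite:
            # Case 2: it exists and overwrite is True, so replace it.
            self.collections[collection_name] = []
        elif not get_or_create:
            # Case 3, error branch: it exists and must not be reused.
            raise ValueError(f"collection {collection_name!r} already exists")
        # Case 3, get branch (and the result of cases 1 and 2).
        return self.collections[collection_name]


store = InMemoryStore()
docs = store.create_collection("demo")            # case 1: created
docs.append("hello")
same = store.create_collection("demo")            # case 3: existing one returned
fresh = store.create_collection("demo", overwrite=True)  # case 2: emptied
```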

get_collection

def get_collection(collection_name: str = None) -> Collection

Get the collection from the vector database.

Arguments:

collection_name - str | The name of the collection. Defaults to None. If None, the current active collection is returned.

Returns:

Collection | The collection object.

delete_collection

def delete_collection(collection_name: str) -> None

Delete the collection from the vector database.

Arguments:

collection_name - str | The name of the collection.

Returns:

None

insert_docs

def insert_docs(docs: List[Document],
                collection_name: str = None,
                upsert: bool = False) -> None

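Insert documents into a collection of the vector database.

Arguments:

docs - List[Document] | A list of documents. Each document is a TypedDict Document.
collection_name - str | The name of the collection. Defaults to None.
upsert - bool | Whether to update the document if it exists. Defaults to False.
kwargs - Dict | Additional keyword arguments.

Returns:

None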

update_docs

def update_docs(docs: List[Document], collection_name: str = None) -> None

Update documents in a collection of the vector database.

Arguments:

docs - List[Document] | A list of documents.
collection_name - str | The name of the collection. Defaults to None.

Returns:

None

delete_docs

def delete_docs(ids: List[ItemID], collection_name: str = None) -> None

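Delete documents from a collection of the vector database.

Arguments:

ids - List[ItemID] | A list of document IDs. Each ID is a typed ItemID.
collection_name - str | The name of the collection. Defaults to None.
kwargs - Dict | Additional keyword arguments.

Returns:

None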

retrieve_docs

def retrieve_docs(queries: List[str],
                  collection_name: str = None,
                  n_results: int = 10,
                  distance_threshold: float = -1) -> QueryResults

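Retrieve documents from a collection of the vector database based on the queries.

Arguments:

queries - List[str] | A list of queries. Each query is a string.
collection_name - str | The name of the collection. Defaults to None.
n_results - int | The number of relevant documents to return. Defaults to 10.
distance_threshold - float | The threshold for the distance score. Only results with a distance smaller than the threshold are returned. If it is less than 0, no filtering is applied. Defaults to -1.
kwargs - Dict | Additional keyword arguments.

Returns:

QueryResults | The query results. Each query result is a list of lists of tuples, each containing a document and its distance.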
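The distance_threshold rule can be sketched on a result structure shaped as described above: one list per query, each holding (document, distance) tuples. The helper name `filter_by_distance` and the sample documents are hypothetical; the sketch only illustrates the filtering rule.

```python
from typing import List, Tuple

# One inner list per query; each entry is (document, distance).
Results = List[List[Tuple[dict, float]]]


def filter_by_distance(results: Results, distance_threshold: float = -1) -> Results:
    """Keep only hits whose distance is below the threshold."""
    # A negative threshold disables filtering.
    if distance_threshold < 0:
        return results
    return [
        [(doc, dist) for doc, dist in per_query if dist < distance_threshold]
        for per_query in results
    ]


results = [
    [({"id": "a"}, 0.12), ({"id": "b"}, 0.48)],  # hits for the first query
    [({"id": "c"}, 0.91)],                        # hits for the second query
]
close_hits = filter_by_distance(results, distance_threshold=0.5)
```

Here the second query ends up with an empty list, since its only hit is farther away than the threshold.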

get_docs_by_ids

def get_docs_by_ids(ids: List[ItemID] = None,
                    collection_name: str = None,
                    include=None,
                    **kwargs) -> List[Document]

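Retrieve documents from a collection of the vector database based on their IDs.

Arguments:

ids - List[ItemID] | A list of document IDs. If None, all documents are returned. Defaults to None.
collection_name - str | The name of the collection. Defaults to None.
include - List[str] | The fields to include. Defaults to None. If None, ["metadatas", "documents"] are included; IDs are always included.
kwargs - dict | Additional keyword arguments.

Returns:

List[Document] | The results.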
