agentchat.contrib.vectordb.chromadb

ChromaVectorDB

class ChromaVectorDB(VectorDB)

A vector database that uses ChromaDB as its backend.

__init__

def __init__(*,
             client=None,
             path: str = "tmp/db",
             embedding_function: Callable = None,
             metadata: dict = None,
             **kwargs) -> None

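Initialize the vector database.

Arguments:

client - chromadb.Client | The client object of the vector database. Default is None. If a client object is provided, it is used directly and all other arguments are ignored.
path - str | The path to the vector database. Default is "tmp/db". In versions <= 0.2.24 the default was None.
embedding_function - Callable | The embedding function used to generate the vector representations of the documents. Default is None, in which case SentenceTransformerEmbeddingFunction("all-MiniLM-L6-v2") is used.
metadata - dict | The metadata of the vector database. Default is None. If None, the following setting is used: {"hnsw:space": "ip", "hnsw:construction_ef": 30, "hnsw:M": 32}. For more details on the metadata, see distances, hnsw, and ALGO_PARAMS.
kwargs - dict | Additional keyword arguments.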

Returns:

None
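The default metadata sets "hnsw:space" to "ip", which selects the distance function Chroma uses for its HNSW index. The sketch below illustrates the three spaces using the formulas given in Chroma's documentation on distances (squared L2, inner product, cosine). It is plain Python for intuition only, not Chroma's implementation, and the function names are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical helpers mirroring the distance formulas documented for
# Chroma's "hnsw:space" setting; they are not part of ChromaDB or autogen.


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # "l2": squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # "ip": one minus the inner product (the space in the default metadata).
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # "cosine": one minus the cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Default metadata used when metadata=None (values from the argument list above).
DEFAULT_METADATA = {"hnsw:space": "ip", "hnsw:construction_ef": 30, "hnsw:M": 32}
DISTANCES = {"l2": l2_distance, "ip": ip_distance, "cosine": cosine_distance}

if __name__ == "__main__":
    query, doc = [0.6, 0.8], [0.8, 0.6]  # two unit-length vectors
    for space, fn in DISTANCES.items():
        print(space, round(fn(query, doc), 6))
```

For unit-length embeddings, "ip" and "cosine" give the same distance, and squared L2 is exactly twice the "ip" distance, so all three rank neighbors identically. "ip" only differs in ranking when embeddings are not normalized.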

create_collection

def create_collection(collection_name: str,
                      overwrite: bool = False,
                      get_or_create: bool = True) -> Collection

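Create a collection in the vector database. The behavior depends on whether the collection already exists:

Case 1: If the collection does not exist, create it.
Case 2: If the collection exists and overwrite is True, overwrite it.
Case 3: If the collection exists and overwrite is False, get it if get_or_create is True; otherwise raise a ValueError.

Arguments:

collection_name - str | The name of the collection.
overwrite - bool | Whether to overwrite the collection if it exists. Default is False.
get_or_create - bool | Whether to get the collection if it exists. Default is True.

Returns:

Collection | The collection object.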
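The three cases above can be modeled with a small in-memory store. This is a hypothetical stand-in (the class name HypotheticalCollectionStore is invented for illustration) in which a collection is just a list of document dicts; it mirrors the documented decision logic, not the actual ChromaVectorDB code.

```python
from typing import Dict, List


class HypotheticalCollectionStore:
    """In-memory stand-in that follows the three documented cases of
    create_collection. Not the ChromaVectorDB implementation."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[dict]] = {}

    def create_collection(
        self, collection_name: str, overwrite: bool = False, get_or_create: bool = True
    ) -> List[dict]:
        exists = collection_name in self.collections
        if not exists or overwrite:
            # Case 1: create a new collection. Case 2: replace an existing one.
            self.collections[collection_name] = []
        elif not get_or_create:
            # Case 3 with get_or_create=False: refuse to reuse the collection.
            raise ValueError(f"Collection {collection_name!r} already exists.")
        # Case 3 with get_or_create=True falls through to the existing collection.
        return self.collections[collection_name]


if __name__ == "__main__":
    store = HypotheticalCollectionStore()
    papers = store.create_collection("papers")           # Case 1
    papers.append({"id": "1", "content": "hello"})
    print(len(store.create_collection("papers")))        # Case 3: same collection
    print(len(store.create_collection("papers", overwrite=True)))  # Case 2: emptied
```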

get_collection

def get_collection(collection_name: str = None) -> Collection

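Get a collection from the vector database.

Arguments:

collection_name - str | The name of the collection. Default is None. If None, the currently active collection is returned.

Returns:

Collection | The collection object.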
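A minimal sketch of the "currently active collection" rule, using a hypothetical in-memory class. Two behaviors here are assumptions of the sketch rather than documented facts: fetching a collection by name makes it the active one, and calling get_collection() with no name while nothing is active raises ValueError.

```python
from typing import Dict, List, Optional


class HypotheticalActiveStore:
    """In-memory illustration of how get_collection(None) can resolve to the
    active collection. Not the ChromaVectorDB implementation."""

    def __init__(self, collections: Dict[str, List[dict]]) -> None:
        self.collections = collections
        self.active_collection: Optional[str] = None

    def get_collection(self, collection_name: Optional[str] = None) -> List[dict]:
        if collection_name is None:
            # No name given: fall back to whichever collection is active.
            if self.active_collection is None:
                raise ValueError("No collection is specified and none is active.")
        else:
            if collection_name not in self.collections:
                raise ValueError(f"Collection {collection_name!r} does not exist.")
            # Assumption: fetching a collection by name also makes it active.
            self.active_collection = collection_name
        return self.collections[self.active_collection]


if __name__ == "__main__":
    store = HypotheticalActiveStore({"papers": [], "notes": [{"id": "n1"}]})
    store.get_collection("notes")
    print(store.get_collection())  # resolves to the active "notes" collection
```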

delete_collection

def delete_collection(collection_name: str) -> None

Delete a collection from the vector database.

Arguments:

collection_name - str | The name of the collection.

Returns:

None
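When a database tracks an active collection, deleting that collection should also clear the active reference so that a later get_collection(None) cannot hand back a stale object. The sketch below shows that bookkeeping with a hypothetical in-memory class; clearing the active reference and raising ValueError for an unknown name are assumptions of the sketch, not confirmed ChromaVectorDB behavior.

```python
from typing import Dict, List, Optional


class HypotheticalDeletableStore:
    """In-memory illustration of delete_collection bookkeeping.
    Not the ChromaVectorDB implementation."""

    def __init__(self, collections: Dict[str, List[dict]], active: Optional[str] = None) -> None:
        self.collections = collections
        self.active_collection = active

    def delete_collection(self, collection_name: str) -> None:
        # Assumption: deleting a missing collection is an error.
        if collection_name not in self.collections:
            raise ValueError(f"Collection {collection_name!r} does not exist.")
        del self.collections[collection_name]
        # Drop the active reference if it pointed at the deleted collection.
        if self.active_collection == collection_name:
            self.active_collection = None


if __name__ == "__main__":
    store = HypotheticalDeletableStore({"papers": [], "notes": []}, active="papers")
    store.delete_collection("papers")
    print(sorted(store.collections), store.active_collection)
```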

insert_docs

def insert_docs(docs: List[Document],
                collection_name: str = None,
                upsert: bool = False) -> None

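Insert documents into a collection of the vector database.

Arguments:

docs - List[Document] | A list of documents. Each document is a Document TypedDict.
collection_name - str | The name of the collection. Default is None.
upsert - bool | Whether to update a document if it already exists. Default is False.
kwargs - Dict | Additional keyword arguments.

Returns: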

None
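ChromaDB's Collection.add and Collection.upsert take column-oriented arguments (parallel ids, documents, metadatas, and embeddings lists), whereas insert_docs receives one Document per item. The sketch below shows that conversion. The Document field set (id, content, metadata, embedding) and the ItemID alias are assumptions based on autogen's vectordb base module as we understand it; the helper docs_to_columns is hypothetical, and converting ids to strings assumes Chroma stores ids as strings.

```python
from typing import Any, Dict, List, Optional, Sequence, TypedDict, Union

ItemID = Union[str, int]  # assumed alias for document ids


class Document(TypedDict):
    # Assumed field set; metadata and embedding may be None.
    id: ItemID
    content: str
    metadata: Optional[Dict[str, Any]]
    embedding: Optional[Sequence[float]]


def docs_to_columns(docs: List[Document]) -> Dict[str, Optional[list]]:
    """Split documents into the keyword arguments accepted by Chroma's
    Collection.add / Collection.upsert (hypothetical helper)."""
    if not docs:
        raise ValueError("No documents to insert.")
    embeddings = [doc.get("embedding") for doc in docs]
    metadatas = [doc.get("metadata") for doc in docs]
    return {
        "ids": [str(doc["id"]) for doc in docs],  # assumption: string ids
        "documents": [doc["content"] for doc in docs],
        # Pass a column only if every document has a value; otherwise omit it,
        # letting the collection's embedding function compute embeddings.
        "embeddings": embeddings if all(e is not None for e in embeddings) else None,
        "metadatas": metadatas if all(m is not None for m in metadatas) else None,
    }


# With a real chromadb collection (not run here):
#   columns = docs_to_columns(docs)
#   collection.upsert(**columns) if upsert else collection.add(**columns)
```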

update_docs

def update_docs(docs: List[Document], collection_name: str = None) -> None

Update documents in a collection of the vector database.

Arguments:

docs - List[Document] | A list of documents.
collection_name - str | The name of the collection. Default is None.

Returns:

None
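The difference between insert_docs (with and without upsert) and update_docs can be pictured with a dict keyed by document id. In this hypothetical model, "insert" adds only new ids, "upsert" adds or replaces, and "update" replaces only ids that already exist. Silently skipping ids in the first and last cases is an assumption of the sketch; ChromaDB itself may log a warning or raise instead.

```python
from typing import Dict, List

VALID_MODES = {"insert", "upsert", "update"}


def apply_docs(store: Dict[str, dict], docs: List[dict], mode: str) -> None:
    """Hypothetical in-memory model of the three write modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}.")
    for doc in docs:
        exists = doc["id"] in store
        if mode == "insert" and exists:
            continue  # plain insert leaves existing documents untouched
        if mode == "update" and not exists:
            continue  # update only touches documents that already exist
        store[doc["id"]] = doc


if __name__ == "__main__":
    store = {"1": {"id": "1", "content": "old"}}
    apply_docs(store, [{"id": "1", "content": "new"}, {"id": "2", "content": "x"}], "update")
    print(store)  # only id "1" changed; id "2" was not added
```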

delete_docs

def delete_docs(ids: List[ItemID],
                collection_name: str = None,
                **kwargs) -> None

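Delete documents from a collection of the vector database.

Arguments:

ids - List[ItemID] | A list of document ids. Each id is of type ItemID.
collection_name - str | The name of the collection. Default is None.
kwargs - Dict | Additional keyword arguments.

Returns: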

None
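Deleting by id amounts to filtering out every document whose id appears in ids; with a real chromadb collection the corresponding call is collection.delete(ids=[...]). The sketch below shows the filtering in plain Python. The helper name is hypothetical, and comparing ids as strings (so that 1 and "1" match) is an assumption based on Chroma storing ids as strings.

```python
from typing import List, Union

ItemID = Union[str, int]  # assumed alias: ids may be strings or integers


def delete_docs_in_memory(docs: List[dict], ids: List[ItemID]) -> List[dict]:
    """Return the documents whose id is not in ids (hypothetical helper).
    Ids are compared as strings."""
    to_delete = {str(i) for i in ids}
    return [doc for doc in docs if str(doc["id"]) not in to_delete]


if __name__ == "__main__":
    docs = [{"id": 1}, {"id": "2"}, {"id": "3"}]
    print(delete_docs_in_memory(docs, ["1", 3]))
```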

retrieve_docs

def retrieve_docs(queries: List[str],
                  collection_name: str = None,
                  n_results: int = 10,
                  distance_threshold: float = -1,
                  **kwargs) -> QueryResults

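Retrieve documents from a collection of the vector database based on the queries.

Arguments:

queries - List[str] | A list of queries. Each query is a string.
collection_name - str | The name of the collection. Default is None.
n_results - int | The number of relevant documents to return. Default is 10.
distance_threshold - float | The threshold for the distance score; only documents whose distance is smaller than this value are returned. If < 0, no threshold filtering is applied. Default is -1.
kwargs - Dict | Additional keyword arguments.

Returns:

QueryResults | The query results: one list per query, each containing (document, distance) tuples.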
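The shape of QueryResults and the effect of n_results and distance_threshold can be illustrated with a post-processing step over raw (document, distance) hits. This is a hypothetical sketch, not the library's code: it keeps the n_results closest hits per query and, only when distance_threshold >= 0, drops hits whose distance is not strictly below the threshold. Because the threshold is applied after the top-n cut, a query can return fewer than n_results documents.

```python
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]  # simplified stand-in for the Document TypedDict
QueryResults = List[List[Tuple[Document, float]]]


def build_query_results(
    per_query_hits: List[List[Tuple[Document, float]]],
    n_results: int = 10,
    distance_threshold: float = -1,
) -> QueryResults:
    """Hypothetical post-processing: one result list per query, sorted by
    distance (smaller is closer), truncated to n_results, then filtered."""
    results: QueryResults = []
    for hits in per_query_hits:
        ranked = sorted(hits, key=lambda hit: hit[1])[:n_results]
        if distance_threshold >= 0:
            ranked = [(doc, dist) for doc, dist in ranked if dist < distance_threshold]
        results.append(ranked)
    return results


if __name__ == "__main__":
    hits = [[({"id": "a"}, 0.3), ({"id": "b"}, 0.1), ({"id": "c"}, 0.7)]]
    print(build_query_results(hits, n_results=2))
    print(build_query_results(hits, distance_threshold=0.2))
```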

get_docs_by_ids

def get_docs_by_ids(ids: List[ItemID] = None,
                    collection_name: str = None,
                    include=None,
                    **kwargs) -> List[Document]

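Retrieve documents from a collection of the vector database by their ids.

Arguments:

ids - List[ItemID] | A list of document ids. If None, all documents are returned. Default is None.
collection_name - str | The name of the collection. Default is None.
include - List[str] | The fields to include. Default is None. If None, ["metadatas", "documents"] is used; ids are always included.
kwargs - dict | Additional keyword arguments.

Returns:

List[Document] | The results.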
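Chroma's Collection.get returns column-oriented data (an "ids" list plus one list per included field, with include defaulting to ["metadatas", "documents"]), while get_docs_by_ids returns one Document per item. The sketch below converts between the two and applies the documented include default. The helper columns_to_docs and the mapping from Chroma's plural column names to Document field names are hypothetical.

```python
from typing import Any, Dict, List, Optional

DEFAULT_INCLUDE = ["metadatas", "documents"]

# Hypothetical mapping from Chroma column names to Document field names.
FIELD_NAMES = {"documents": "content", "metadatas": "metadata", "embeddings": "embedding"}


def columns_to_docs(
    result: Dict[str, Optional[list]], include: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    """Turn column-oriented get results into one dict per document.
    "ids" is always kept; other columns only if listed in include."""
    include = DEFAULT_INCLUDE if include is None else include
    docs: List[Dict[str, Any]] = []
    for i, doc_id in enumerate(result["ids"]):
        doc: Dict[str, Any] = {"id": doc_id}
        for column in include:
            values = result.get(column)
            if values is not None:  # skip columns the backend did not return
                doc[FIELD_NAMES.get(column, column)] = values[i]
        docs.append(doc)
    return docs


# With a real chromadb collection (not run here):
#   columns_to_docs(collection.get(ids=["1", "2"], include=["metadatas", "documents"]))
```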
