`langchain_core.indexing.api`.aindex

async langchain_core.indexing.api.aindex(docs_source: Union[BaseLoader, Iterable[Document], AsyncIterator[Document]], record_manager: RecordManager, vector_store: VectorStore, *, batch_size: int = 100, cleanup: Literal['incremental', 'full', None] = None, source_id_key: Optional[Union[str, Callable[[Document], str]]] = None, cleanup_batch_size: int = 1000, force_update: bool = False) → IndexingResult

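Index data from the loader into the vector store.

The indexing functionality uses a record manager to keep track of which documents are in the vector store.

This makes it possible to tell which documents were updated, which were deleted, and which should be skipped.

For the time being, documents are indexed using their hashes, and users cannot specify the uid of a document.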
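As a rough illustration of hash-based identity, the standalone sketch below derives a deterministic ID from a document's content and metadata, so that re-submitting an unchanged document can be detected and skipped. It is not the library's hashing code: `doc_fingerprint` and `split_new_and_seen` are hypothetical names, and the choice of SHA-256 over a JSON payload is an assumption made for the example.

```python
import hashlib
import json


def doc_fingerprint(page_content: str, metadata: dict) -> str:
    """Return a deterministic ID derived from content and metadata."""
    payload = json.dumps(
        {"page_content": page_content, "metadata": metadata},
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split_new_and_seen(docs, known_ids):
    """Partition (content, metadata) pairs into new and already-indexed IDs."""
    to_index, skipped = [], []
    for content, metadata in docs:
        uid = doc_fingerprint(content, metadata)
        (skipped if uid in known_ids else to_index).append(uid)
    return to_index, skipped
```

Because the ID depends only on the document itself, any change to the content or metadata yields a new ID, which is why a changed document shows up as a new entry rather than an in-place edit.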

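IMPORTANT:

If cleanup is set to 'full', the loader should return the entire dataset, not just a subset of it. Otherwise, the cleanup step will delete documents that should not be deleted.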

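Args:

docs_source: Data loader or iterable of documents to index.
record_manager: Timestamped set used to keep track of which documents were updated.
vector_store: Vector store to index the documents into.
batch_size: Batch size to use when indexing.
cleanup: How to handle cleanup of documents.

    Incremental: Cleans up all documents that have not been updated and that are associated with source IDs seen during indexing.
    Cleanup is done continuously during indexing, which helps minimize the chance of users seeing duplicated content.

    Full: Deletes all documents that were not returned by the loader.
    Cleanup runs after all documents have been indexed, so users may see duplicated content while indexing is in progress.

    None: Does not delete any documents.

source_id_key: Optional key that helps identify the original source of the document.
cleanup_batch_size: Batch size to use when cleaning up documents.
force_update: Force update documents even if they are already present in the record manager. Useful if you are re-indexing documents with updated embeddings.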
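The difference between the cleanup modes can be sketched with a toy in-memory record store. This is a simplified illustration under stated assumptions, not the library's implementation: `toy_index`, `resolve_source_id`, and the `run` counter (standing in for record timestamps) are hypothetical, the uid is a plain string rather than a hash, and incremental cleanup here runs once at the end instead of continuously per batch.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple, Union

Doc = Tuple[str, dict]  # (page_content, metadata): a stand-in for Document
SourceKey = Union[str, Callable[[Doc], str], None]


def resolve_source_id(doc: Doc, key: SourceKey) -> Optional[str]:
    """Read the source ID from metadata (str key) or compute it (callable)."""
    if key is None:
        return None
    if isinstance(key, str):
        return doc[1].get(key)
    return key(doc)


def toy_index(
    docs: Iterable[Doc],
    records: Dict[str, Tuple[Optional[str], int]],
    run: int,
    cleanup: Optional[str] = None,
    source_id_key: SourceKey = None,
) -> Dict[str, int]:
    """Index docs into `records` (uid -> (source_id, last_seen_run))."""
    if cleanup == "incremental" and source_id_key is None:
        # Assumption: incremental cleanup needs a source ID to scope deletions.
        raise ValueError("incremental cleanup requires source_id_key")

    counts = {"num_added": 0, "num_skipped": 0, "num_deleted": 0}
    seen_sources = set()
    for doc in docs:
        content, metadata = doc
        uid = f"{content}|{sorted(metadata.items())}"  # stand-in for a hash
        source = resolve_source_id(doc, source_id_key)
        counts["num_skipped" if uid in records else "num_added"] += 1
        records[uid] = (source, run)  # refresh the "last seen" marker
        seen_sources.add(source)

    if cleanup == "incremental":
        # Only stale records whose source was seen in this run are removed.
        stale = [u for u, (s, r) in records.items() if r < run and s in seen_sources]
    elif cleanup == "full":
        # Every record not returned in this run is removed.
        stale = [u for u, (_, r) in records.items() if r < run]
    else:
        stale = []
    for uid in stale:
        del records[uid]
    counts["num_deleted"] = len(stale)
    return counts
```

In this sketch, re-indexing a changed version of `a.txt` in incremental mode removes only the old `a.txt` entry and leaves other sources alone, whereas full mode removes everything the current run did not return. That is why full mode expects the loader to yield the complete dataset.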

Returns:

Indexing result containing information about how many documents were added, updated, deleted, or skipped.
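One way the returned counts might be consumed is shown below. The sketch uses a local TypedDict whose field names (`num_added`, `num_updated`, `num_skipped`, `num_deleted`) are assumed to mirror those of IndexingResult; `IndexingCounts` and `summarize` are hypothetical names introduced only for this example.

```python
from typing import TypedDict


class IndexingCounts(TypedDict):
    """Local stand-in assumed to mirror the fields of IndexingResult."""

    num_added: int
    num_updated: int
    num_skipped: int
    num_deleted: int


def summarize(result: IndexingCounts) -> str:
    """Format the per-category counts as a single log-friendly line."""
    total = sum(result.values())
    return (
        f"{total} documents processed: "
        f"{result['num_added']} added, {result['num_updated']} updated, "
        f"{result['num_skipped']} skipped, {result['num_deleted']} deleted"
    )
```

A high skipped count on a repeat run is the expected outcome when the source data has not changed.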

Parameters

docs_source (Union[BaseLoader, Iterable[Document], AsyncIterator[Document]]) –
record_manager (RecordManager) –
vector_store (VectorStore) –
batch_size (int) –
cleanup (Literal['incremental', 'full', None]) –
source_id_key (Optional[Union[str, Callable[[Document], str]]]) –
cleanup_batch_size (int) –
force_update (bool) –

Return type

IndexingResult
