
LangSmithLoader

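This notebook provides a quick overview for getting started with the LangSmith document loader. For detailed documentation of all LangSmithLoader features and configurations, head to the API reference.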

Overview

Integration details

Class | Package | Local | Serializable | JS support
LangSmithLoader | langchain-core | | |

Loader features

Source | Lazy loading | Native async
LangSmithLoader | |

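Setup

To access the LangSmith document loader, you need to install langchain-core, create a LangSmith account, and get an API key.

Credentials

Sign up at https://langsmith.com and generate an API key. Once you have done this, set the LANGSMITH_API_KEY environment variable: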

import getpass
import os

if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

If you want automated tracing of your runs, you can also turn on LangSmith tracing:

# os.environ["LANGSMITH_TRACING"] = "true"

Installation

Install langchain-core:

%pip install -qU langchain-core

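Clone an example dataset

For this example, we will clone and load a public LangSmith dataset. Cloning creates a copy of the dataset in our personal LangSmith account. You can only load datasets that you have a personal copy of.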

from langsmith import Client as LangSmithClient

ls_client = LangSmithClient()

dataset_name = "LangSmith Few Shot Datasets Notebook"
dataset_public_url = (
    "https://smith.langchain.com/public/55658626-124a-4223-af45-07fb774a6212/d"
)

ls_client.clone_public_dataset(dataset_public_url)

Initialization

Now we can instantiate our document loader and load documents:

from langchain_core.document_loaders import LangSmithLoader

loader = LangSmithLoader(
    dataset_name=dataset_name,
    content_key="question",
    limit=50,
    # format_content=...,
    # ...
)
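The commented-out format_content argument above controls how the value found under content_key becomes the document's page_content. The following is a minimal sketch of such a callable, not the library's own formatter. The name clean_question is hypothetical, and the sketch assumes that format_content receives the extracted value and must return a string; check the API reference before relying on that.

```python
def clean_question(content: object) -> str:
    """Hypothetical formatter: collapse whitespace in the extracted value."""
    # str() guards against non-string inputs; split()/join() collapses runs of
    # whitespace and drops leading and trailing spaces.
    return " ".join(str(content).split())


# Stand-in for an example input with stray whitespace.
raw = "Show me an example   using Weaviate.  "
print(clean_question(raw))

# Assumed usage (not executed here, since it needs a LangSmith account):
# loader = LangSmithLoader(
#     dataset_name=dataset_name,
#     content_key="question",
#     format_content=clean_question,
# )
```

Keeping the formatter as a small pure function makes it easy to test without touching the LangSmith API.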

Load

docs = loader.load()
print(docs[0].page_content)
Show me an example using Weaviate, but customizing the vectorStoreRetriever to return the top 10 k nearest neighbors.
print(docs[0].metadata["inputs"])
{'question': 'Show me an example using Weaviate, but customizing the vectorStoreRetriever to return the top 10 k nearest neighbors. '}
print(docs[0].metadata["outputs"])
{'answer': 'To customize the Weaviate client and return the top 10 k nearest neighbors, you can utilize the `as_retriever` method with the appropriate parameters. Here\'s how you can achieve this:\n\n\`\`\`python\n# Assuming you have imported the necessary modules and classes\n\n# Create the Weaviate client\nclient = weaviate.Client(url=os.environ["WEAVIATE_URL"], ...)\n\n# Initialize the Weaviate wrapper\nweaviate = Weaviate(client, index_name, text_key)\n\n# Customize the client to return top 10 k nearest neighbors using as_retriever\ncustom_retriever = weaviate.as_retriever(\n    search_type="similarity",\n    search_kwargs={\n        \'k\': 10  # Customize the value of k as needed\n    }\n)\n\n# Now you can use the custom_retriever to perform searches\nresults = custom_retriever.search(query, ...)\n\`\`\`'}
list(docs[0].metadata.keys())
['dataset_id',
 'inputs',
 'outputs',
 'metadata',
 'id',
 'created_at',
 'modified_at',
 'runs',
 'source_run_id']
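Because each document carries the example's inputs and outputs in its metadata, the question and its reference answer can be paired again after loading. The sketch below uses a plain dict shaped like the metadata printed above instead of a real loaded document. The helper name to_qa_pair is hypothetical, and the "answer" key is taken from the output shown for this particular dataset; other datasets may use different keys.

```python
from typing import Any


def to_qa_pair(page_content: str, metadata: dict[str, Any]) -> tuple[str, str]:
    """Hypothetical helper: pair a document's text with its reference answer."""
    # "outputs" may be missing or None for examples without reference outputs,
    # so fall back to an empty answer instead of raising.
    outputs = metadata.get("outputs") or {}
    return page_content, outputs.get("answer", "")


# Stand-in values shaped like docs[0].page_content and docs[0].metadata.
sample_metadata = {
    "inputs": {"question": "What is a retriever?"},
    "outputs": {"answer": "An interface that returns documents for a query."},
}
question, answer = to_qa_pair("What is a retriever?", sample_metadata)
print(question, "->", answer)
```

With real documents, the call would be to_qa_pair(doc.page_content, doc.metadata).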

Lazy load

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)
        # page = []
        break
len(page)
10
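The loop above stops after the first page. To process every page in turn, the same idea can be wrapped in a small batching generator. This is a sketch using only the standard library; the helper name batched and the stand-in generator fake_docs are illustrative, and in practice loader.lazy_load() would take the place of fake_docs.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls up to `size` items at a time, so the source is never
    # materialized in full.
    while page := list(islice(iterator, size)):
        yield page


# Stand-in for loader.lazy_load(); any iterator of documents works the same way.
fake_docs = (f"doc-{i}" for i in range(25))
page_sizes = [len(page) for page in batched(fake_docs, 10)]
print(page_sizes)  # [10, 10, 5]
```

Each yielded page could then be handed to a paged operation such as the index.upsert(page) call mentioned in the comment above.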

API reference

For detailed documentation of all LangSmithLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/core/document_loaders/langchain_core.document_loaders.langsmith.LangSmithLoader.html

