ArxivLoader
- class langchain_community.document_loaders.arxiv.ArxivLoader(query: str, doc_content_chars_max: int | None = None, **kwargs: Any)[source]
Load a query result from Arxiv. The loader converts the original PDF format into text.
- Setup:
Install the arxiv and PyMuPDF packages. PyMuPDF converts PDF files downloaded from arxiv.org into text.
    pip install -U arxiv pymupdf
- Instantiate:
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning",
        # load_max_docs=2,
        # load_all_available_meta=False
    )
- Load:
    docs = loader.load()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)
- Lazy load:
    docs = []
    docs_lazy = loader.lazy_load()

    # async variant (alazy_load is an async iterator, so it is consumed with "async for"):
    # async for doc in loader.alazy_load():
    #     docs.append(doc)

    for doc in docs_lazy:
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)
    Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
    }
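Lazy loading is most useful when only the first few results are needed, because each document is produced on demand. The following is a minimal sketch of that consumption pattern. `fake_lazy_load` and `take` are hypothetical names introduced here (not part of LangChain) so that the snippet runs without network access; the iterator returned by a real `loader.lazy_load()` could be passed to `take` in the same way.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fake_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(): yields one item at a time."""
    for i in range(1000):
        yield f"document {i}"


def take(docs: Iterable[str], n: int) -> List[str]:
    """Consume at most n items from a lazy iterator; the rest are never produced."""
    return list(islice(docs, n))


first_two = take(fake_lazy_load(), 2)
print(first_two)
```

Only two of the thousand stand-in items are ever generated here, which is the property that makes a lazy iterator cheaper than building the full list first.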
- Async load:
    docs = await loader.aload()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

    Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
    }
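A bare `await` like the one above works where an event loop is already running, such as in a notebook. In a plain script the call has to live inside a coroutine that is driven by `asyncio.run`. The sketch below shows that wrapper. `StubLoader` is a hypothetical stand-in exposing only an async `aload()` so the snippet runs offline; an `ArxivLoader` instance would be passed to `main` instead.

```python
import asyncio
from typing import List


class StubLoader:
    """Hypothetical stand-in for a loader; only the async aload() method is modeled."""

    async def aload(self) -> List[str]:
        return ["doc-1", "doc-2"]


async def main(loader) -> List[str]:
    # Inside a coroutine, "await" is valid in any Python script.
    return await loader.aload()


docs = asyncio.run(main(StubLoader()))
print(docs)
```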
- Use summaries of articles as docs:
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning"
    )

    docs = loader.get_summaries_as_docs()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

    Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning
    {
        'Entry ID': 'http://arxiv.org/abs/2402.03268v2',
        'Published': datetime.date(2024, 2, 29),
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang'
    }
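In the sample outputs above, `Published` appears as a string in the `load()` metadata but as a `datetime.date` in the `get_summaries_as_docs()` metadata. If downstream code needs a single representation, a small helper can normalize both. This is only a sketch: `published_as_iso` is a hypothetical helper, not part of LangChain, and it assumes string values are in `YYYY-MM-DD` form as shown above.

```python
import datetime
from typing import Union


def published_as_iso(value: Union[str, datetime.date]) -> str:
    """Return the publication date as a 'YYYY-MM-DD' string."""
    if isinstance(value, datetime.datetime):
        # datetime is a subclass of date; drop the time part first.
        value = value.date()
    if isinstance(value, datetime.date):
        return value.isoformat()
    # Parsing validates the string and raises ValueError on malformed input.
    return datetime.date.fromisoformat(value).isoformat()


print(published_as_iso("2024-02-29"))
print(published_as_iso(datetime.date(2024, 2, 29)))
```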
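Initialize with a search query to find documents in Arxiv. Supports all arguments of ArxivAPIWrapper.
- Parameters:
query (str) – free text used to find documents in Arxiv
doc_content_chars_max (int | None) – cut-off limit on the length of a document's content
kwargs (Any) – additional keyword arguments; all ArxivAPIWrapper arguments are supported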
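To illustrate what a character cut-off of this kind means, here is a minimal sketch. It assumes the limit is applied as a plain prefix cut and that `None` means no limit; `truncate_content` is a hypothetical helper written for illustration, not the library's implementation.

```python
from typing import Optional


def truncate_content(text: str, doc_content_chars_max: Optional[int] = None) -> str:
    """Keep at most doc_content_chars_max characters; None keeps everything."""
    # Slicing with None as the stop index returns the whole string.
    return text[:doc_content_chars_max]


sample = "Pre-trained language models (LMs) are able to perform complex reasoning"
print(truncate_content(sample, 11))
print(truncate_content(sample) == sample)
```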
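Methods
__init__(query[, doc_content_chars_max]) – Initialize with a search query to find documents in Arxiv.
alazy_load() – A lazy loader for Documents.
aload() – Load data into Document objects.
get_summaries_as_docs() – Use paper summaries as documents rather than the source Arxiv papers.
lazy_load() – Lazy load Arxiv documents.
load() – Load data into Document objects.
load_and_split([text_splitter]) – Load Documents and split them into chunks.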
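- __init__(query: str, doc_content_chars_max: int | None = None, **kwargs: Any)[source]
Initialize with a search query to find documents in Arxiv. Supports all arguments of ArxivAPIWrapper.
- Parameters:
query (str) – free text used to find documents in Arxiv
doc_content_chars_max (int | None) – cut-off limit on the length of a document's content
kwargs (Any) – additional keyword arguments; all ArxivAPIWrapper arguments are supported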
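- load_and_split(text_splitter: TextSplitter | None = None) list[Document]
Load Documents and split them into chunks. Chunks are returned as Documents.
Do not override this method. It should be considered deprecated!
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
list[Document]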
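Because `load_and_split` should be treated as deprecated, the same result can usually be obtained by loading first and then calling the splitter explicitly. The sketch below shows that two-step pattern with duck-typed stand-ins. `StubLoader`, `StubSplitter`, and `load_then_split` are hypothetical names defined here so the snippet runs offline, and the stub's fixed-width slicing is only a placeholder, not how RecursiveCharacterTextSplitter chooses its split points. With LangChain installed, an `ArxivLoader` and a `TextSplitter` (which provides `split_documents`) would take their place.

```python
from typing import List


class StubLoader:
    """Hypothetical stand-in for a document loader; only load() is modeled."""

    def load(self) -> List[str]:
        return ["abcdefgh", "ijkl"]


class StubSplitter:
    """Hypothetical stand-in for a text splitter; cuts fixed-width chunks."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def split_documents(self, docs: List[str]) -> List[str]:
        chunks: List[str] = []
        for doc in docs:
            for start in range(0, len(doc), self.chunk_size):
                chunks.append(doc[start:start + self.chunk_size])
        return chunks


def load_then_split(loader, splitter) -> List[str]:
    """Two explicit steps: load everything, then split the loaded documents."""
    docs = loader.load()
    return splitter.split_documents(docs)


chunks = load_then_split(StubLoader(), StubSplitter(chunk_size=3))
print(chunks)
```

Keeping the two steps separate also makes it easy to swap in a different splitter or to inspect the loaded documents before chunking.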
Examples using ArxivLoader