langchain_community.utilities.arxiv.ArxivAPIWrapper

class langchain_community.utilities.arxiv.ArxivAPIWrapper

Bases: BaseModel

Wrapper around ArxivAPI.

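To use it, you should have the ``arxiv`` Python package installed (https://lukasschwab.me/arxiv.py/index.html). This wrapper uses the Arxiv API to run searches and fetch document summaries. By default, it returns the document summaries of the top-k results. If the query is in the form of an arXiv identifier (see https://info.arxiv.org/help/find/index.html), it returns the paper corresponding to that identifier. Document content is limited by doc_content_chars_max. Set doc_content_chars_max to None if you do not want to limit the content size.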

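Attributes:

  • top_k_results – number of top-scored documents used for the arxiv tool.

  • ARXIV_MAX_QUERY_LENGTH – cut-off limit on the length of the query used for the arxiv tool.

  • continue_on_failure (bool) – if True, continue loading other URLs on failure.

  • load_max_docs – limit on the number of loaded documents.

  • load_all_available_meta – if True, the "metadata" of the loaded Documents contains all available meta information (see https://lukasschwab.me/arxiv.py/index.html#Result); if False, the "metadata" contains only the published date, title, authors, and summary.

  • doc_content_chars_max – optional cut-off limit for the length of a document's content.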

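Example:
from langchain_community.utilities.arxiv import ArxivAPIWrapper

# Configure the wrapper; every keyword below is a documented field of the class.
arxiv = ArxivAPIWrapper(
    top_k_results=3,
    ARXIV_MAX_QUERY_LENGTH=300,
    load_max_docs=3,
    load_all_available_meta=False,
    doc_content_chars_max=40000,
)
# Returns one string with the formatted search results.
arxiv.run("tree of thought llm")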
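
The two limits described above, ARXIV_MAX_QUERY_LENGTH and doc_content_chars_max, can be pictured as plain string slicing. The following is a minimal sketch under that assumption, not the wrapper's actual source; truncate_query and truncate_content are hypothetical helper names introduced only for illustration.

```python
from typing import Optional


def truncate_query(query: str, max_query_length: int = 300) -> str:
    """Cut the query to at most max_query_length characters (assumed behavior)."""
    return query[:max_query_length]


def truncate_content(text: str, doc_content_chars_max: Optional[int] = 4000) -> str:
    """Cut document content; None disables the limit, since text[:None] is the whole string."""
    return text[:doc_content_chars_max]


long_text = "x" * 5000
print(len(truncate_content(long_text)))        # 4000
print(len(truncate_content(long_text, None)))  # 5000
print(len(truncate_query("q" * 1000)))         # 300
```

Slicing with None as the upper bound is why setting doc_content_chars_max to None is a natural way to express "no limit" in Python.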

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param ARXIV_MAX_QUERY_LENGTH: int = 300
param arxiv_exceptions: Any = None
param continue_on_failure: bool = False
param doc_content_chars_max: Optional[int] = 4000
param load_all_available_meta: bool = False
param load_max_docs: int = 100
param top_k_results: int = 3
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set, since it adds all passed values.

Parameters
  • _fields_set (Optional[SetStr]) –

  • values (Any) –

Return type

Model

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to include in new model

  • exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) – fields to exclude from new model, as with values this takes precedence over include

  • update (Optional[DictStrAny]) – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep (bool) – set to True to make a deep copy of the model

  • self (Model) –

Returns

new model instance

Return type

Model

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters
  • include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –

  • exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –

  • by_alias (bool) –

  • skip_defaults (Optional[bool]) –

  • exclude_unset (bool) –

  • exclude_defaults (bool) –

  • exclude_none (bool) –

Return type

DictStrAny

classmethod from_orm(obj: Any) Model
Parameters

obj (Any) –

Return type

Model

get_summaries_as_docs(query: str) List[Document]

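Performs an arxiv search and returns a list of documents, with the summaries as the content.

If an error occurs or no documents are found, error text is returned instead. Wrapper for https://lukasschwab.me/arxiv.py/index.html#Search

Args:

query: a plaintext search query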

Parameters

query (str) –

Return type

List[Document]
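
To make the "summary as content" idea concrete, here is a small sketch of how one search result could be turned into a document-like structure, and how load_all_available_meta might change the metadata. It uses plain dictionaries rather than the real Document or arxiv result classes; the function name summary_as_doc and the key names are assumptions for illustration, and the wrapper's real metadata keys may differ.

```python
from typing import Any, Dict

# Assumed minimal metadata set, following the attribute description:
# published date, title, authors, and summary.
CORE_KEYS = ("published", "title", "authors", "summary")


def summary_as_doc(result: Dict[str, Any], load_all_available_meta: bool = False) -> Dict[str, Any]:
    """Build a document-like dict whose page_content is the paper summary."""
    if load_all_available_meta:
        # Keep every field the (hypothetical) result carries.
        metadata = dict(result)
    else:
        # Keep only the core fields.
        metadata = {key: result[key] for key in CORE_KEYS}
    return {"page_content": result["summary"], "metadata": metadata}


# Placeholder values, not real search results.
example_result = {
    "published": "2024-01-01",
    "title": "Example title",
    "authors": ["Author A", "Author B"],
    "summary": "Example summary text.",
    "comment": "Example extra field",
}
doc = summary_as_doc(example_result)
print(sorted(doc["metadata"]))  # ['authors', 'published', 'summary', 'title']
```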

is_arxiv_identifier(query: str) bool

Checks whether a query is an arXiv identifier.

Parameters

query (str) –

Return type

bool
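
One way to picture such a check is a regular expression applied to every whitespace-separated token of the query. The sketch below is a simplified illustration and not the library's implementation: the pattern is an assumption that covers only new-style identifiers (YYMM.NNNN or YYMM.NNNNN with an optional version suffix), and looks_like_arxiv_identifier is a hypothetical name.

```python
import re

# Assumed pattern: two-digit year, month 01-12, a dot, 4-5 digits, optional "v<number>".
ARXIV_ID_PATTERN = re.compile(r"\d{2}(0[1-9]|1[0-2])\.\d{4,5}(v\d+)?")


def looks_like_arxiv_identifier(query: str) -> bool:
    """Return True only if the query is non-empty and every token is a full identifier match."""
    tokens = query.split()
    return bool(tokens) and all(ARXIV_ID_PATTERN.fullmatch(token) for token in tokens)


print(looks_like_arxiv_identifier("2305.10601"))           # True
print(looks_like_arxiv_identifier("2305.10601v2"))         # True
print(looks_like_arxiv_identifier("tree of thought llm"))  # False
print(looks_like_arxiv_identifier("2313.10601"))           # False (month 13 is invalid)
```

Old-style identifiers that include an archive prefix are deliberately left out of this sketch.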

json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

Parameters
  • include (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –

  • exclude (Optional[Union[AbstractSetIntStr, MappingIntStrAny]]) –

  • by_alias (bool) –

  • skip_defaults (Optional[bool]) –

  • exclude_unset (bool) –

  • exclude_defaults (bool) –

  • exclude_none (bool) –

  • encoder (Optional[Callable[[Any], Any]]) –

  • models_as_dict (bool) –

  • dumps_kwargs (Any) –

Return type

unicode

lazy_load(query: str) Iterator[Document]

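Runs an Arxiv search and gets the article texts plus the article meta information. See https://lukasschwab.me/arxiv.py/index.html#Search

Returns: documents with the document.page_content in text format

Performs an arxiv search, downloads the top k results as PDFs, loads them as Documents, and returns them.

Args:

query: a plaintext search query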

Parameters

query (str) –

Return type

Iterator[Document]

load(query: str) List[Document]

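Runs an Arxiv search and gets the article texts plus the article meta information. See https://lukasschwab.me/arxiv.py/index.html#Search

Returns: a list of documents with the document.page_content in text format

Performs an arxiv search, downloads the top k results as PDFs, loads them as Documents, and returns them as a list.

Args:

query: a plaintext search query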

Parameters

query (str) –

Return type

List[Document]
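
The practical difference between lazy_load (an Iterator) and load (a List) is when the work happens. The sketch below illustrates that relationship with stand-in functions that produce strings instead of downloading PDFs; lazy_load_stub and load_stub are hypothetical names, and treating the eager version as "consume the iterator into a list" is an assumption about the design, not a statement about the library's source.

```python
from typing import Iterator, List


def lazy_load_stub(query: str, max_docs: int = 3) -> Iterator[str]:
    """Hypothetical generator standing in for lazy_load: yields one item at a time."""
    for rank in range(1, max_docs + 1):
        # A real implementation would download and parse one PDF here.
        yield f"document {rank} for {query!r}"


def load_stub(query: str, max_docs: int = 3) -> List[str]:
    """Eager counterpart: consume the whole generator into a list."""
    return list(lazy_load_stub(query, max_docs))


first = next(lazy_load_stub("tree of thought llm"))  # only the first item is produced
everything = load_stub("tree of thought llm")        # all items are produced up front
print(first)            # document 1 for 'tree of thought llm'
print(len(everything))  # 3
```

With an iterator, a caller that only needs the first few documents can stop early and avoid the cost of fetching the rest.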

classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
Parameters
  • path (Union[str, Path]) –

  • content_type (unicode) –

  • encoding (unicode) –

  • proto (Protocol) –

  • allow_pickle (bool) –

Return type

Model

classmethod parse_obj(obj: Any) Model
Parameters

obj (Any) –

Return type

Model

classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) Model
Parameters
  • b (Union[str, bytes]) –

  • content_type (unicode) –

  • encoding (unicode) –

  • proto (Protocol) –

  • allow_pickle (bool) –

Return type

Model

run(query: str) str

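Performs an arxiv search and returns a single string containing the publish date, title, authors, and summary of each article, with articles separated by two newlines.

If an error occurs or no documents are found, error text is returned instead. Wrapper for https://lukasschwab.me/arxiv.py/index.html#Search

Args:

query: a plaintext search query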

Parameters

query (str) –

Return type

str
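
The output shape described for run (one block per article, blocks separated by two newlines, total length capped by doc_content_chars_max) can be sketched as follows. PaperStub and format_results are hypothetical names, the field labels and the "no results" message are assumptions, and the sample values are placeholders rather than real search results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PaperStub:
    """Hypothetical stand-in for a search result; not the arxiv package's Result class."""
    published: str
    title: str
    authors: List[str]
    summary: str


def format_results(results: List[PaperStub], doc_content_chars_max: Optional[int] = 4000) -> str:
    """Join one text block per paper, separated by two newlines, then apply the length cap."""
    if not results:
        # Placeholder message; the wrapper's real error text may differ.
        return "No results found"
    blocks = [
        f"Published: {r.published}\n"
        f"Title: {r.title}\n"
        f"Authors: {', '.join(r.authors)}\n"
        f"Summary: {r.summary}"
        for r in results
    ]
    return "\n\n".join(blocks)[:doc_content_chars_max]


papers = [
    PaperStub("2024-01-01", "Example title A", ["Author A"], "Example summary A."),
    PaperStub("2024-02-01", "Example title B", ["Author B", "Author C"], "Example summary B."),
]
print(format_results(papers))
```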

classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
Parameters
  • by_alias (bool) –

  • ref_template (unicode) –

Return type

DictStrAny

classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
Parameters
  • by_alias (bool) –

  • ref_template (unicode) –

  • dumps_kwargs (Any) –

Return type

unicode

classmethod update_forward_refs(**localns: Any) None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

Parameters

localns (Any) –

Return type

None

classmethod validate(value: Any) Model
Parameters

value (Any) –

Return type

Model
