DoctranPropertyExtractor#

class langchain_community.document_transformers.doctran_text_extract.DoctranPropertyExtractor(properties: List[dict], openai_api_key: str | None = None, openai_api_model: str | None = None)[source]#

Extract properties from text documents using doctran.

Parameters:
  • properties (List[dict]) – A list of the properties to extract.

  • openai_api_key (str | None) – OpenAI API key. Can also be specified via the environment variable OPENAI_API_KEY.

  • openai_api_model (str | None)
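
The schema of each entry in properties is not spelled out on this page. Judging only from the example on this page, every entry carries name, description, and type, while required, enum, and items look optional. A minimal sketch of a pre-flight check under that assumption follows; check_properties and REQUIRED_KEYS are hypothetical names, not part of LangChain or doctran.

```python
from typing import Any, Dict, List

# Assumption: these keys are inferred from the example on this page,
# not from a documented schema.
REQUIRED_KEYS = {"name", "description", "type"}


def check_properties(properties: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    seen = set()
    for i, prop in enumerate(properties):
        # Report any key that every entry in the example provides.
        missing = REQUIRED_KEYS - prop.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
        # Property names are used as output keys, so flag duplicates.
        name = prop.get("name")
        if name is not None:
            if name in seen:
                problems.append(f"entry {i}: duplicate name {name!r}")
            seen.add(name)
        # The example pairs "type": "array" with an "items" description.
        if prop.get("type") == "array" and "items" not in prop:
            problems.append(f"entry {i}: array type without 'items'")
    return problems
```

Running such a check before constructing the extractor can catch a malformed entry locally, before any API request is made.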

Example

from langchain_community.document_transformers import DoctranPropertyExtractor

properties = [
    {
        "name": "category",
        "description": "What type of email this is.",
        "type": "string",
        "enum": ["update", "action_item", "customer_feedback", "announcement", "other"],
        "required": True,
    },
    {
        "name": "mentions",
        "description": "A list of all people mentioned in this email.",
        "type": "array",
        "items": {
            "name": "full_name",
            "description": "The full name of the person mentioned.",
            "type": "string",
        },
        "required": True,
    },
    {
        "name": "eli5",
        "description": "Explain this email to me like I'm 5 years old.",
        "type": "string",
        "required": True,
    },
]

# Pass in openai_api_key or set env var OPENAI_API_KEY
property_extractor = DoctranPropertyExtractor(properties)
# `documents` is a sequence of Document objects; `await` must run inside a coroutine
transformed_document = await property_extractor.atransform_documents(documents)
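
The call above uses await, which only works inside a coroutine or an async-aware REPL such as a notebook. A minimal sketch of driving it from a plain script is below. The names run_extraction and main and the sample email text are illustrative assumptions, the Document import path assumes langchain_core is installed, and actually running main requires the doctran package and a valid OpenAI API key.

```python
import asyncio
from typing import Any, Dict, List, Sequence


async def run_extraction(properties: List[Dict[str, Any]], texts: Sequence[str]):
    """Wrap raw strings in Document objects and run the async transformer."""
    # Imported lazily so this module loads without the optional dependencies.
    from langchain_community.document_transformers import DoctranPropertyExtractor
    from langchain_core.documents import Document

    documents = [Document(page_content=text) for text in texts]
    # Pass openai_api_key=... here, or set the OPENAI_API_KEY environment variable.
    extractor = DoctranPropertyExtractor(properties)
    return await extractor.atransform_documents(documents)


def main() -> None:
    properties = [
        {
            "name": "category",
            "description": "What type of email this is.",
            "type": "string",
            "required": True,
        }
    ]
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    docs = asyncio.run(
        run_extraction(properties, ["Hi team, the release moved to Friday."])
    )
    for doc in docs:
        print(doc.metadata)
```

Calling main() from a script runs the extraction end to end; inside an already running event loop, await run_extraction(...) directly instead.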

Methods

__init__(properties[, openai_api_key, ...])

atransform_documents(documents, **kwargs)

Extract properties from text documents using doctran.

transform_documents(documents, **kwargs)

Extract properties from text documents using doctran.

__init__(properties: List[dict], openai_api_key: str | None = None, openai_api_model: str | None = None) → None[source]#
Parameters:
  • properties (List[dict])

  • openai_api_key (str | None)

  • openai_api_model (str | None)

Return type:

None

async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document][source]#

Extract properties from text documents using doctran.

Parameters:
  • documents (Sequence[Document])

  • kwargs (Any)

Return type:

Sequence[Document]

transform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document][source]#

Extract properties from text documents using doctran.

Parameters:
  • documents (Sequence[Document])

  • kwargs (Any)

Return type:

Sequence[Document]
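
This page does not say where the extracted values end up on the returned documents. In published versions of the langchain_community source they appear to be written to each document's metadata under the key extracted_properties; treat that key as an assumption and verify it against the installed version. The sketch below shows one way to read the results defensively. FakeDocument, EXTRACTED_KEY, and collect_extracted are hypothetical names, and FakeDocument is only a stand-in so the snippet runs without LangChain.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class FakeDocument:
    """Stand-in for LangChain's Document, so the sketch runs without it."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Assumption: the metadata key written by the transformer; check your version.
EXTRACTED_KEY = "extracted_properties"


def collect_extracted(documents: Iterable[Any]) -> List[Dict[str, Any]]:
    """Return one dict of extracted properties per document ({} if absent)."""
    return [dict(doc.metadata.get(EXTRACTED_KEY) or {}) for doc in documents]


docs = [
    FakeDocument("email one", {EXTRACTED_KEY: {"category": "update"}}),
    FakeDocument("email two"),  # not transformed, so no extracted properties
]
print(collect_extracted(docs))  # prints [{'category': 'update'}, {}]
```

Returning an empty dict for untransformed documents keeps downstream code from failing on a missing key.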

Examples using DoctranPropertyExtractor