MarkdownHeaderTextSplitter

class langchain_text_splitters.markdown.MarkdownHeaderTextSplitter(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)

Split Markdown documents based on the specified headers.

Create a new MarkdownHeaderTextSplitter.

Parameters:

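headers_to_split_on (List[Tuple[str, str]]) – Headers to track, given as (header prefix, metadata key) pairs, e.g. ("#", "Header 1")
return_each_line (bool) – Return each line together with its associated headers instead of aggregating lines that share headers into chunks
strip_headers (bool) – Remove the split headers from the chunk content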

Methods

`__init__`(headers_to_split_on[, ...])	Create a new MarkdownHeaderTextSplitter.
`aggregate_lines_to_chunks`(lines)	Combine lines with common metadata into chunks.
`split_text`(text)	Split a Markdown document.

__init__(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)

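Create a new MarkdownHeaderTextSplitter.

Parameters:

headers_to_split_on (List[Tuple[str, str]]) – Headers to track, given as (header prefix, metadata key) pairs, e.g. ("#", "Header 1")
return_each_line (bool) – Return each line together with its associated headers instead of aggregating lines that share headers into chunks
strip_headers (bool) – Remove the split headers from the chunk content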

aggregate_lines_to_chunks(lines: List[LineType]) → List[Document]

Combine lines with common metadata into chunks.

Parameters: lines (List[LineType]) – Lines of text, each with its associated header metadata
Return type: List[Document]

split_text(text: str) → List[Document]

Split a Markdown document.

Parameters: text (str) – Markdown document
Return type: List[Document]

Examples using MarkdownHeaderTextSplitter

How to split Markdown by Headers