Steamship

SteamshipFileReader

Bases: BaseReader

Reads persistent Steamship Files and converts them to Documents.

Parameters:

Name	Type	Description	Default
`api_key`	`Optional[str]`	Steamship API key. Defaults to the STEAMSHIP_API_KEY value if not provided.	`None`

Note

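Requires the steamship package to be installed and an active Steamship API key. To get a Steamship API key, visit: https://steamship.com/account/api. Once you have an API key, expose it via an environment variable named STEAMSHIP_API_KEY or pass it as an init argument (api_key).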
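The note describes a two-step lookup for the key: an explicit `api_key` argument first, otherwise the `STEAMSHIP_API_KEY` environment variable. The sketch below models only that precedence in plain Python. `resolve_steamship_api_key` is a hypothetical helper, not part of llama_index or steamship; the reader itself just stores the argument and forwards it to the `Steamship` client, which is assumed to perform the environment fallback.

```python
import os
from typing import Mapping, Optional


def resolve_steamship_api_key(
    api_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: prefer the explicit key, else STEAMSHIP_API_KEY.

    Models the documented precedence only; not library code.
    """
    env = os.environ if environ is None else environ
    if api_key is not None:
        # An explicitly passed key always wins.
        return api_key
    # Otherwise fall back to the environment variable (None if unset).
    return env.get("STEAMSHIP_API_KEY")


print(resolve_steamship_api_key("explicit-key", {"STEAMSHIP_API_KEY": "env-key"}))  # explicit-key
print(resolve_steamship_api_key(None, {"STEAMSHIP_API_KEY": "env-key"}))            # env-key
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.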

Source code in llama_index/readers/steamship/base.py

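from typing import List, Optional

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class SteamshipFileReader(BaseReader):
    """Reads persistent Steamship Files and converts them to Documents.

    Args:
        api_key: Steamship API key. Defaults to the STEAMSHIP_API_KEY value if not provided.

    Note:
        Requires the `steamship` package to be installed and an active Steamship API key.
        To get a Steamship API key, visit: https://steamship.com/account/api.
        Once you have an API key, expose it via an environment variable named
        `STEAMSHIP_API_KEY` or pass it as an init argument (`api_key`)."""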

    def __init__(self, api_key: Optional[str] = None) -> None:
        """Initialize the reader."""
        try:
            import steamship  # noqa

            self.api_key = api_key
        except ImportError:
            raise ImportError(
                "`steamship` must be installed to use the SteamshipFileReader.\n"
                "Please run `pip install --upgrade steamship`."
            )

    def load_data(
        self,
        workspace: str,
        query: Optional[str] = None,
        file_handles: Optional[List[str]] = None,
        collapse_blocks: bool = True,
        join_str: str = "\n\n",
    ) -> List[Document]:
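        """Load data from persistent Steamship Files into Documents.

        Args:
            workspace: the handle of a Steamship workspace
                (see: https://docs.steamship.com/workspaces/index.html)
            query: a Steamship tag query for retrieving files
                (e.g. 'filetag and value("import-id")="import-001"')
            file_handles: a list of Steamship File handles
                (e.g. `smooth-valley-9kbdr`)
            collapse_blocks: whether to merge a file's individual blocks into a
                single Document or keep them as separate Documents.
            join_str: when collapse_blocks is True, the string used to join the
                block texts.

        Note:
            The files from `query` and `file_handles` are combined. Conflicts between
            the two collections are not currently resolved (if a file appears both in
            the query results and as a handle in file_handles, it will be loaded twice).
        """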
        from steamship import File, Steamship

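        client = Steamship(workspace=workspace, api_key=self.api_key)
        files = []
        # Files matched by the tag query are collected first.
        if query:
            files_from_query = File.query(client=client, tag_filter_query=query).files
            files.extend(files_from_query)

        # Explicit handles are appended afterwards, without de-duplication.
        if file_handles:
            files.extend([File.get(client=client, handle=h) for h in file_handles])

        docs = []
        for file in files:
            # The file handle is stored as "source"; each tag then adds
            # (or overwrites) a key named after the tag's kind.
            metadata = {"source": file.handle}

            for tag in file.tags:
                metadata[tag.kind] = tag.value

            if collapse_blocks:
                # One Document per file, with block texts joined by join_str.
                text = join_str.join([b.text for b in file.blocks])
                docs.append(Document(text=text, id_=file.handle, metadata=metadata))
            else:
                # One Document per block; every block reuses the file handle
                # as its id_ and receives the same metadata.
                docs.extend(
                    [
                        Document(text=b.text, id_=file.handle, metadata=metadata)
                        for b in file.blocks
                    ]
                )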

        return docs
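The metadata loop above copies each tag's `value` under the tag's `kind`, on top of a `source` entry holding the file handle. The sketch below replays that loop with a hypothetical `FakeTag` stand-in (the dict-shaped tag values are an assumption modeled on the `value("import-id")` query example) to show one consequence of plain dict assignment: when a file carries several tags of the same kind, only the last one survives in the Document metadata.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeTag:
    """Hypothetical stand-in for a Steamship tag; only .kind and .value are modeled."""
    kind: str
    value: Any


def build_metadata(handle: str, tags: List[FakeTag]) -> Dict[str, Any]:
    """Replays the metadata loop in load_data (illustrative only)."""
    metadata: Dict[str, Any] = {"source": handle}
    for tag in tags:
        # A later tag of the same kind overwrites an earlier one, and a tag
        # whose kind is "source" would replace the file handle.
        metadata[tag.kind] = tag.value
    return metadata


tags = [
    FakeTag("filetag", {"import-id": "import-001"}),
    FakeTag("filetag", {"import-id": "import-002"}),
]
print(build_metadata("smooth-valley-9kbdr", tags))
# {'source': 'smooth-valley-9kbdr', 'filetag': {'import-id': 'import-002'}}
```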

load_data

load_data(
    workspace: str,
    query: Optional[str] = None,
    file_handles: Optional[List[str]] = None,
    collapse_blocks: bool = True,
    join_str: str = "\n\n",
) -> List[Document]

Load data from persistent Steamship Files into Documents.

Parameters:

Name	Type	Description	Default
`workspace`	`str`	The handle of a Steamship workspace (see: https://docs.steamship.com/workspaces/index.html)	required
`query`	`Optional[str]`	A Steamship tag query for retrieving files (e.g. 'filetag and value("import-id")="import-001"')	`None`
`file_handles`	`Optional[List[str]]`	A list of Steamship File handles (e.g. `smooth-valley-9kbdr`)	`None`
`collapse_blocks`	`bool`	Whether to merge a file's individual blocks into a single Document or keep them as separate Documents.	`True`
`join_str`	`str`	When collapse_blocks is True, the string used to join the block texts.	`'\n\n'`
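To make the interaction of `collapse_blocks` and `join_str` concrete, the sketch below mirrors the branch in `load_data` using hypothetical `FakeFile`/`FakeBlock` stand-ins instead of real Steamship objects, and returns `(id_, text)` pairs rather than `Document` instances. It is an illustration of the documented behavior, not library code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeBlock:
    """Hypothetical stand-in for a Steamship Block; only .text is modeled."""
    text: str


@dataclass
class FakeFile:
    """Hypothetical stand-in for a Steamship File; only .handle and .blocks are modeled."""
    handle: str
    blocks: List[FakeBlock] = field(default_factory=list)


def block_texts(
    file: FakeFile, collapse_blocks: bool = True, join_str: str = "\n\n"
) -> List[Tuple[str, str]]:
    """Return the (id_, text) pairs load_data would build for one file."""
    if collapse_blocks:
        # One Document per file: block texts concatenated with join_str.
        return [(file.handle, join_str.join(b.text for b in file.blocks))]
    # One Document per block, each reusing the file handle as its id_.
    return [(file.handle, b.text) for b in file.blocks]


f = FakeFile("smooth-valley-9kbdr", [FakeBlock("Intro"), FakeBlock("Body")])
print(block_texts(f))                         # [('smooth-valley-9kbdr', 'Intro\n\nBody')]
print(block_texts(f, join_str=" | "))         # [('smooth-valley-9kbdr', 'Intro | Body')]
print(block_texts(f, collapse_blocks=False))  # [('smooth-valley-9kbdr', 'Intro'), ('smooth-valley-9kbdr', 'Body')]
```

Note that with `collapse_blocks=False` the per-block Documents all carry the same `id_`, so downstream code that keys on `id_` cannot tell them apart.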

Note

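The files from query and file_handles are combined. Conflicts between the two collections are not currently resolved (if a file appears both in the query results and as a handle in file_handles, it will be loaded twice).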
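Because the reader does not deconflict the two sources, a caller who needs unique files could filter the returned Documents afterwards. With `collapse_blocks=True` each file yields exactly one Document whose `id_` is the file handle, so keeping the first Document per `id_` drops the duplicate. The sketch below assumes that setting; `dedupe_by_key` and `FakeDoc` are hypothetical names, and `quiet-river-1a2b3` is a made-up handle. Do not apply this with `collapse_blocks=False`, where every block of a file shares the same `id_` and all but the first block would be discarded.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def dedupe_by_key(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item for each key, preserving the original order."""
    seen = set()
    unique: List[T] = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a llama_index Document; only id_ and text are modeled."""
    id_: str
    text: str


# Query results come first, then file_handles, matching load_data's order.
docs = [
    FakeDoc("smooth-valley-9kbdr", "from query"),
    FakeDoc("quiet-river-1a2b3", "from query"),
    FakeDoc("smooth-valley-9kbdr", "from file_handles"),
]
print([(d.id_, d.text) for d in dedupe_by_key(docs, key=lambda d: d.id_)])
# [('smooth-valley-9kbdr', 'from query'), ('quiet-river-1a2b3', 'from query')]
```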
