Tabular

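The Tabular pipeline splits tabular data into rows and columns. It is most useful for creating (id, text, tag) tuples to load into an embeddings index.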
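
As a rough illustration of the (id, text, tag) tuples described above, the following sketch builds them from CSV rows using only the Python standard library. It is not the txtai implementation: the helper name `rows_to_tuples`, the ". " separator used to join text columns, and the fallback to the row position when no id column exists are all assumptions made for this example.

```python
import csv
import io


def rows_to_tuples(handle, idcolumn, textcolumns):
    """
    Illustrative stand-in (hypothetical helper, not part of txtai).
    Reads CSV rows and builds (id, text, tag) tuples.
    """
    tuples = []
    for index, row in enumerate(csv.DictReader(handle)):
        # Use the id column when present, otherwise fall back to the row position
        uid = row.get(idcolumn, index) if idcolumn else index

        # Combine the non-empty text columns into a single text field
        # (the ". " separator is an assumption made for this sketch)
        text = ". ".join(row[column] for column in textcolumns if row.get(column))

        # No tag is set in this sketch
        tuples.append((uid, text, None))

    return tuples


# In-memory CSV used in place of a file on disk
sample = io.StringIO("id,title,text\n1,First,Hello world\n2,Second,Tabular data\n")
result = rows_to_tuples(sample, "id", ["title", "text"])
```

Each tuple in `result` has the shape expected when loading rows into an index: a row id, the combined text, and an empty tag.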

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import Tabular

# Create and run pipeline
tabular = Tabular("id", ["text"])
tabular("path to csv file")

See the link below for a more detailed example.

Notebook	Description
Transform tabular data with composable workflows	Transform, index and search tabular data

Configuration-driven example

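Pipelines are run with Python or configuration. A pipeline can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.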

config.yml

# Create pipeline using lower case class name
tabular:
    idcolumn: id
    textcolumns:
      - text

# Run pipeline with workflow
workflow:
  tabular:
    tasks:
      - action: tabular

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("tabular", ["path to csv file"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tabular", "elements":["path to csv file"]}'
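
The same request can also be prepared from Python. The sketch below mirrors the curl call above using only the standard library; the helper name `build_workflow_request` is hypothetical, and the URL assumes the API server is running locally on port 8000 as in the curl example. The request is only built here, not sent.

```python
import json
import urllib.request


def build_workflow_request(name, elements, url="http://localhost:8000/workflow"):
    """
    Hypothetical helper: builds a POST request equivalent to the curl call.
    The default URL is taken from the curl example.
    """
    # JSON body with the workflow name and the input elements
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_workflow_request("tabular", ["path to csv file"])
```

With the API server running, passing `request` to `urllib.request.urlopen` sends it and returns the HTTP response.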

Methods

Python documentation for the pipeline.

`__init__(idcolumn=None, textcolumns=None, content=False)`

Creates a new Tabular pipeline.

Parameters:

Name	Description	Default
`idcolumn`	column name to use for row id	`None`
`textcolumns`	list of columns to combine as a text field	`None`
`content`	if True, a dict per row is generated with all fields. If content is a list, a subset of fields is included in the generated rows.	`False`
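
To make the `content` options in the table more concrete, here is a minimal sketch of how the two documented cases could behave for a single row. The helper `select_content` is hypothetical and is not taken from txtai. Returning `None` when `content` is `False` is an assumption of this sketch, as is silently skipping listed fields that are missing from the row.

```python
def select_content(row, content):
    """
    Hypothetical helper illustrating the documented `content` options.
    """
    # content=True: keep every field of the row
    if content is True:
        return dict(row)

    # content=[...]: keep only the listed fields that exist in the row
    if isinstance(content, list):
        return {field: row[field] for field in content if field in row}

    # content=False: no per-row dict (assumption made for this sketch)
    return None


row = {"id": 1, "title": "First", "text": "Hello world"}

all_fields = select_content(row, True)
subset = select_content(row, ["id", "title"])
disabled = select_content(row, False)
```

Here `all_fields` is a copy of the full row, `subset` holds only `id` and `title`, and `disabled` is `None`.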

Source code in txtai/pipeline/data/tabular.py

def __init__(self, idcolumn=None, textcolumns=None, content=False):
    """
    Creates a new Tabular pipeline.

    Args:
        idcolumn: column name to use for row id
        textcolumns: list of columns to combine as a text field
        content: if True, a dict per row is generated with all fields. If content is a list, a subset of fields
                 is included in the generated rows.
    """

    if not PANDAS:
        raise ImportError('Tabular pipeline is not available - install "pipeline" extra to enable')

    self.idcolumn = idcolumn
    self.textcolumns = textcolumns
    self.content = content

`__call__(data)`

Splits data into rows and columns.