
HFOnnx


Exports a Hugging Face Transformer model to ONNX. Currently, this works best with classification/pooling/qa models. Work is in progress for sequence to sequence models (summarization, transcription, translation).

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import HFOnnx, Labels

# Model path
path = "distilbert-base-uncased-finetuned-sst-2-english"

# Export model to ONNX
onnx = HFOnnx()
model = onnx(path, "text-classification", "model.onnx", True)

# Run inference and validate
labels = Labels((model, path), dynamic=False)
labels("I am happy")

See the link below for a more detailed example.

| Notebook | Description | |
|:---------|:------------|:-|
| Export and run models with ONNX | Export models with ONNX, run natively in JavaScript, Java and Rust | Open In Colab |

Methods

Python documentation for the pipeline.

__call__(path, task='default', output=None, quantize=False, opset=14)

Exports a Hugging Face Transformer model to ONNX.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| path | | path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple | required |
| task | | optional model task or category, determines the model type and outputs, defaults to export hidden state | 'default' |
| output | | optional output model path, defaults to return byte array if None | None |
| quantize | | if model should be quantized (requires onnx to be installed), defaults to False | False |
| opset | | onnx opset, defaults to 14 | 14 |

Returns:

| Type | Description |
|------|-------------|
| | path to model output or model as bytes depending on output parameter |
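
As a hedged illustration of the parameters above (the model ids, file names and the "pooling" task name are assumptions based on the model types listed at the top of this page, not taken from this reference):

from txtai.pipeline import HFOnnx

onnx = HFOnnx()

# Export a pooling (hidden state) model to a file and quantize it
onnx("sentence-transformers/all-MiniLM-L6-v2", "pooling", "embeddings.onnx", quantize=True)

# With output=None, the exported model is returned as a byte array instead
data = onnx("distilbert-base-uncased-finetuned-sst-2-english", "text-classification")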

Source code in txtai/pipeline/train/hfonnx.py
def __call__(self, path, task="default", output=None, quantize=False, opset=14):
    """
    Exports a Hugging Face Transformer model to ONNX.

    Args:
        path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
        task: optional model task or category, determines the model type and outputs, defaults to export hidden state
        output: optional output model path, defaults to return byte array if None
        quantize: if model should be quantized (requires onnx to be installed), defaults to False
        opset: onnx opset, defaults to 14

    Returns:
        path to model output or model as bytes depending on output parameter
    """

    inputs, outputs, model = self.parameters(task)

    if isinstance(path, (list, tuple)):
        model, tokenizer = path
        model = model.cpu()
    else:
        model = model(path)
        tokenizer = AutoTokenizer.from_pretrained(path)

    # Generate dummy inputs
    dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

    # Default to BytesIO if no output file provided
    output = output if output else BytesIO()

    # Export model to ONNX
    export(
        model,
        (dummy,),
        output,
        opset_version=opset,
        do_constant_folding=True,
        input_names=list(inputs.keys()),
        output_names=list(outputs.keys()),
        dynamic_axes=dict(chain(inputs.items(), outputs.items())),
    )

    # Quantize model
    if quantize:
        if not ONNX_RUNTIME:
            raise ImportError('onnxruntime is not available - install "pipeline" extra to enable')

        output = self.quantization(output)

    if isinstance(output, BytesIO):
        # Reset stream and return bytes
        output.seek(0)
        output = output.read()

    return output
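
The quantization branch above delegates to self.quantization, which is not shown here. As a rough sketch of what dynamic quantization of an exported ONNX file looks like with onnxruntime (file-based paths are assumed for illustration; the pipeline's own implementation may differ):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Rewrite the exported model with weights quantized to 8-bit integers
quantize_dynamic("model.onnx", "model-quant.onnx", weight_type=QuantType.QInt8)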