
HFOnnx


Exports a Hugging Face Transformer model to ONNX. Currently, this works best with classification/pooling/qa models. Work is in progress for sequence to sequence models (summarization, transcription, translation).

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import HFOnnx, Labels

# Model path
path = "distilbert-base-uncased-finetuned-sst-2-english"

# Export model to ONNX
onnx = HFOnnx()
model = onnx(path, "text-classification", "model.onnx", True)

# Run inference and validate
labels = Labels((model, path), dynamic=False)
labels("I am happy")

See the link below for a more detailed example.

| Notebook | Description | |
|:---------|:------------|:-|
| Export and run models with ONNX | Export models with ONNX, run natively in JavaScript, Java and Rust | Open In Colab |

Methods

Python documentation for the pipeline.

__call__(path, task='default', output=None, quantize=False, opset=14)

Exports a Hugging Face Transformer model to ONNX.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| path | | path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple | required |
| task | | optional model task or category, determines the model type and outputs, defaults to export hidden state | 'default' |
| output | | optional output model path, defaults to return byte array if None | None |
| quantize | | if model should be quantized (requires onnx to be installed), defaults to False | False |
| opset | | onnx opset, defaults to 14 | 14 |

Returns:

| Type | Description |
|------|-------------|
| | path to model output or model as bytes depending on output parameter |
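
As a hedged illustration of the parameters above (the model ids, file names and the "pooling" task name are assumptions based on the model types listed at the top of this page, not taken from this reference):

from txtai.pipeline import HFOnnx

onnx = HFOnnx()

# Export a pooling (hidden state) model to a file and quantize it
onnx("sentence-transformers/all-MiniLM-L6-v2", "pooling", "embeddings.onnx", quantize=True)

# With output=None, the exported model is returned as a byte array instead
data = onnx("distilbert-base-uncased-finetuned-sst-2-english", "text-classification")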

Source code in txtai/pipeline/train/hfonnx.py
def __call__(self, path, task="default", output=None, quantize=False, opset=14):
    """
    Exports a Hugging Face Transformer model to ONNX.

    Args:
        path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
        task: optional model task or category, determines the model type and outputs, defaults to export hidden state
        output: optional output model path, defaults to return byte array if None
        quantize: if model should be quantized (requires onnx to be installed), defaults to False
        opset: onnx opset, defaults to 14

    Returns:
        path to model output or model as bytes depending on output parameter
    """

    inputs, outputs, model = self.parameters(task)

    if isinstance(path, (list, tuple)):
        model, tokenizer = path
        model = model.cpu()
    else:
        model = model(path)
        tokenizer = AutoTokenizer.from_pretrained(path)

    # Generate dummy inputs
    dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

    # Default to BytesIO if no output file provided
    output = output if output else BytesIO()

    # Export model to ONNX
    export(
        model,
        (dummy,),
        output,
        opset_version=opset,
        do_constant_folding=True,
        input_names=list(inputs.keys()),
        output_names=list(outputs.keys()),
        dynamic_axes=dict(chain(inputs.items(), outputs.items())),
    )

    # Quantize model
    if quantize:
        if not ONNX_RUNTIME:
            raise ImportError('onnxruntime is not available - install "pipeline" extra to enable')

        output = self.quantization(output)

    if isinstance(output, BytesIO):
        # Reset stream and return bytes
        output.seek(0)
        output = output.read()

    return output
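
The quantization branch above delegates to self.quantization, which is not shown here. As a rough sketch of what dynamic quantization of an exported ONNX file looks like with onnxruntime (file-based paths are assumed for illustration; the pipeline's own implementation may differ):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Rewrite the exported model with weights quantized to 8-bit integers
quantize_dynamic("model.onnx", "model-quant.onnx", weight_type=QuantType.QInt8)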