Pipelines for inference

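The pipeline() makes it simple to use any model from the Hub for inference on any language, computer vision, speech, and multimodal task. Even if you have no experience with a specific modality or are unfamiliar with the code behind a model, you can still use pipeline() for inference! This tutorial will teach you to:

Use a pipeline() for inference.
Use a specific tokenizer or model.
Use a pipeline() for audio, vision, and multimodal tasks.

Take a look at the pipeline() documentation for a complete list of supported tasks and available parameters.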

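Pipeline usage

While each task has an associated pipeline(), it is simpler to use the general pipeline() abstraction, which contains all the task-specific pipelines. The pipeline() automatically loads a default model and a preprocessing class capable of inference for your task. Let's take the example of using the pipeline() for automatic speech recognition (ASR), or speech-to-text.

Start by creating a pipeline() and specifying the inference task: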

>>> from transformers import pipeline

>>> transcriber = pipeline(task="automatic-speech-recognition")

Pass your input to the pipeline(). In the case of speech recognition, this is an audio input file:

>>> transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': 'I HAVE A DREAM BUT ONE DAY THIS NATION WILL RISE UP LIVE UP THE TRUE MEANING OF ITS TREES'}

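Not the result you had in mind? Check out some of the most downloaded automatic speech recognition models on the Hub to see if you can get a better transcription.

Let's try the Whisper large-v2 model from OpenAI. Whisper was released 2 years later than Wav2Vec2 and was trained on close to 10x more data. As such, it beats Wav2Vec2 on most downstream benchmarks. It also has the added benefit of predicting punctuation and casing, neither of which is possible with Wav2Vec2.

Let's give it a try here to see how it performs. Set torch_dtype="auto" to automatically load the model in the data type its weights are stored in, which is the most memory-efficient choice.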

>>> transcriber = pipeline(model="openai/whisper-large-v2", torch_dtype="auto")
>>> transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

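Now this result looks more accurate! For a deep-dive comparison of Wav2Vec2 and Whisper, refer to the Audio Transformers Course. We really encourage you to check out the Hub for models in different languages, models specialized in your field, and more. You can check out and compare model results directly from your browser on the Hub to see if a model fits your needs or handles corner cases better than others. And if you don't find a model for your use case, you can always start training your own!

If you have several inputs, you can pass them as a list: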

transcriber(
    [
        "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac",
        "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac",
    ]
)

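Pipelines are great for experimentation, since switching from one model to another is trivial; however, there are some ways to optimize them for larger workloads than experimentation. The sections "Using pipelines on a dataset" and "Using pipelines for a webserver" below dive into iterating over whole datasets and using pipelines in a webserver.

Parameters

The pipeline() supports many parameters; some are task-specific, and some are general to all pipelines. In general, you can specify parameters anywhere you want: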

transcriber = pipeline(model="openai/whisper-large-v2", my_parameter=1)

out = transcriber(...)  # This will use `my_parameter=1`.
out = transcriber(..., my_parameter=2)  # This will override and use `my_parameter=2`.
out = transcriber(...)  # This will go back to using `my_parameter=1`.
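
The precedence rule in the snippet above can be sketched with a toy class. ToyPipeline and my_parameter are hypothetical names used only for illustration; this is not how the library implements it, only a minimal model of the behavior described: a call-time argument overrides the construction-time default for that single call.

```python
class ToyPipeline:
    """Hypothetical stand-in that mimics only the parameter precedence described above."""

    def __init__(self, **default_params):
        # Parameters given at construction time become the defaults.
        self.default_params = dict(default_params)

    def __call__(self, inputs, **call_params):
        # Call-time parameters win, but only for this call:
        # the stored defaults are never mutated.
        params = {**self.default_params, **call_params}
        return {"inputs": inputs, "params": params}


pipe = ToyPipeline(my_parameter=1)
first = pipe("audio.flac")                   # uses my_parameter=1
second = pipe("audio.flac", my_parameter=2)  # overrides with my_parameter=2
third = pipe("audio.flac")                   # back to my_parameter=1
print(first["params"], second["params"], third["params"])
```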

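Let's check out 3 important ones:

Device

If you use device=n, the pipeline automatically puts the model on the specified device. This works regardless of whether you are using PyTorch or TensorFlow.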

transcriber = pipeline(model="openai/whisper-large-v2", device=0)
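
If the same script has to run on machines with and without a GPU, the device index can be chosen at runtime. The helper below is a minimal sketch under stated assumptions: pick_device is a hypothetical name, it only checks for CUDA through PyTorch, and it relies on the pipeline convention that device=-1 means CPU.

```python
def pick_device():
    """Return 0 (first GPU) if CUDA is usable through PyTorch, otherwise -1 (CPU)."""
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)
```

The result could then be passed along as pipeline(model="openai/whisper-large-v2", device=device).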

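If the model is too large for a single GPU and you are using PyTorch, you can set torch_dtype='float16' to enable FP16 precision inference. Usually this does not cause a significant drop in model quality, but make sure you evaluate it on your models!

Alternatively, you can set device_map="auto" to automatically determine how to load and store the model weights. Using the device_map argument requires the 🤗 Accelerate package: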

pip install --upgrade accelerate

The following code automatically loads and stores the model weights across devices:

transcriber = pipeline(model="openai/whisper-large-v2", device_map="auto")

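Note that if device_map="auto" is passed, there is no need to add the argument device=device when instantiating your pipeline; combining the two may lead to unexpected behavior!

Batch size

By default, pipelines do not batch inference, because batching is not necessarily faster and can actually be quite a bit slower in some cases.

But if it works in your use case, you can use: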

transcriber = pipeline(model="openai/whisper-large-v2", device=0, batch_size=2)
audio_filenames = [f"https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/{i}.flac" for i in range(1, 5)]
texts = transcriber(audio_filenames)

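This runs the pipeline on the 4 provided audio files, but passes them to the model in batches of 2 (the model is on a GPU, where batching is more likely to help) without requiring any further code from you. The output should always match what you would have received without batching. Batching is only meant as a way to help you get more speed out of a pipeline.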
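
As a rough mental model of what batch_size=2 does with 4 inputs, consider the sketch below. It does not use the library: split_into_batches and fake_model are hypothetical helpers, and fake_model merely stands in for a forward pass that returns one output per input. The point is that grouping changes how inputs reach the model, not what comes back.

```python
def split_into_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_model(batch):
    # Hypothetical stand-in for a model forward pass: one output per input.
    return [name.upper() for name in batch]


audio_filenames = [f"{i}.flac" for i in range(1, 5)]

batches = list(split_into_batches(audio_filenames, batch_size=2))
batched_outputs = [out for batch in batches for out in fake_model(batch)]
unbatched_outputs = [fake_model([name])[0] for name in audio_filenames]

print(batches)  # [['1.flac', '2.flac'], ['3.flac', '4.flac']]
print(batched_outputs == unbatched_outputs)  # True
```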

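Pipelines can also alleviate some of the complexities of batching because, for some pipelines, a single item (like a long audio file) needs to be chunked into multiple parts to be processed by a model. The pipeline performs this chunk batching for you.

Task-specific parameters

All tasks provide task-specific parameters which allow for additional flexibility and options to help you get your job done. For instance, the transformers.AutomaticSpeechRecognitionPipeline.__call__() method has a return_timestamps parameter which sounds promising for subtitling videos: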

>>> transcriber = pipeline(model="openai/whisper-large-v2", return_timestamps=True)
>>> transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.', 'chunks': [{'timestamp': (0.0, 11.88), 'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its'}, {'timestamp': (11.88, 12.38), 'text': ' creed.'}]}

As you can see, the model inferred the text and also output when the various sentences were pronounced.
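
To make the subtitling idea concrete, the 'chunks' list from the output above can be turned into SRT-style subtitle entries. This is a minimal sketch, not part of the library: to_srt_time and chunks_to_srt are hypothetical helpers, and they assume every chunk has numeric start and end timestamps.

```python
def to_srt_time(seconds):
    """Format a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Turn a list of {'timestamp': (start, end), 'text': ...} dicts into SRT text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]  # assumes both values are numbers
        text = chunk["text"].strip()
        entries.append(f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(entries)


# The 'chunks' value copied from the pipeline output shown above.
chunks = [
    {"timestamp": (0.0, 11.88), "text": " I have a dream that one day this nation will rise up and live out the true meaning of its"},
    {"timestamp": (11.88, 12.38), "text": " creed."},
]
srt = chunks_to_srt(chunks)
print(srt)
```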

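There are many parameters available for each task, so check out each task's API reference to see what you can tinker with! For instance, the AutomaticSpeechRecognitionPipeline has a chunk_length_s parameter which is helpful for working on really long audio files (for example, subtitling entire movies or hour-long videos) that a model typically cannot handle on its own: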

>>> transcriber = pipeline(model="openai/whisper-large-v2", chunk_length_s=30)
>>> transcriber("https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/ted_60.wav")
{'text': " So in college, I was a government major, which means I had to write a lot of papers. Now, when a normal student writes a paper, they might spread the work out a little like this. So, you know. You get started maybe a little slowly, but you get enough done in the first week that with some heavier days later on, everything gets done and things stay civil. And I would want to do that like that. That would be the plan. I would have it all ready to go, but then actually the paper would come along, and then I would kind of do this. And that would happen every single paper. But then came my 90-page senior thesis, a paper you're supposed to spend a year on. I knew for a paper like that, my normal workflow was not an option, it was way too big a project. So I planned things out and I decided I kind of had to go something like this. This is how the year would go. So I'd start off light and I'd bump it up"}
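
The idea behind chunk_length_s can be illustrated by computing the time windows a long recording would be cut into. This is a simplified sketch, not the pipeline's actual chunking code: chunk_windows and its overlap_s argument are hypothetical, and the real pipeline handles overlap and the merging of partial transcriptions internally.

```python
def chunk_windows(total_s, chunk_length_s, overlap_s=0.0):
    """Return (start, end) windows in seconds that cover [0, total_s].

    Consecutive windows share `overlap_s` seconds of audio.
    """
    if not 0 <= overlap_s < chunk_length_s:
        raise ValueError("overlap_s must be in [0, chunk_length_s)")
    total_s = float(total_s)
    step = chunk_length_s - overlap_s
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end == total_s:
            break  # the last window reached the end of the audio
        start += step
    return windows


# A hypothetical 60-second recording cut into 30-second windows.
print(chunk_windows(60, 30))
# The same recording with 5 seconds shared between neighbouring windows.
print(chunk_windows(60, 30, overlap_s=5))
```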

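If you can't find a parameter that would really help you out, feel free to request it!

Using pipelines on a dataset

The pipeline can also run inference on a large dataset. The easiest way we recommend doing this is by using an iterator: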

def data():
    for i in range(1000):
        yield f"My example {i}"


pipe = pipeline(model="openai-community/gpt2", device=0)
generated_characters = 0
for out in pipe(data()):
    generated_characters += len(out[0]["generated_text"])

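The iterator data() yields each input, and the pipeline automatically recognizes that the input is iterable and starts fetching data while it continues to process it on the GPU (this uses DataLoader under the hood). This is important because you don't have to allocate memory for the whole dataset and you can feed the GPU as fast as possible.

Since batching could speed things up, it may be useful to try tuning the batch_size parameter here.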
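
The memory benefit comes from the generator being consumed lazily. The sketch below shows that idea in plain Python; iter_batches is a hypothetical helper and the batch size of 64 is arbitrary. The library relies on DataLoader for this, so treat the code as an illustration of the principle rather than of the implementation.

```python
from itertools import islice


def data():
    for i in range(1000):
        yield f"My example {i}"


def iter_batches(iterable, batch_size):
    """Lazily group items from any iterable into lists of up to `batch_size`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Only one batch is held in memory at a time; the dataset is never materialized.
batch_sizes = [len(batch) for batch in iter_batches(data(), batch_size=64)]
print(len(batch_sizes), batch_sizes[-1])
```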

The simplest way to iterate over a dataset is to load one from 🤗 Datasets:

# KeyDataset is a util that will just output the item we're interested in.
from transformers.pipelines.pt_utils import KeyDataset
from datasets import load_dataset

pipe = pipeline(model="hf-internal-testing/tiny-random-wav2vec2", device=0)
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation[:10]")

for out in pipe(KeyDataset(dataset, "audio")):
    print(out)

Using pipelines for a webserver

Creating an inference engine is a complex topic that deserves its own page; see the dedicated guide on using pipelines for a webserver.

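Vision pipeline

Using a pipeline() for vision tasks is practically identical.

Specify your task and pass your image to the classifier. The image can be a link, a local path, or a base64-encoded image. For example, what species of cat is shown below?

[Image: pipeline-cat-chonk]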

>>> from transformers import pipeline

>>> vision_classifier = pipeline(model="google/vit-base-patch16-224")
>>> preds = vision_classifier(
...     images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
... )
>>> preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
>>> preds
[{'score': 0.4335, 'label': 'lynx, catamount'}, {'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}, {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}, {'score': 0.0239, 'label': 'Egyptian cat'}, {'score': 0.0229, 'label': 'tiger cat'}]

Text pipeline

Using a pipeline() for NLP tasks is practically identical.

>>> from transformers import pipeline

>>> # This model is a `zero-shot-classification` model.
>>> # It will classify text, except you are free to choose any label you might imagine
>>> classifier = pipeline(model="facebook/bart-large-mnli")
>>> classifier(
...     "I have a problem with my iphone that needs to be resolved asap!!",
...     candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
... )
{'sequence': 'I have a problem with my iphone that needs to be resolved asap!!', 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'], 'scores': [0.504, 0.479, 0.013, 0.003, 0.002]}
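
The returned 'labels' and 'scores' lists are aligned by position, so they can be paired up for further processing. The snippet below is a small sketch that works on the output shown above; the 0.3 threshold is an arbitrary value chosen for illustration.

```python
# Output copied from the zero-shot classification example above.
result = {
    "sequence": "I have a problem with my iphone that needs to be resolved asap!!",
    "labels": ["urgent", "phone", "computer", "not urgent", "tablet"],
    "scores": [0.504, 0.479, 0.013, 0.003, 0.002],
}

# Pair each candidate label with its score.
label_scores = dict(zip(result["labels"], result["scores"]))

# The single most likely label.
top_label = max(label_scores, key=label_scores.get)

# Every label whose score clears an arbitrary, illustrative threshold.
threshold = 0.3
confident_labels = [label for label, score in label_scores.items() if score >= threshold]

print(top_label)         # urgent
print(confident_labels)  # ['urgent', 'phone']
```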

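Multimodal pipeline

The pipeline() supports more than one modality. For example, a visual question answering (VQA) task combines text and image. Feel free to use any image link you like and any question you want to ask about the image. The image can be a URL or a local path to the image.

For example, if you use this invoice image: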

>>> from transformers import pipeline

>>> vqa = pipeline(model="impira/layoutlm-document-qa")
>>> output = vqa(
...     image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png",
...     question="What is the invoice number?",
... )
>>> output[0]["score"] = round(output[0]["score"], 3)
>>> output
[{'score': 0.425, 'answer': 'us-001', 'start': 16, 'end': 16}]

To run the example above, you need to have pytesseract installed in addition to 🤗 Transformers:

sudo apt install -y tesseract-ocr
pip install pytesseract

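Using pipeline on large models with 🤗 accelerate:

You can easily run pipeline on large models using 🤗 accelerate! First make sure you have installed accelerate with pip install accelerate.

Then load your model using device_map="auto"! We will use facebook/opt-1.3b for our example.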

# pip install accelerate
import torch
from transformers import pipeline

pipe = pipeline(model="facebook/opt-1.3b", torch_dtype=torch.bfloat16, device_map="auto")
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)

You can also load the model in 8-bit if you install bitsandbytes and pass load_in_8bit=True through model_kwargs:

# pip install accelerate bitsandbytes
import torch
from transformers import pipeline

pipe = pipeline(model="facebook/opt-1.3b", device_map="auto", model_kwargs={"load_in_8bit": True})
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)

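Note that you can replace the checkpoint with any Hugging Face model that supports large model loading, such as BLOOM.

Creating web demos from pipelines with gradio

Pipelines are automatically supported in Gradio, a library that makes creating beautiful and user-friendly machine learning apps on the web a breeze. First, make sure you have Gradio installed: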

pip install gradio

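Then, you can create a web demo around an image classification pipeline (or any other pipeline) in a single line of code by calling Gradio's Interface.from_pipeline function. This creates an intuitive drag-and-drop interface in your browser: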

from transformers import pipeline
import gradio as gr

pipe = pipeline("image-classification", model="google/vit-base-patch16-224")

gr.Interface.from_pipeline(pipe).launch()

By default, the web demo runs on a local server. If you'd like to share it with others, you can generate a temporary public link by setting share=True in launch(). You can also host your demo on Hugging Face Spaces for a permanent link.