Transformers documentation

Image Processor

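An image processor is in charge of preparing input features for vision models and post-processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax, and NumPy tensors. It may also include model-specific post-processing, such as converting logits to segmentation masks.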

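Fast image processors are available for a few models, and more will be added in the future. They are based on the torchvision library and provide a significant speed-up, especially when processing on GPU. They have the same API as the base image processors and can be used as drop-in replacements. To use a fast image processor, you need to install the torchvision library and set the use_fast argument to True when instantiating the image processor: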

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)

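When using a fast image processor, you can also set the device argument to specify the device on which processing should be done. By default, processing is done on the same device as the inputs if the inputs are tensors, and on CPU otherwise.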

from torchvision.io import read_image
from transformers import DetrImageProcessorFast

images = read_image("image.jpg")
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
images_processed = processor(images, return_tensors="pt", device="cuda")

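Here are some speed comparisons between the base and fast image processors for the DETR and RT-DETR models, and how they impact overall inference time:

These benchmarks were run on an AWS EC2 g5.2xlarge instance with an NVIDIA A10G Tensor Core GPU.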

ImageProcessingMixin

class transformers.ImageProcessingMixin

( **kwargs )

This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )

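Parameters

pretrained_model_name_or_path (str or os.PathLike) — This can be either:
- a string, the model id of a pretrained image processor hosted inside a model repo on huggingface.co.
- a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or URL to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the image processor files and override the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.

Instantiate a type of ImageProcessingMixin from an image processor.

Examples: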

from transformers import CLIPImageProcessor

# We can't instantiate directly the base class *ImageProcessingMixin* so let's show the examples on a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

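Parameters

save_directory (str or os.PathLike) — Directory where the image processor JSON file will be saved (will be created if it does not exist).
push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Save an image processor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.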
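
As a hedged illustration of the save/load round trip, the sketch below builds a CLIPImageProcessor from its class defaults (so nothing is downloaded), saves it to a temporary directory, and reloads it with from_pretrained(). It assumes transformers and Pillow are installed; the choice of CLIPImageProcessor and the do_normalize override are for illustration only.

```python
import tempfile

from transformers import CLIPImageProcessor

# Build a processor from the class defaults (no download), overriding one setting.
processor = CLIPImageProcessor(do_normalize=False)

with tempfile.TemporaryDirectory() as save_directory:
    # Writes the image processor JSON configuration into save_directory.
    processor.save_pretrained(save_directory)
    # Reload from the local directory instead of a model id on the Hub.
    reloaded = CLIPImageProcessor.from_pretrained(save_directory)

# The overridden setting should survive the round trip.
print(reloaded.do_normalize)
```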

BatchFeature

class transformers.BatchFeature

( data: typing.Optional[typing.Dict[str, typing.Any]] = None tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )

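Parameters

data (dict, optional) — Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
tensor_type (Union[None, str, TensorType], optional) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.

Holds the output of the pad() and feature extractor specific __call__ methods.

This class is derived from a Python dictionary and can be used as a dictionary.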
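
A minimal sketch of dictionary-style use of BatchFeature, assuming transformers and NumPy are installed. The key name pixel_values and the toy values are illustrative; tensor_type="np" requests conversion to NumPy arrays at initialization.

```python
from transformers import BatchFeature

# Nested Python lists stand in for the output of an image processor call.
features = BatchFeature(
    data={"pixel_values": [[0.0, 0.5], [1.0, 0.25]]},
    tensor_type="np",  # convert the lists to NumPy arrays at initialization
)

# Dictionary-style access to keys and values.
print(list(features.keys()))
print(type(features["pixel_values"]))
print(features["pixel_values"].shape)
```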

convert_to_tensors

( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

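Parameters

tensor_type (str or TensorType, optional) — The type of tensors to use. If str, should be one of the values of the enum TensorType. If None, no modification is done.

Convert the inner content to tensors.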

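to

( *args **kwargs ) → BatchFeature

Parameters

args (Tuple) — Will be passed to the to(...) function of the tensors.
kwargs (Dict, optional) — Will be passed to the to(...) function of the tensors.

BatchFeature

The same instance after modification.

Send all values to device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.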

BaseImageProcessor

class transformers.BaseImageProcessor

( **kwargs )

center_crop

( image: ndarray size: typing.Dict[str, int] data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None **kwargs )

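Parameters

image (np.ndarray) — Image to center crop.
size (Dict[str, int]) — Size of the output image.
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.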
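
To make the two channel dimension formats concrete, here is a small NumPy sketch that converts a dummy image between them. It only illustrates the array layouts and is not the library's conversion routine; the array names are hypothetical.

```python
import numpy as np

# A dummy RGB image in "channels_last" layout: (height, width, num_channels).
image_last = np.zeros((4, 6, 3), dtype=np.uint8)

# Move the channel axis to the front for "channels_first": (num_channels, height, width).
image_first = np.transpose(image_last, (2, 0, 1))

# Move it back to the end to recover the "channels_last" layout.
round_trip = np.transpose(image_first, (1, 2, 0))

print(image_first.shape)
print(round_trip.shape)
```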

Center crop an image to (size["height"], size["width"]). If the input is smaller than the crop size along any edge, the image is padded with zeros and then center cropped.
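
The pad-then-crop behavior can be sketched in plain NumPy as follows. This is an illustrative approximation for a channels_last image, not the library's implementation: the helper name center_crop_sketch is hypothetical, and the way an odd amount of padding is split between the two sides is an assumption.

```python
import numpy as np


def center_crop_sketch(image, size):
    """Center crop a (height, width, num_channels) array to size["height"] x size["width"]."""
    crop_h, crop_w = size["height"], size["width"]
    h, w = image.shape[:2]

    # Zero-pad along any edge where the image is smaller than the target size.
    pad_h, pad_w = max(crop_h - h, 0), max(crop_w - w, 0)
    top, left = pad_h // 2, pad_w // 2
    padded = np.pad(
        image,
        ((top, pad_h - top), (left, pad_w - left), (0, 0)),
        mode="constant",
        constant_values=0,
    )

    # Take the central window of the (possibly padded) image.
    h, w = padded.shape[:2]
    y0, x0 = (h - crop_h) // 2, (w - crop_w) // 2
    return padded[y0 : y0 + crop_h, x0 : x0 + crop_w]


# A 2x6 image of ones is too short for a 4x4 crop, so zero rows are added above and below.
image = np.ones((2, 6, 3), dtype=np.float32)
cropped = center_crop_sketch(image, {"height": 4, "width": 4})
print(cropped.shape)
```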

normalize

( image: ndarray mean: typing.Union[float, typing.Iterable[float]] std: typing.Union[float, typing.Iterable[float]] data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None **kwargs ) → np.ndarray

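Parameters

image (np.ndarray) — Image to normalize.
mean (float or Iterable[float]) — Image mean to use for normalization.
std (float or Iterable[float]) — Image standard deviation to use for normalization.
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.

np.ndarray

The normalized image.

Normalize an image. image = (image - image_mean) / image_std.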
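
The formula can be checked by hand with a tiny NumPy example. This sketch applies the arithmetic directly to a channels_last array and is not a call into the library; the mean and std values are made up for illustration.

```python
import numpy as np

# Two pixels with three channels each, in (height, width, num_channels) layout.
image = np.array([[[0.0, 0.5, 1.0]], [[1.0, 0.5, 0.0]]])

# Hypothetical per-channel statistics; they broadcast over the last (channel) axis.
image_mean = np.array([0.5, 0.5, 0.5])
image_std = np.array([0.5, 0.25, 0.5])

normalized = (image - image_mean) / image_std
print(normalized[0, 0])  # first pixel:  (0 - 0.5) / 0.5, (0.5 - 0.5) / 0.25, (1 - 0.5) / 0.5
print(normalized[1, 0])  # second pixel: (1 - 0.5) / 0.5, (0.5 - 0.5) / 0.25, (0 - 0.5) / 0.5
```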

rescale

( image: ndarray scale: float data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None **kwargs ) → np.ndarray

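Parameters

image (np.ndarray) — Image to rescale.
scale (float) — The scaling factor to rescale pixel values by.
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.

np.ndarray

The rescaled image.

Rescale an image by a scale factor. image = image * scale.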
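
A common use of this formula is mapping 8-bit pixel values from [0, 255] to [0, 1]. The NumPy sketch below shows only the arithmetic; the scale of 1/255 and the cast to float32 are assumptions chosen for illustration, not values taken from the library.

```python
import numpy as np

# 8-bit pixel values covering the full range.
image = np.array([[0, 128, 255]], dtype=np.uint8)

# Assumed scale factor for mapping [0, 255] to [0, 1].
scale = 1 / 255

# Cast to float first so the multiplication is not performed on integers.
rescaled = image.astype(np.float32) * scale
print(rescaled.min(), rescaled.max())
```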

BaseImageProcessorFast

class transformers.BaseImageProcessorFast

( **kwargs )
