Transformers documentation

Utilities for Image Processors

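This page lists all the utility functions used by the image processors, mainly the functional transformations used to process images.

Most of these are only useful if you are studying the code of the image processors in the library.

Image Transformations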

transformers.image_transforms.center_crop

( image: ndarray size: typing.Tuple[int, int] data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None return_numpy: typing.Optional[bool] = None ) → np.ndarray

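Parameters

image (np.ndarray) — The image to crop.
size (Tuple[int, int]) — The target size for the cropped image.
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
If unset, the inferred format of the input image is used.
input_data_format (str or ChannelDimension, optional) — The channel dimension format for the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
If unset, the inferred format of the input image is used.
return_numpy (bool, optional) — Whether to return the cropped image as a numpy array. Used for backwards compatibility with the previous ImageFeatureExtractionMixin method.
- Unset: returns the same type as the input image.
- True: returns a numpy array.
- False: returns a PIL.Image.Image object.

Returns

np.ndarray

The cropped image.

Crops the image to the specified size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so the returned result will always be of size size).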
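The crop-or-pad behavior described above can be sketched in plain Python. This is an illustrative sketch on a 2-D list of rows, not the library's implementation: the helper name center_crop_2d, the zero fill value, and the exact placement of the padding are assumptions made for the example.

```python
def center_crop_2d(image, size, fill=0):
    """Center-crop a 2-D image (a list of rows) to size = (height, width).

    Hypothetical helper: if the image is smaller than the requested size,
    it is first placed in the middle of a canvas filled with `fill`.
    """
    crop_h, crop_w = size
    orig_h, orig_w = len(image), len(image[0])

    # Grow the canvas if the image is smaller than the crop on any axis.
    new_h, new_w = max(crop_h, orig_h), max(crop_w, orig_w)
    top_pad = (new_h - orig_h + 1) // 2  # ceiling division
    left_pad = (new_w - orig_w + 1) // 2
    canvas = [[fill] * new_w for _ in range(new_h)]
    for r, row in enumerate(image):
        canvas[top_pad + r][left_pad:left_pad + orig_w] = row

    # Take the centered window of the requested size.
    top = (new_h - crop_h) // 2
    left = (new_w - crop_w) // 2
    return [row[left:left + crop_w] for row in canvas[top:top + crop_h]]


big = [[r * 4 + c for c in range(4)] for r in range(4)]
small = [[1, 2], [3, 4]]
cropped = center_crop_2d(big, (2, 2))    # plain center crop
padded = center_crop_2d(small, (4, 4))   # image too small, so it is padded
```

In both cases the result has exactly the requested (height, width), which is the property the description above points out.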

transformers.image_transforms.center_to_corners_format

( bboxes_center: TensorType )

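Converts bounding boxes from center format to corners format.

- center format: contains the coordinates of the center of the box and its width and height: (center_x, center_y, width, height)
- corners format: contains the coordinates of the top-left and bottom-right corners of the box: (top_left_x, top_left_y, bottom_right_x, bottom_right_y)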
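The arithmetic behind this conversion is simple enough to show on a single box. The sketch below works on a plain tuple rather than a tensor, and the helper name center_to_corners is hypothetical.

```python
def center_to_corners(box):
    """Convert (center_x, center_y, width, height) to corner coordinates."""
    center_x, center_y, width, height = box
    return (
        center_x - width / 2,   # top_left_x
        center_y - height / 2,  # top_left_y
        center_x + width / 2,   # bottom_right_x
        center_y + height / 2,  # bottom_right_y
    )


corners = center_to_corners((10.0, 20.0, 4.0, 6.0))
```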

transformers.image_transforms.corners_to_center_format

( bboxes_corners: TensorType )

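Converts bounding boxes from corners format to center format.

- corners format: contains the coordinates of the top-left and bottom-right corners of the box: (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
- center format: contains the coordinates of the center of the box and its width and height: (center_x, center_y, width, height)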
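As a small illustration of the inverse direction, the sketch below converts one box given as a plain tuple. The helper name corners_to_center is hypothetical, and the library function itself operates on tensors.

```python
def corners_to_center(box):
    """Convert corner coordinates to (center_x, center_y, width, height)."""
    top_left_x, top_left_y, bottom_right_x, bottom_right_y = box
    return (
        (top_left_x + bottom_right_x) / 2,  # center_x
        (top_left_y + bottom_right_y) / 2,  # center_y
        bottom_right_x - top_left_x,        # width
        bottom_right_y - top_left_y,        # height
    )


center = corners_to_center((8.0, 17.0, 12.0, 23.0))
```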

transformers.image_transforms.id_to_rgb

( id_map )

Converts a unique ID to an RGB color.
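One way to picture this mapping is as a base-256 decomposition of the ID. The sketch below assumes the encoding used by COCO panoptic annotations, where red is the least significant byte. It handles a single integer only, and the helper name id_to_rgb_sketch is hypothetical.

```python
def id_to_rgb_sketch(segment_id):
    """Split an integer ID into [red, green, blue] bytes (base 256)."""
    color = []
    for _ in range(3):
        color.append(segment_id % 256)  # lowest remaining byte
        segment_id //= 256
    return color


rgb = id_to_rgb_sketch(1 + 256 * 2 + 256 * 256 * 3)
```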

transformers.image_transforms.normalize

( image: ndarray mean: typing.Union[float, typing.Iterable[float]] std: typing.Union[float, typing.Iterable[float]] data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None )

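Parameters

image (np.ndarray) — The image to normalize.
mean (float or Iterable[float]) — The mean to use for normalization.
std (float or Iterable[float]) — The standard deviation to use for normalization.
data_format (ChannelDimension, optional) — The channel dimension format of the output image. If unset, the format inferred from the input is used.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If unset, the format inferred from the input is used.

Normalizes image using the mean and standard deviation specified by mean and std.

image = (image - mean) / std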
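The formula is applied per channel, and a scalar mean or std is shared by all channels. The following is a minimal pure-Python sketch on a channels-last nested list. The helper name normalize_pixels is hypothetical, and the real function works on NumPy arrays.

```python
def normalize_pixels(image, mean, std):
    """Apply (value - mean) / std per channel to a channels-last image.

    `image` is a list of rows, each row a list of pixels, each pixel a list
    of channel values. Scalars are broadcast to every channel.
    """
    num_channels = len(image[0][0])
    if isinstance(mean, (int, float)):
        mean = [mean] * num_channels
    if isinstance(std, (int, float)):
        std = [std] * num_channels
    return [
        [[(value - m) / s for value, m, s in zip(pixel, mean, std)] for pixel in row]
        for row in image
    ]


image = [[[1.0, 0.0, 0.75], [0.5, 0.5, 0.5]]]  # 1 row, 2 pixels, 3 channels
normalized = normalize_pixels(image, mean=0.5, std=[0.5, 0.5, 0.25])
```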

transformers.image_transforms.pad

( image: ndarray padding: typing.Union[int, typing.Tuple[int, int], typing.Iterable[typing.Tuple[int, int]]] mode: PaddingMode = <PaddingMode.CONSTANT: 'constant'> constant_values: typing.Union[float, typing.Iterable[float]] = 0.0 data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

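Parameters

image (np.ndarray) — The image to pad.
padding (int or Tuple[int, int] or Iterable[Tuple[int, int]]) — Padding to apply to the edges of the height and width axes. Can be one of three formats:
- ((before_height, after_height), (before_width, after_width)): unique pad widths for each axis.
- ((before, after),): the same before and after pad for height and width.
- (pad,) or int: a shortcut for the same before and after pad width on all axes.
mode (PaddingMode) — The padding mode to use. Can be one of:
- "constant": pads with a constant value.
- "reflect": pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
- "replicate": pads with the replication of the last value on the edge of the array along each axis.
- "symmetric": pads with the reflection of the vector mirrored along the edge of the array.
constant_values (float or Iterable[float], optional) — The value to use for the padding if mode is "constant".
data_format (str or ChannelDimension, optional) — The channel dimension format for the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
If unset, the same format as the input image is used.
input_data_format (str or ChannelDimension, optional) — The channel dimension format for the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
If unset, the inferred format of the input image is used.

Returns

np.ndarray

The padded image.

Pads the image with the specified (height, width) padding and mode.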
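The four padding modes are easiest to compare on a single row of values. The sketch below is a pure-Python illustration that assumes the modes follow the usual NumPy-style semantics. The helper name pad_1d is hypothetical, and it only covers one axis with the same pad width on both sides.

```python
def pad_1d(values, pad, mode="constant", constant_value=0):
    """Pad a list on both sides by `pad` elements using the given mode."""
    if not 0 <= pad < len(values):
        raise ValueError("pad must satisfy 0 <= pad < len(values)")
    if pad == 0:
        return list(values)
    if mode == "constant":
        before = after = [constant_value] * pad
    elif mode == "reflect":
        # Mirror around the edge value without repeating it.
        before = values[pad:0:-1]
        after = values[-2:-pad - 2:-1]
    elif mode == "replicate":
        # Repeat the edge value.
        before = [values[0]] * pad
        after = [values[-1]] * pad
    elif mode == "symmetric":
        # Mirror around the edge, repeating the edge value.
        before = values[pad - 1::-1]
        after = values[:-pad - 1:-1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return list(before) + list(values) + list(after)


row = [1, 2, 3, 4]
results = {mode: pad_1d(row, 2, mode) for mode in ("constant", "reflect", "replicate", "symmetric")}
```

The only difference between "reflect" and "symmetric" is whether the edge value itself is repeated in the padding.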

transformers.image_transforms.rgb_to_id

( color )

Converts an RGB color to a unique ID.
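The mapping can be read as treating the three color bytes as digits of a base-256 number. The sketch below assumes the COCO panoptic convention, with red as the least significant byte. It handles a single color only, and the helper name rgb_to_id_sketch is hypothetical.

```python
def rgb_to_id_sketch(color):
    """Pack [red, green, blue] bytes into one integer ID (base 256)."""
    red, green, blue = color
    return red + 256 * green + 256 * 256 * blue


segment_id = rgb_to_id_sketch([1, 2, 3])
```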

transformers.image_transforms.rescale

( image: ndarray scale: float data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None dtype: dtype = <class 'numpy.float32'> input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

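Parameters

image (np.ndarray) — The image to rescale.
scale (float) — The scale to use for rescaling the image.
data_format (ChannelDimension, optional) — The channel dimension format of the image. If not provided, it will be the same as the input image.
dtype (np.dtype, optional, defaults to np.float32) — The dtype of the output image. Used for backwards compatibility with feature extractors.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If not provided, it will be inferred from the input image.

Returns

np.ndarray

The rescaled image.

Rescales image by scale.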
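Rescaling here means multiplying every pixel value by scale. It does not change the image dimensions. A common use, shown in this hypothetical pure-Python sketch, is mapping 8-bit values into the range 0 to 1 with a scale of 1/255. The helper name rescale_pixels is an assumption for illustration.

```python
def rescale_pixels(image, scale):
    """Multiply every value of a 2-D image (list of rows) by `scale`."""
    return [[float(value) * scale for value in row] for row in image]


rescaled = rescale_pixels([[0, 128, 255]], 1 / 255)
```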

transformers.image_transforms.resize

( image: ndarray size: typing.Tuple[int, int] resample: PILImageResampling = None reducing_gap: typing.Optional[int] = None data_format: typing.Optional[transformers.image_utils.ChannelDimension] = None return_numpy: bool = True input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → np.ndarray

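Parameters

image (np.ndarray) — The image to resize.
size (Tuple[int, int]) — The size to use for resizing the image.
resample (int, optional, defaults to PILImageResampling.BILINEAR) — The filter to use for resampling.
reducing_gap (int, optional) — Apply optimization by resizing the image in two steps. The bigger reducing_gap, the closer the result is to fair resampling. See the corresponding Pillow documentation for more details.
data_format (ChannelDimension, optional) — The channel dimension format of the output image. If unset, the format inferred from the input is used.
return_numpy (bool, optional, defaults to True) — Whether to return the resized image as a numpy array. If False, a PIL.Image.Image object is returned.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If unset, the format inferred from the input is used.

Returns

np.ndarray

The resized image.

Resizes image to the (height, width) specified by size using the PIL library.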

transformers.image_transforms.to_pil_image

( image: typing.Union[numpy.ndarray, ForwardRef('PIL.Image.Image'), ForwardRef('torch.Tensor'), ForwardRef('tf.Tensor'), ForwardRef('jnp.ndarray')] do_rescale: typing.Optional[bool] = None image_mode: typing.Optional[str] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None ) → PIL.Image.Image

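Parameters

image (PIL.Image.Image or numpy.ndarray or torch.Tensor or tf.Tensor) — The image to convert to the PIL.Image format.
do_rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Defaults to True if the image type is a floating type and casting to int would result in a loss of precision, and False otherwise.
image_mode (str, optional) — The mode to use for the PIL image. If unset, the default mode for the input image type is used.
input_data_format (ChannelDimension, optional) — The channel dimension format of the input image. If unset, the format inferred from the input is used.

Returns

PIL.Image.Image

The converted image.

Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.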

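ImageProcessingMixin

class transformers.ImageProcessingMixin

( **kwargs )

This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.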

fetch_images

( image_url_or_urls: typing.Union[str, typing.List[str]] )

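Converts a single URL or a list of URLs into the corresponding PIL.Image objects.

If a single URL is passed, the return value is a single object. If a list is passed, a list of objects is returned.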

from_dict

( image_processor_dict: typing.Dict[str, typing.Any] **kwargs ) → ImageProcessingMixin

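Parameters

image_processor_dict (Dict[str, Any]) — Dictionary that will be used to instantiate the image processor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the to_dict() method.
kwargs (Dict[str, Any]) — Additional parameters from which to initialize the image processor object.

Returns

ImageProcessingMixin

The image processor object instantiated from those parameters.

Instantiates a type of ImageProcessingMixin from a Python dictionary of parameters.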

from_json_file

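( json_file: typing.Union[str, os.PathLike] ) → An image processor of type ImageProcessingMixin

Parameters

json_file (str or os.PathLike) — Path to the JSON file containing the parameters.

Returns

An image processor of type ImageProcessingMixin

The image_processor object instantiated from that JSON file.

Instantiates an image processor of type ImageProcessingMixin from the path to a JSON file of parameters.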

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )

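Parameters

pretrained_model_name_or_path (str or os.PathLike) — This can be either:
- a string, the model id of a pretrained image processor hosted inside a model repo on huggingface.co.
- a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or URL to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) — Whether or not to force (re-)downloading the image processor files and override the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.

Instantiates a type of ImageProcessingMixin from an image processor.

Examples: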

from transformers import CLIPImageProcessor

# We can't instantiate directly the base class *ImageProcessingMixin* so let's show the examples on a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
# Keyword arguments that match image processor attributes override the loaded values.
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
# With return_unused_kwargs=True, unrecognized keyword arguments are returned as a second value.
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}

get_image_processor_dict

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] **kwargs ) → Tuple[Dict, Dict]

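Parameters

pretrained_model_name_or_path (str or os.PathLike) — The identifier of the pretrained checkpoint from which we want the dictionary of parameters.
subfolder (str, optional, defaults to "") — In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.

Returns

Tuple[Dict, Dict]

The dictionary(ies) that will be used to instantiate the image processor object.

From a pretrained_model_name_or_path, resolves to a dictionary of parameters, to be used for instantiating an image processor of type ImageProcessingMixin using from_dict.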

push_to_hub

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )

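Parameters

repo_id (str) — The name of the repository you want to push your image processor to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Defaults to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Defaults to "Upload image processor".
private (bool, optional) — Whether to make the repo private. If None (the default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used. Defaults to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size of a checkpoint before it is sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB"). The default of "5GB" lets users easily load models on free-tier Google Colab instances without running out of CPU memory.
create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or commit directly.
safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights to the safetensors format for safer serialization.
revision (str, optional) — The branch to push the uploaded files to.
commit_description (str, optional) — The description of the commit that will be created.
tags (List[str], optional) — List of tags to push on the Hub.

Uploads the image processor file to the 🤗 Model Hub.

Examples: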

from transformers import AutoImageProcessor

# Replace the checkpoint below with one that ships an image processor config (preprocessor_config.json).
image_processor = AutoImageProcessor.from_pretrained("google-bert/bert-base-cased")

# Push the image processor to your namespace with the name "my-finetuned-bert".
image_processor.push_to_hub("my-finetuned-bert")

# Push the image processor to an organization with the name "my-finetuned-bert".
image_processor.push_to_hub("huggingface/my-finetuned-bert")

register_for_auto_class

( auto_class = 'AutoImageProcessor' )

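Parameters

auto_class (str or type, optional, defaults to "AutoImageProcessor") — The auto class to register this new image processor with.

Registers this class with a given auto class. This should only be used for custom image processors, as the ones in the library are already mapped with AutoImageProcessor.

This API is experimental and may have some slight breaking changes in the next releases.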

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

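Parameters

save_directory (str or os.PathLike) — Directory where the image processor JSON file will be saved (it will be created if it does not exist).
push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face Model Hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Saves an image processor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.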

to_dict

( ) → Dict[str, Any]

Returns

Dict[str, Any]

Dictionary of all the attributes that make up this image processor instance.

Serializes this instance to a Python dictionary.

to_json_file

( json_file_path: typing.Union[str, os.PathLike] )

Parameters

json_file_path (str or os.PathLike) — Path to the JSON file in which this image processor instance's parameters will be saved.

Saves this instance to a JSON file.

to_json_string

( ) → str

Returns

str

String containing all the attributes that make up this image processor instance, in JSON format.

Serializes this instance to a JSON string.
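To show how the serialization methods on this page fit together, the sketch below walks through a round trip: to_dict, to_json_string, to_json_file, then from_json_file and from_dict. It uses a hypothetical stand-in class, TinyProcessorConfig, built only on the standard library. It is not the Transformers implementation, and details such as the JSON formatting and the way extra keyword arguments override dictionary entries are assumptions made for the example.

```python
import json
import os
import tempfile


class TinyProcessorConfig:
    """Hypothetical stand-in for an image processor; not a Transformers class."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self):
        # All attributes that make up this instance.
        return dict(self.__dict__)

    def to_json_string(self):
        # Assumed formatting: indented, sorted keys, trailing newline.
        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"

    def to_json_file(self, json_file_path):
        with open(json_file_path, "w", encoding="utf-8") as writer:
            writer.write(self.to_json_string())

    @classmethod
    def from_dict(cls, config_dict, **kwargs):
        # Assumption: extra keyword arguments override dictionary entries.
        return cls(**{**config_dict, **kwargs})

    @classmethod
    def from_json_file(cls, json_file):
        with open(json_file, encoding="utf-8") as reader:
            return cls.from_dict(json.load(reader))


original = TinyProcessorConfig(do_normalize=True, image_mean=[0.5, 0.5, 0.5], size=224)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preprocessor_config.json")
    original.to_json_file(path)
    restored = TinyProcessorConfig.from_json_file(path)

overridden = TinyProcessorConfig.from_dict(original.to_dict(), do_normalize=False)
```

Because every attribute passes through a plain dictionary and JSON, the restored object carries the same settings as the original, which is what makes saving and re-loading a configuration possible.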
