SuperPoint

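Overview

The SuperPoint model was proposed in SuperPoint: Self-Supervised Interest Point Detection and Description by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.

The model is a fully convolutional network trained in a self-supervised way for interest point detection and description. It detects interest points that are repeatable under homographic transformations and provides a descriptor for each point. On its own the model has limited use, but it can serve as a feature extractor for other tasks such as homography estimation and image matching.

The abstract from the paper is the following:

This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.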

SuperPoint overview. Taken from the original paper.

Usage tips

Here is a quick example of using the model to detect interest points in an image:

from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

inputs = processor(image, return_tensors="pt")
# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)

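The outputs contain the list of keypoint coordinates with their respective scores and descriptors (each descriptor is a vector of length 256).

You can also feed multiple images to the model. Because each image can yield a different number of keypoints, the batched outputs are padded, and the mask attribute marks which entries are valid. You can read the mask directly or, as in the example below, call post_process_keypoint_detection to get the valid keypoints of each image: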

from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests

url_image_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_1 = Image.open(requests.get(url_image_1, stream=True).raw)
url_image_2 = "http://images.cocodataset.org/test-stuff2017/000000000568.jpg"
image_2 = Image.open(requests.get(url_image_2, stream=True).raw)

images = [image_1, image_2]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

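inputs = processor(images, return_tensors="pt")
# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)
# Original (height, width) of each image, used to rescale the keypoints
image_sizes = [(image.height, image.width) for image in images]
outputs = processor.post_process_keypoint_detection(outputs, image_sizes)

# Each entry of outputs holds the keypoints, scores and descriptors of one image
for output in outputs:
    for keypoint, score, descriptor in zip(output["keypoints"], output["scores"], output["descriptors"]):
        print(f"Keypoint: {keypoint}")
        print(f"Score: {score}")
        print(f"Descriptor: {descriptor}")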
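The padding-and-mask idea described above can be sketched with NumPy on made-up arrays. The array shapes and the `unpad` helper are assumptions made for illustration, not part of the transformers API.

```python
import numpy as np

def unpad(keypoints, scores, mask):
    """Split padded batch arrays into per-image arrays of valid entries.

    keypoints: (batch, max_keypoints, 2)
    scores:    (batch, max_keypoints)
    mask:      (batch, max_keypoints), 1 for real keypoints and 0 for padding
    """
    results = []
    for kp, sc, m in zip(keypoints, scores, mask):
        valid = m.astype(bool)
        results.append({"keypoints": kp[valid], "scores": sc[valid]})
    return results

# Toy batch: image 0 has 3 keypoints, image 1 has only 1 (the rest is padding)
keypoints = np.array([
    [[0.1, 0.2], [0.5, 0.5], [0.9, 0.4]],
    [[0.3, 0.7], [0.0, 0.0], [0.0, 0.0]],
])
scores = np.array([[0.9, 0.6, 0.3], [0.8, 0.0, 0.0]])
mask = np.array([[1, 1, 1], [1, 0, 0]])

per_image = unpad(keypoints, scores, mask)
counts = [len(result["keypoints"]) for result in per_image]
print(counts)
```

Boolean indexing keeps only the rows flagged by the mask, so each image ends up with its own number of keypoints.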

You can then plot the keypoints on the image of your choice to visualize the result:

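import matplotlib.pyplot as plt

# Convert the tensors of the first image to NumPy arrays for matplotlib
keypoints = outputs[0]["keypoints"].detach().cpu().numpy()
scores = outputs[0]["scores"].detach().cpu().numpy()

plt.axis("off")
plt.imshow(image_1)
# Color and size of each marker grow with the keypoint score
plt.scatter(
    keypoints[:, 0],
    keypoints[:, 1],
    c=scores * 100,
    s=scores * 50,
    alpha=0.8,
)
plt.savefig("output_image.png")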
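As noted in the overview, the descriptors are mainly useful for downstream tasks such as image matching. The following is a minimal sketch of one common approach, mutual nearest-neighbor matching on cosine similarity. It runs on random stand-in descriptors rather than real model outputs, and `mutual_nearest_neighbors` is a hypothetical helper, not a transformers function.

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] pick each other."""
    # Normalize so that the dot product equals the cosine similarity
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    similarity = a @ b.T                      # shape (num_a, num_b)
    best_b_for_a = similarity.argmax(axis=1)  # best match in b for each a
    best_a_for_b = similarity.argmax(axis=0)  # best match in a for each b
    idx_a = np.arange(len(a))
    # Keep a pair only if the choice is reciprocal
    keep = best_a_for_b[best_b_for_a] == idx_a
    return np.stack([idx_a[keep], best_b_for_a[keep]], axis=1)

rng = np.random.default_rng(0)
desc_1 = rng.normal(size=(5, 256))
# Second "image": the same descriptors in shuffled order plus a little noise
permutation = np.array([2, 0, 4, 1, 3])
desc_2 = desc_1[permutation] + 0.01 * rng.normal(size=(5, 256))

matches = mutual_nearest_neighbors(desc_1, desc_2)
print(matches)
```

With real images, `desc_1` and `desc_2` would be the "descriptors" entries of two post-processed outputs, and the matched keypoint coordinates could then be passed to a homography estimator.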

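This model was contributed by stevenbucaille. The original code can be found here.

Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SuperPoint. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

A notebook showcasing inference and visualization with SuperPoint can be found here. 🌎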

SuperPointConfig

class transformers.SuperPointConfig

( encoder_hidden_sizes: typing.List[int] = [64, 64, 128, 128] decoder_hidden_size: int = 256 keypoint_decoder_dim: int = 65 descriptor_decoder_dim: int = 256 keypoint_threshold: float = 0.005 max_keypoints: int = -1 nms_radius: int = 4 border_removal_distance: int = 4 initializer_range = 0.02 **kwargs )

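Parameters

encoder_hidden_sizes (List, optional, defaults to [64, 64, 128, 128]) — The number of channels in each convolutional layer of the encoder.
decoder_hidden_size (int, optional, defaults to 256) — The hidden size of the decoder.
keypoint_decoder_dim (int, optional, defaults to 65) — The output dimension of the keypoint decoder.
descriptor_decoder_dim (int, optional, defaults to 256) — The output dimension of the descriptor decoder.
keypoint_threshold (float, optional, defaults to 0.005) — The threshold used to extract keypoints.
max_keypoints (int, optional, defaults to -1) — The maximum number of keypoints to extract. If -1, all keypoints are extracted.
nms_radius (int, optional, defaults to 4) — The radius used for non-maximum suppression.
border_removal_distance (int, optional, defaults to 4) — The distance from the image border within which keypoints are removed.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

This is the configuration class to store the configuration of a SuperPointForKeypointDetection. It is used to instantiate a SuperPoint model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the SuperPoint magic-leap-community/superpoint architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example: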

>>> from transformers import SuperPointConfig, SuperPointForKeypointDetection

>>> # Initializing a SuperPoint superpoint style configuration
>>> configuration = SuperPointConfig()
>>> # Initializing a model from the superpoint style configuration
>>> model = SuperPointForKeypointDetection(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
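To make the roles of keypoint_threshold, nms_radius, border_removal_distance and max_keypoints more concrete, here is a simplified NumPy sketch of how such parameters could act on a 2D score map. It uses a single-pass non-maximum suppression and a hypothetical `select_keypoints` helper. It is an illustration under those assumptions, not the library's implementation.

```python
import numpy as np

def select_keypoints(scores, keypoint_threshold=0.005, nms_radius=4,
                     border_removal_distance=4, max_keypoints=-1):
    """Pick (x, y) keypoints from a score map of shape (height, width)."""
    scores = np.asarray(scores, dtype=np.float64)
    height, width = scores.shape

    # Non-maximum suppression: keep a pixel only if it is the maximum of the
    # square neighborhood of side 2 * nms_radius + 1 centered on it
    window = 2 * nms_radius + 1
    padded = np.pad(scores, nms_radius, mode="constant", constant_values=-np.inf)
    neighborhoods = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    local_max = neighborhoods.max(axis=(2, 3))
    keep = (scores == local_max) & (scores > keypoint_threshold)

    # Drop candidates closer than border_removal_distance to the image border
    d = border_removal_distance
    inside = np.zeros_like(keep)
    inside[d:height - d, d:width - d] = True
    keep &= inside

    # Sort by decreasing score and optionally keep only the strongest ones
    ys, xs = np.nonzero(keep)
    kept_scores = scores[ys, xs]
    order = np.argsort(-kept_scores)
    if max_keypoints >= 0:
        order = order[:max_keypoints]
    return np.stack([xs[order], ys[order]], axis=1), kept_scores[order]

score_map = np.zeros((32, 32))
score_map[10, 10] = 0.9    # strong peak, kept
score_map[10, 12] = 0.5    # within nms_radius of the stronger peak, suppressed
score_map[20, 25] = 0.3    # isolated peak, kept
score_map[1, 1] = 0.8      # too close to the border, removed
score_map[5, 20] = 0.001   # below keypoint_threshold, ignored

keypoints, kept = select_keypoints(score_map)
top_keypoints, _ = select_keypoints(score_map, max_keypoints=1)
print(keypoints, kept)
```

Setting max_keypoints to -1 keeps every surviving candidate, mirroring the documented meaning of the default.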

SuperPointImageProcessor

class transformers.SuperPointImageProcessor

( do_resize: bool = True size: typing.Dict[str, int] = None do_rescale: bool = True rescale_factor: float = 0.00392156862745098 **kwargs )

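Parameters

do_resize (bool, optional, defaults to True) — Controls whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.
size (Dict[str, int], optional, defaults to {"height": 480, "width": 640}) — Resolution of the output image after resize is applied. Only has an effect if do_resize is set to True. Can be overridden by size in the preprocess method.
do_rescale (bool, optional, defaults to True) — Whether to rescale the image by the specified scale rescale_factor. Can be overridden by do_rescale in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255) — Scale factor to use if rescaling the image. Can be overridden by rescale_factor in the preprocess method.

Constructs a SuperPoint image processor.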
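The default rescale_factor of 0.00392156862745098 is simply 1/255, which maps 8-bit pixel values to the [0, 1] range. A minimal NumPy sketch of that step, using a random array as a stand-in for an image already resized to the default 480 x 640 resolution:

```python
import numpy as np

RESCALE_FACTOR = 1 / 255  # the documented default, 0.00392156862745098

# Stand-in for an 8-bit RGB image of the default size (height=480, width=640)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

# Convert to float before scaling so that the result is not truncated
rescaled = image.astype(np.float32) * RESCALE_FACTOR
print(rescaled.dtype, rescaled.shape)
```

If the input already has values between 0 and 1, this step should be skipped, which is what do_rescale=False is for.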

post_process_keypoint_detection

( outputs: SuperPointKeypointDescriptionOutput target_sizes: typing.Union[transformers.utils.generic.TensorType, typing.List[typing.Tuple]] ) → List[Dict]

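Parameters

outputs (SuperPointKeypointDescriptionOutput) — Raw outputs of the model, containing keypoints in a relative (x, y) format, with scores and descriptors.
target_sizes (torch.Tensor or List[Tuple[int, int]]) — Tensor of shape (batch_size, 2) or list of tuples (Tuple[int, int]) containing the target size of each image in the batch. This must be the original image size (before any processing).

Returns

List[Dict]

A list of dictionaries, each containing the keypoints in absolute format, the scores and the descriptors predicted by the model for one image of the batch.

Converts the raw output of SuperPointForKeypointDetection into lists of keypoints, scores and descriptors, with keypoint coordinates expressed in pixels of the original image.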
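The conversion from relative to absolute coordinates can be sketched as follows. The `to_absolute` helper is hypothetical, and truncating the result to integers is an assumption of this sketch rather than a documented guarantee.

```python
import numpy as np

def to_absolute(relative_keypoints, target_size):
    """Scale (x, y) keypoints in [0, 1] to pixel coordinates.

    target_size is given as (height, width), the same order as the
    image_sizes list built in the usage example.
    """
    height, width = target_size
    # x is scaled by the width and y by the height, hence the swapped order
    scale = np.array([width, height], dtype=np.float64)
    return (np.asarray(relative_keypoints) * scale).astype(np.int32)

relative = np.array([[0.5, 0.5], [0.25, 0.75], [0.0, 1.0]])
absolute = to_absolute(relative, target_size=(480, 640))
print(absolute)
```

Passing the original image size matters because the model sees a resized image, while the returned coordinates are meant to index the image you started with.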

preprocess

( images do_resize: bool = None size: typing.Dict[str, int] = None do_rescale: bool = None rescale_factor: float = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = ChannelDimension.FIRST input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

Parameters

images (ImageInput) — Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional, defaults to self.do_resize) — Whether to resize the image.
size (Dict[str, int], optional, defaults to self.size) — Size of the output image after resize has been applied, as a dictionary of the form {"height": int, "width": int}. Only has an effect if do_resize is set to True.
do_rescale (bool, optional, defaults to self.do_rescale) — Whether to rescale the image values to the [0, 1] range.
rescale_factor (float, optional, defaults to self.rescale_factor) — Rescale factor to rescale the image by if do_rescale is set to True.
return_tensors (str or TensorType, optional) — The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST) — The channel dimension format for the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- Unset: Use the channel dimension format of the input image.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or batch of images.
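The two channel layouts named by data_format and input_data_format differ only in the position of the channel axis. A small NumPy sketch on a dummy array shows the conversion between them and the batch shape that matches the (batch_size, num_channels, height, width) layout expected for pixel_values:

```python
import numpy as np

# "channels_last": (height, width, num_channels), the layout of a PIL image
# converted to an array
image_hwc = np.zeros((480, 640, 3), dtype=np.float32)

# "channels_first": (num_channels, height, width)
image_chw = np.transpose(image_hwc, (2, 0, 1))

# Add a leading batch axis to obtain (batch_size, num_channels, height, width)
batch = image_chw[None]

# Going back to channels_last restores the original shape
restored = np.transpose(image_chw, (1, 2, 0))
print(image_chw.shape, batch.shape, restored.shape)
```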

resize

( image: ndarray size: typing.Dict[str, int] data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

Parameters

image (np.ndarray) — Image to resize.
size (Dict[str, int]) — Dictionary of the form {"height": int, "width": int}, specifying the size of the output image.
data_format (ChannelDimension or str, optional) — The channel dimension format of the output image. If not provided, it will be inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Resize an image.

SuperPointForKeypointDetection

class transformers.SuperPointForKeypointDetection

( config: SuperPointConfig )

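Parameters

config (SuperPointConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SuperPoint model outputting keypoints and descriptors. This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

SuperPoint model. It consists of a SuperPointEncoder, a SuperPointInterestPointDecoder and a SuperPointDescriptorDecoder. SuperPoint was proposed in SuperPoint: Self-Supervised Interest Point Detection and Description by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich. It is a fully convolutional neural network that extracts keypoints and descriptors from an image. It is trained in a self-supervised manner, using a combination of a photometric loss and a loss based on the homographic adaptation of keypoints. It is made of a convolutional encoder and two decoders: one for keypoints and one for descriptors.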
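The default keypoint_decoder_dim of 65 follows the design in the paper: each coarse location predicts 64 scores for an 8 x 8 pixel cell plus one extra "no interest point" channel. The NumPy sketch below shows how such an output can be turned into a full-resolution score map. The `logits_to_heatmap` helper and the channel ordering inside a cell are assumptions of this sketch, not a description of the library's code.

```python
import numpy as np

def logits_to_heatmap(logits, cell=8):
    """Convert (cell * cell + 1, hc, wc) logits to an (hc * cell, wc * cell) map."""
    channels, hc, wc = logits.shape
    if channels != cell * cell + 1:
        raise ValueError("expected cell * cell + 1 channels")
    # Softmax over the channel axis (shifted for numerical stability)
    shifted = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)
    # Drop the last channel, which stands for "no interest point in this cell"
    probs = probs[:-1]
    # Depth-to-space: channel c goes to row c // cell and column c % cell
    probs = probs.reshape(cell, cell, hc, wc)   # (dy, dx, hc, wc)
    probs = probs.transpose(2, 0, 3, 1)         # (hc, dy, wc, dx)
    return probs.reshape(hc * cell, wc * cell)

# Toy input: a 2 x 3 grid of cells with one confident detection
logits = np.zeros((65, 2, 3))
logits[10, 1, 2] = 20.0  # channel 10 = row 1, column 2 inside cell (1, 2)

heatmap = logits_to_heatmap(logits)
peak = np.unravel_index(heatmap.argmax(), heatmap.shape)
print(heatmap.shape, peak)
```

The peak lands at row 1 * 8 + 1 and column 2 * 8 + 2 of the full-resolution map, which is how a coarse 65-channel prediction yields pixel-level keypoint locations.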

forward

( pixel_values: FloatTensor labels: typing.Optional[torch.LongTensor] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None )

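Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Pixel values. Pixel values can be obtained using SuperPointImageProcessor. See SuperPointImageProcessor.__call__() for details.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.

The SuperPointForKeypointDetection forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.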
