Autoformer

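Overview

The Autoformer model was proposed in Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting by Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long.

This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.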
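The decomposition idea can be illustrated with a minimal sketch in plain Python. It assumes the trend is estimated with a centered moving average whose ends are padded by repeating the edge values, and that the seasonal part is whatever remains after the trend is subtracted. The function names `moving_average` and `decompose` are illustrative and are not part of the library.

```python
def moving_average(series, window):
    """Centered moving average; the ends are padded by repeating the edge values."""
    left = (window - 1) // 2
    right = window - 1 - left
    padded = [series[0]] * left + list(series) + [series[-1]] * right
    # One window per original time step, so the output has the input's length.
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window=25):
    """Split a series into a smooth trend and the remaining seasonal part."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


# A short ramp with a repeating bump, decomposed with a small window.
example = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
example_trend, example_seasonal = decompose(example, window=3)
```

In the model this split is applied repeatedly inside the network rather than once as a preprocessing step, which is what "progressively" refers to.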

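The abstract from the paper is the following:

Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck. Going beyond Transformers, we design Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We break with the pre-processing convention of series decomposition and renovate it as a basic inner block of deep models. This design empowers Autoformer with progressive decomposition capacities for complex time series. Further, inspired by the stochastic process theory, we design the Auto-Correlation mechanism based on the series periodicity, which conducts the dependencies discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks, covering five practical applications: energy, traffic, economics, weather and disease.

This model was contributed by elisim and kashif. The original code can be found here.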

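Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

Check out the Autoformer blog post on the Hugging Face blog: Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)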

AutoformerConfig

class transformers.AutoformerConfig

( prediction_length: typing.Optional[int] = None context_length: typing.Optional[int] = None distribution_output: str = 'student_t' loss: str = 'nll' input_size: int = 1 lags_sequence: typing.List[int] = [1, 2, 3, 4, 5, 6, 7] scaling: bool = True num_time_features: int = 0 num_dynamic_real_features: int = 0 num_static_categorical_features: int = 0 num_static_real_features: int = 0 cardinality: typing.Optional[typing.List[int]] = None embedding_dimension: typing.Optional[typing.List[int]] = None d_model: int = 64 encoder_attention_heads: int = 2 decoder_attention_heads: int = 2 encoder_layers: int = 2 decoder_layers: int = 2 encoder_ffn_dim: int = 32 decoder_ffn_dim: int = 32 activation_function: str = 'gelu' dropout: float = 0.1 encoder_layerdrop: float = 0.1 decoder_layerdrop: float = 0.1 attention_dropout: float = 0.1 activation_dropout: float = 0.1 num_parallel_samples: int = 100 init_std: float = 0.02 use_cache: bool = True is_encoder_decoder = True label_length: int = 10 moving_average: int = 25 autocorrelation_factor: int = 3 **kwargs )

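Parameters

prediction_length (int) — The prediction length for the decoder. In other words, the prediction horizon of the model.
context_length (int, optional, defaults to prediction_length) — The context length for the encoder. If unset, the context length will be the same as the prediction_length.
distribution_output (string, optional, defaults to "student_t") — The distribution emission head for the model. Could be either "student_t", "normal" or "negative_binomial".
loss (string, optional, defaults to "nll") — The loss function for the model corresponding to the distribution_output head. For parametric distributions it is the negative log likelihood (nll), which currently is the only supported one.
input_size (int, optional, defaults to 1) — The size of the target variable, which by default is 1 for univariate targets. It is greater than 1 for multivariate targets.
lags_sequence (list[int], optional, defaults to [1, 2, 3, 4, 5, 6, 7]) — The lags of the input time series as covariates, often dictated by the frequency.
scaling (bool, optional, defaults to True) — Whether to scale the input targets.
num_time_features (int, optional, defaults to 0) — The number of time features in the input time series.
num_dynamic_real_features (int, optional, defaults to 0) — The number of dynamic real valued features.
num_static_categorical_features (int, optional, defaults to 0) — The number of static categorical features.
num_static_real_features (int, optional, defaults to 0) — The number of static real valued features.
cardinality (list[int], optional) — The cardinality (number of different values) for each of the static categorical features. Should be a list of integers, having the same length as num_static_categorical_features. Cannot be None if num_static_categorical_features is > 0.
embedding_dimension (list[int], optional) — The dimension of the embedding for each of the static categorical features. Should be a list of integers, having the same length as num_static_categorical_features. Cannot be None if num_static_categorical_features is > 0.
d_model (int, optional, defaults to 64) — Dimensionality of the transformer layers.
encoder_layers (int, optional, defaults to 2) — Number of encoder layers.
decoder_layers (int, optional, defaults to 2) — Number of decoder layers.
encoder_attention_heads (int, optional, defaults to 2) — Number of attention heads for each attention layer in the Transformer encoder.
decoder_attention_heads (int, optional, defaults to 2) — Number of attention heads for each attention layer in the Transformer decoder.
encoder_ffn_dim (int, optional, defaults to 32) — Dimension of the “intermediate” (often named feed-forward) layer in the encoder.
decoder_ffn_dim (int, optional, defaults to 32) — Dimension of the “intermediate” (often named feed-forward) layer in the decoder.
activation_function (str or function, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and decoder. If string, "gelu" and "relu" are supported.
dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the encoder and decoder.
encoder_layerdrop (float, optional, defaults to 0.1) — The dropout probability for the attention and fully connected layers for each encoder layer.
decoder_layerdrop (float, optional, defaults to 0.1) — The dropout probability for the attention and fully connected layers for each decoder layer.
attention_dropout (float, optional, defaults to 0.1) — The dropout probability for the attention probabilities.
activation_dropout (float, optional, defaults to 0.1) — The dropout probability used between the two layers of the feed-forward networks.
num_parallel_samples (int, optional, defaults to 100) — The number of samples to generate in parallel for each time step of inference.
init_std (float, optional, defaults to 0.02) — The standard deviation of the truncated normal weight initialization distribution.
use_cache (bool, optional, defaults to True) — Whether to use the past key/values attentions (if applicable to the model) to speed up decoding.
label_length (int, optional, defaults to 10) — Start token length of the Autoformer decoder, which is used for direct multi-step prediction (i.e. non-autoregressive generation).
moving_average (int, optional, defaults to 25) — The window size of the moving average. In practice, it is the kernel size in AvgPool1d of the decomposition layer.
autocorrelation_factor (int, optional, defaults to 3) — “Attention” (i.e. Auto-Correlation mechanism) factor which is used to find the top k autocorrelation delays. The paper recommends setting it to a number between 1 and 5.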
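To make the role of autocorrelation_factor more concrete, here is a rough plain-Python sketch of selecting the top k autocorrelation delays. It rests on several assumptions: k is taken as int(factor * ln(L)) for a series of length L, following the rule described in the paper; the autocorrelation is computed circularly with a direct sum (an FFT would normally be used for efficiency); and the trivial delay 0 is excluded. The helper names are hypothetical and do not belong to the library.

```python
import math


def circular_autocorrelation(series):
    """Return R[d] = mean over t of x[t] * x[(t - d) mod L] for every delay d."""
    length = len(series)
    return [
        sum(series[t] * series[(t - delay) % length] for t in range(length)) / length
        for delay in range(length)
    ]


def top_k_delays(series, factor=3):
    """Pick the k = int(factor * ln(L)) non-zero delays with the highest score."""
    length = len(series)
    k = int(factor * math.log(length))
    scores = circular_autocorrelation(series)
    # Python's sort is stable, so tied delays keep their ascending order.
    ranked = sorted(range(1, length), key=lambda d: scores[d], reverse=True)
    return ranked[:k]


# A series that repeats every 4 steps: multiples of 4 should rank first.
periodic = [1.0, 2.0, 3.0, 4.0] * 4
best_delays = top_k_delays(periodic, factor=1)
```

A larger factor keeps more candidate delays, which is why the value acts as a trade-off between coverage of possible periods and computation.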

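This is the configuration class to store the configuration of an AutoformerModel. It is used to instantiate an Autoformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the Autoformer huggingface/autoformer-tourism-monthly architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.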

>>> from transformers import AutoformerConfig, AutoformerModel

>>> # Initializing a default Autoformer configuration
>>> configuration = AutoformerConfig()

>>> # Randomly initializing a model (with random weights) from the configuration
>>> model = AutoformerModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
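The constraints on cardinality and embedding_dimension listed in the parameters can be checked before a configuration is built. The snippet below is only a sketch of such a check in plain Python; `check_static_categorical_settings` is a hypothetical helper written for this illustration, not a function provided by the library, and it encodes nothing beyond the two rules stated in the parameter descriptions.

```python
def check_static_categorical_settings(
    num_static_categorical_features, cardinality=None, embedding_dimension=None
):
    """Raise ValueError if the two lists do not match the number of features."""
    if num_static_categorical_features == 0:
        return  # nothing to validate when no static categorical features are used
    settings = (("cardinality", cardinality), ("embedding_dimension", embedding_dimension))
    for name, value in settings:
        if value is None:
            raise ValueError(f"{name} cannot be None when static categorical features are used")
        if len(value) != num_static_categorical_features:
            raise ValueError(
                f"{name} has {len(value)} entries, expected {num_static_categorical_features}"
            )


# One categorical feature (for instance a series ID with 366 distinct values).
check_static_categorical_settings(1, cardinality=[366], embedding_dimension=[16])
```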

AutoformerModel

class transformers.AutoformerModel

( config: AutoformerConfig )

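Parameters

config (AutoformerConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The bare Autoformer Model outputting raw hidden-states without any specific head on top. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.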

forward

( past_values: Tensor past_time_features: Tensor past_observed_mask: Tensor static_categorical_features: typing.Optional[torch.Tensor] = None static_real_features: typing.Optional[torch.Tensor] = None future_values: typing.Optional[torch.Tensor] = None future_time_features: typing.Optional[torch.Tensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.List[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None output_hidden_states: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None use_cache: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.autoformer.modeling_autoformer.AutoformerModelOutput or tuple(torch.FloatTensor)

Parameters

past_values (torch.FloatTensor of shape (batch_size, sequence_length)) — Past values of the time series, that serve as context in order to predict the future. These values may contain lags, i.e. additional values from the past which are added in order to serve as “extra context”. The past_values is what the Transformer encoder gets as input (with optional additional features, such as static_categorical_features, static_real_features, past_time_features).
The sequence length here is equal to context_length + max(config.lags_sequence).

Missing values need to be replaced with zeros.
past_time_features (torch.FloatTensor of shape (batch_size, sequence_length, num_features), optional) — Optional time features, which the model internally will add to past_values. These could be things like “month of year”, “day of the month”, etc. encoded as vectors (for instance as Fourier features). These could also be so-called “age” features, which basically help the model know “at which point in life” a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step.
These features serve as the “positional encodings” of the inputs. So contrary to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires additional time features to be provided.

The Autoformer only learns additional embeddings for static_categorical_features.
past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length), optional) — Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:
- 1 for values that are observed,
- 0 for values that are missing (i.e. NaNs that were replaced by zeros).
static_categorical_features (torch.LongTensor of shape (batch_size, number of static categorical features), optional) — Optional static categorical features for which the model will learn an embedding, which it will add to the values of the time series.
Static categorical features are features which have the same value for all time steps (static over time).

A typical example of a static categorical feature is a time series ID.
static_real_features (torch.FloatTensor of shape (batch_size, number of static real features), optional) — Optional static real features which the model will add to the values of the time series.
Static real features are features which have the same value for all time steps (static over time).

A typical example of a static real feature is promotion information.
future_values (torch.FloatTensor of shape (batch_size, prediction_length)) — Future values of the time series, that serve as labels for the model. The future_values is what the Transformer needs to learn to output, given the past_values.
See the demo notebook and code snippets for details.

Missing values need to be replaced with zeros.
future_time_features (torch.FloatTensor of shape (batch_size, prediction_length, num_features), optional) — Optional time features, which the model internally will add to future_values. These could be things like “month of year”, “day of the month”, etc. encoded as vectors (for instance as Fourier features). These could also be so-called “age” features, which basically help the model know “at which point in life” a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step.
These features serve as the “positional encodings” of the inputs. So contrary to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires additional features to be provided.

The Autoformer only learns additional embeddings for static_categorical_features.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on certain token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Mask to avoid performing attention on certain token indices. By default, a causal mask will be used, to make sure the model can only look at previous inputs in order to predict the future.
head_mask (torch.Tensor of shape (encoder_layers, encoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
decoder_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) — Tuple consists of last_hidden_state, hidden_states (optional) and attentions (optional). last_hidden_state of shape (batch_size, sequence_length, hidden_size) (optional) is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
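The requirements on past_values and past_observed_mask described above (a window of context_length + max(lags_sequence) steps, with missing values replaced by zeros and flagged in the mask) can be sketched for a single series in plain Python. This is only an illustration on lists: `prepare_past_values` is a hypothetical helper, and in practice the resulting lists would still have to be stacked into batched tensors.

```python
import math


def prepare_past_values(raw_values, context_length, lags_sequence):
    """Return (past_values, past_observed_mask) for one series given as a list of floats."""
    expected_length = context_length + max(lags_sequence)
    if len(raw_values) != expected_length:
        raise ValueError(
            f"expected {expected_length} past steps, got {len(raw_values)}"
        )
    # 1.0 marks an observed value, 0.0 marks a missing one.
    observed_mask = [0.0 if math.isnan(v) else 1.0 for v in raw_values]
    # Missing values are replaced by zeros, as the model expects.
    past_values = [0.0 if math.isnan(v) else v for v in raw_values]
    return past_values, observed_mask


# context_length=3 and a largest lag of 2 require 5 past steps.
window = [1.0, float("nan"), 3.0, 4.0, float("nan")]
clean_values, clean_mask = prepare_past_values(window, context_length=3, lags_sequence=[1, 2])
```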

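transformers.models.autoformer.modeling_autoformer.AutoformerModelOutput or tuple(torch.FloatTensor)

A transformers.models.autoformer.modeling_autoformer.AutoformerModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (AutoformerConfig) and inputs.

last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the decoder of the model.

If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
trend (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Trend tensor for each time series.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
loc (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Shift values of each time series’ context window, which are used to give the model inputs of the same magnitude and then used to shift back to the original magnitude.
scale (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Scaling values of each time series’ context window, which are used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude.
static_features (torch.FloatTensor of shape (batch_size, feature size), optional) — Static features of each time series in a batch, which are copied to the covariates at inference time.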
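The role of loc and scale can be illustrated with a small plain-Python sketch: statistics are computed on the observed part of the context window, inputs are normalized with them, and predictions are mapped back with the inverse transform. The choice of mean and standard deviation here is an assumption made for illustration; the library may use a different statistic (for example a mean-absolute-value scale with a zero shift), and the helper names are hypothetical.

```python
import math


def fit_loc_scale(values, observed_mask, min_scale=1e-5):
    """Compute a shift and a scale from the observed values of one context window."""
    observed = [v for v, m in zip(values, observed_mask) if m]
    if not observed:
        return 0.0, 1.0  # nothing observed: fall back to the identity transform
    loc = sum(observed) / len(observed)
    variance = sum((v - loc) ** 2 for v in observed) / len(observed)
    # Guard against a zero scale for constant series.
    scale = max(math.sqrt(variance), min_scale)
    return loc, scale


def normalize(values, loc, scale):
    """Bring values to a common magnitude before they enter the model."""
    return [(v - loc) / scale for v in values]


def denormalize(values, loc, scale):
    """Map model-space values back to the original magnitude."""
    return [v * scale + loc for v in values]


# The middle value is missing (mask 0) and is ignored when fitting.
loc, scale = fit_loc_scale([2.0, 0.0, 6.0], [1, 0, 1])
model_inputs = normalize([2.0, 6.0], loc, scale)
restored = denormalize(model_inputs, loc, scale)
```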

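The AutoformerModel forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples: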

>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import AutoformerModel

>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

>>> model = AutoformerModel.from_pretrained("huggingface/autoformer-tourism-monthly")

>>> # during training, one provides both past and future values
>>> # as well as possible additional features
>>> outputs = model(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     future_values=batch["future_values"],
...     future_time_features=batch["future_time_features"],
... )

>>> last_hidden_state = outputs.last_hidden_state

AutoformerForPrediction

class transformers.AutoformerForPrediction

( config: AutoformerConfig )

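Parameters

config (AutoformerConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The Autoformer Model with a distribution head on top for time-series forecasting. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.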

forward

( past_values: Tensor past_time_features: Tensor past_observed_mask: Tensor static_categorical_features: typing.Optional[torch.Tensor] = None static_real_features: typing.Optional[torch.Tensor] = None future_values: typing.Optional[torch.Tensor] = None future_time_features: typing.Optional[torch.Tensor] = None future_observed_mask: typing.Optional[torch.Tensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.List[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None output_hidden_states: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None use_cache: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqTSPredictionOutput or tuple(torch.FloatTensor)

Parameters

past_values (torch.FloatTensor of shape (batch_size, sequence_length)) — Past values of the time series, that serve as context in order to predict the future. These values may contain lags, i.e. additional values from the past which are added in order to serve as “extra context”. The past_values is what the Transformer encoder gets as input (with optional additional features, such as static_categorical_features, static_real_features, past_time_features).
The sequence length here is equal to context_length + max(config.lags_sequence).

Missing values need to be replaced with zeros.
past_time_features (torch.FloatTensor of shape (batch_size, sequence_length, num_features), optional) — Optional time features, which the model internally will add to past_values. These could be things like “month of year”, “day of the month”, etc. encoded as vectors (for instance as Fourier features). These could also be so-called “age” features, which basically help the model know “at which point in life” a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step.
These features serve as the “positional encodings” of the inputs. So contrary to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires additional time features to be provided.

The Autoformer only learns additional embeddings for static_categorical_features.
past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length), optional) — Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:
- 1 for values that are observed,
- 0 for values that are missing (i.e. NaNs that were replaced by zeros).
static_categorical_features (torch.LongTensor of shape (batch_size, number of static categorical features), optional) — Optional static categorical features for which the model will learn an embedding, which it will add to the values of the time series.
Static categorical features are features which have the same value for all time steps (static over time).

A typical example of a static categorical feature is a time series ID.
static_real_features (torch.FloatTensor of shape (batch_size, number of static real features), optional) — Optional static real features which the model will add to the values of the time series.
Static real features are features which have the same value for all time steps (static over time).

A typical example of a static real feature is promotion information.
future_values (torch.FloatTensor of shape (batch_size, prediction_length)) — Future values of the time series, that serve as labels for the model. The future_values is what the Transformer needs to learn to output, given the past_values.
See the demo notebook and code snippets for details.

Missing values need to be replaced with zeros.
future_time_features (torch.FloatTensor of shape (batch_size, prediction_length, num_features), optional) — Optional time features, which the model internally will add to future_values. These could be things like “month of year”, “day of the month”, etc. encoded as vectors (for instance as Fourier features). These could also be so-called “age” features, which basically help the model know “at which point in life” a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step.
These features serve as the “positional encodings” of the inputs. So contrary to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires additional features to be provided.

The Autoformer only learns additional embeddings for static_categorical_features.
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on certain token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Mask to avoid performing attention on certain token indices. By default, a causal mask will be used, to make sure the model can only look at previous inputs in order to predict the future.
head_mask (torch.Tensor of shape (encoder_layers, encoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
decoder_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) — Tuple consists of last_hidden_state, hidden_states (optional) and attentions (optional). last_hidden_state of shape (batch_size, sequence_length, hidden_size) (optional) is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
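The time features described for past_time_features and future_time_features can be sketched for a monthly series in plain Python. The encodings below are assumptions chosen for illustration: the month of year is mapped linearly to [-0.5, 0.5], and the “age” feature is taken as log10(2 + t), which is small for distant steps and grows monotonically. The function names are hypothetical; real pipelines may use other encodings such as Fourier features.

```python
import math


def month_of_year_feature(months):
    """Map calendar months 1..12 linearly onto the interval [-0.5, 0.5]."""
    return [(month - 1) / 11.0 - 0.5 for month in months]


def age_feature(num_steps):
    """Monotonically increasing feature telling the model how old the series is."""
    return [math.log10(2.0 + step) for step in range(num_steps)]


def build_time_features(months):
    """Return one [month_of_year, age] row per time step."""
    month_values = month_of_year_feature(months)
    age_values = age_feature(len(months))
    return [[m, a] for m, a in zip(month_values, age_values)]


# Four consecutive monthly steps running from November to February.
time_features = build_time_features([11, 12, 1, 2])
```

With two features per step, num_time_features in the configuration would be set to 2 for this sketch.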

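transformers.modeling_outputs.Seq2SeqTSPredictionOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.Seq2SeqTSPredictionOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (AutoformerConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when future_values is provided) — Distributional loss.
params (torch.FloatTensor of shape (batch_size, num_samples, num_params)) — Parameters of the chosen distribution.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder’s cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
loc (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Shift values of each time series’ context window, which are used to give the model inputs of the same magnitude and then used to shift back to the original magnitude.
scale (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Scaling values of each time series’ context window, which are used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude.
static_features (torch.FloatTensor of shape (batch_size, feature size), optional) — Static features of each time series in a batch, which are copied to the covariates at inference time.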

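The AutoformerForPrediction forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples: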

>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import AutoformerForPrediction

>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

>>> model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly")

>>> # during training, one provides both past and future values
>>> # as well as possible additional features
>>> outputs = model(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     future_values=batch["future_values"],
...     future_time_features=batch["future_time_features"],
... )

>>> loss = outputs.loss
>>> loss.backward()

>>> # during inference, one only provides past values
>>> # as well as possible additional features
>>> # the model autoregressively generates future values
>>> outputs = model.generate(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     future_time_features=batch["future_time_features"],
... )

>>> mean_prediction = outputs.sequences.mean(dim=1)
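Taking the mean over dimension 1 collapses the parallel samples into a point forecast. The plain-Python sketch below shows the same reduction for a single series, together with a per-step median, assuming the samples are laid out as samples[s][t] (sample index first, future step second), which is what mean(dim=1) in the example above suggests. `summarize_samples` is a hypothetical helper, not a library function.

```python
import statistics


def summarize_samples(samples):
    """Reduce samples[s][t] to per-step mean and median forecasts."""
    # Transpose so that each tuple holds all sampled values for one future step.
    per_step = list(zip(*samples))
    mean_forecast = [statistics.fmean(step) for step in per_step]
    median_forecast = [statistics.median(step) for step in per_step]
    return mean_forecast, median_forecast


# Three sampled trajectories over a horizon of two steps.
sampled_paths = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mean_forecast, median_forecast = summarize_samples(sampled_paths)
```

The median is often a more robust point forecast than the mean when the sampled distribution is heavy-tailed, as a Student-t output can be.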

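AutoformerForPrediction can also use static_real_features. To do so, set num_static_real_features in AutoformerConfig based on the number of such features in the dataset (in the case of the tourism_monthly dataset it is equal to 1), initialize the model and call it as shown below: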

>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import AutoformerConfig, AutoformerForPrediction

>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

>>> # check number of static real features
>>> num_static_real_features = batch["static_real_features"].shape[-1]

>>> # load configuration of pretrained model and override num_static_real_features
>>> configuration = AutoformerConfig.from_pretrained(
...     "huggingface/autoformer-tourism-monthly",
...     num_static_real_features=num_static_real_features,
... )
>>> # we also need to update feature_size as it is not recalculated
>>> configuration.feature_size += num_static_real_features

>>> model = AutoformerForPrediction(configuration)

>>> outputs = model(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     static_real_features=batch["static_real_features"],
...     future_values=batch["future_values"],
...     future_time_features=batch["future_time_features"],
... )
