Model outputs
All models have outputs that are instances of subclasses of ModelOutput. Those are data structures containing all the information returned by the model, but they can also be used as tuples or dictionaries.
Let's see what this looks like in an example:
from transformers import BertTokenizer, BertForSequenceClassification
import torch
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(**inputs, labels=labels)
outputs
The outputs object is a SequenceClassifierOutput, as we can see in the documentation of that class below. It means it has an optional loss, a logits, an optional hidden_states and an optional attentions attribute. Here we have the loss since we passed along labels, but we don't have hidden_states and attentions because we didn't pass output_hidden_states=True or output_attentions=True.
When passing output_hidden_states=True, you may expect outputs.hidden_states[-1] to match outputs.last_hidden_state exactly.
However, this is not always the case. Some models apply normalization or subsequent processing to the last hidden state when it is returned.
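For a plain BERT encoder the two happen to match, but you should verify this per model. A minimal sketch, reusing the tokenizer inputs and the torch import from the example above (BertModel is our illustrative choice here):
from transformers import BertModel
base_model = BertModel.from_pretrained("google-bert/bert-base-uncased")
base_outputs = base_model(**inputs, output_hidden_states=True)
# True for plain BERT; models that post-process last_hidden_state may differ
print(torch.allclose(base_outputs.hidden_states[-1], base_outputs.last_hidden_state))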
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you will get None. Here for instance outputs.loss is the loss computed by the model, and outputs.attentions is None.
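Continuing the example above:
print(outputs.loss)        # the loss tensor computed from labels
print(outputs.attentions)  # None, since output_attentions=True was not passed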
When considering our outputs object as a tuple, it only considers the attributes that don't have None values. Here for instance, it has two elements, loss then logits, so outputs[:2] will return the tuple (outputs.loss, outputs.logits).
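For instance:
loss, logits = outputs[:2]
assert loss is outputs.loss and logits is outputs.logits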
When considering our outputs object as a dictionary, it only considers the attributes that don't have None values. Here for instance, it has two keys that are loss and logits.
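For instance:
print(list(outputs.keys()))    # ['loss', 'logits']
print(outputs["logits"].shape)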
We document here the generic model outputs that are used by more than one model type. Specific output types are documented on their corresponding model page.
ModelOutput
Base class for all model outputs as dataclass. Has a __getitem__ that allows indexing by integer or slice (like a tuple) or strings (like a dictionary) and that will ignore None attributes. Otherwise behaves like a regular Python dictionary.
You can't unpack a ModelOutput directly. Use the to_tuple() method to convert it to a tuple first.
to_tuple — Converts self to a tuple containing all the attributes/keys that are not None.
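For example, unpacking via to_tuple():
loss, logits = outputs.to_tuple()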
BaseModelOutput
class transformers.modeling_outputs.BaseModelOutput
< source >( last_hidden_state: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs, with potential hidden states and attentions.
BaseModelOutputWithPooling
class transformers.modeling_outputs.BaseModelOutputWithPooling
< source >( last_hidden_state: FloatTensor = None pooler_output: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
- pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) — Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for the BERT family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs that also contains a pooling of the last hidden states.
BaseModelOutputWithCrossAttentions
class transformers.modeling_outputs.BaseModelOutputWithCrossAttentions
< source >( last_hidden_state: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model's outputs, with potential hidden states and attentions.
BaseModelOutputWithPoolingAndCrossAttentions
class transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions
< source >( last_hidden_state: FloatTensor = None pooler_output: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
- pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) — Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for the BERT family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and optionally, if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and, if config.is_encoder_decoder=True, in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
Base class for model's outputs that also contains a pooling of the last hidden states.
BaseModelOutputWithPast
class transformers.modeling_outputs.BaseModelOutputWithPast
< source >( last_hidden_state: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and optionally, if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and, if config.is_encoder_decoder=True, in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
BaseModelOutputWithPastAndCrossAttentions
class transformers.modeling_outputs.BaseModelOutputWithPastAndCrossAttentions
< source >( last_hidden_state: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and optionally, if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and, if config.is_encoder_decoder=True, in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
Seq2SeqModelOutput
class transformers.modeling_outputs.Seq2SeqModelOutput
< source >( last_hidden_state: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the decoder of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model encoder's outputs that also contains pre-computed hidden states that can speed up sequential decoding.
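As a sketch of how the decoder and encoder fields of this output relate, here is a minimal example (t5-small is an illustrative encoder-decoder checkpoint, and starting decoding from config.decoder_start_token_id is an assumption that holds for T5-style models):
from transformers import AutoModel, AutoTokenizer
import torch
tok = AutoTokenizer.from_pretrained("t5-small")
seq2seq = AutoModel.from_pretrained("t5-small")
enc = tok("Studies have shown that", return_tensors="pt")
# feed the decoder its start token so a single decoding step runs
decoder_input_ids = torch.tensor([[seq2seq.config.decoder_start_token_id]])
out = seq2seq(input_ids=enc.input_ids, decoder_input_ids=decoder_input_ids)
print(out.last_hidden_state.shape)          # decoder states, (1, 1, hidden_size)
print(out.encoder_last_hidden_state.shape)  # encoder states, (1, seq_len, hidden_size)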
CausalLMOutput
class transformers.modeling_outputs.CausalLMOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss (for next-token prediction).
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
CausalLMOutputWithCrossAttentions
class transformers.modeling_outputs.CausalLMOutputWithCrossAttentions
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss (for next-token prediction).
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Cross attention weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of torch.FloatTensor tuples of length config.n_layers, with each tuple containing the cached key, value states of the self-attention and the cross-attention layers if the model is used in an encoder-decoder setting. Only relevant if config.is_decoder = True. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
Base class for causal language model (or autoregressive) outputs.
CausalLMOutputWithPast
class transformers.modeling_outputs.CausalLMOutputWithPast
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss (for next-token prediction).
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
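A sketch of how past_key_values from this output speeds up greedy decoding (gpt2 is an illustrative causal LM checkpoint):
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("Hello, my dog", return_tensors="pt").input_ids
past = None
for _ in range(5):
    # after the first step, only the newest token is fed; the cache covers the rest
    step_ids = ids if past is None else ids[:, -1:]
    lm_out = lm(input_ids=step_ids, past_key_values=past, use_cache=True)
    past = lm_out.past_key_values
    next_id = lm_out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    ids = torch.cat([ids, next_id], dim=-1)
print(tok.decode(ids[0]))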
MaskedLMOutput
class transformers.modeling_outputs.MaskedLMOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Masked language modeling (MLM) loss.
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for masked language models outputs.
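A short sketch of reading a prediction out of this output (the checkpoint choice is illustrative):
from transformers import AutoModelForMaskedLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("google-bert/bert-base-uncased")
enc = tok("Paris is the [MASK] of France.", return_tensors="pt")
mlm_out = mlm(**enc)
# locate the [MASK] position, then take the highest-scoring vocabulary id there
mask_pos = (enc.input_ids == tok.mask_token_id).nonzero(as_tuple=True)[1]
print(tok.decode(mlm_out.logits[0, mask_pos].argmax(dim=-1)))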
Seq2SeqLMOutput
class transformers.modeling_outputs.Seq2SeqLMOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss.
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence language models outputs.
NextSentencePredictorOutput
class transformers.modeling_outputs.NextSentencePredictorOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when next_sentence_label is provided) — Next sequence prediction (classification) loss.
- logits (torch.FloatTensor of shape (batch_size, 2)) — Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of models predicting if two sentences are consecutive or not.
SequenceClassifierOutput
class transformers.modeling_outputs.SequenceClassifierOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sentence classification models.
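Continuing the sequence-classification example at the top of this page, the logits of this output convert to a predicted label in one step (id2label falls back to generic LABEL_n names for an untuned head):
predicted_class_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class_id])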
Seq2SeqSequenceClassifierOutput
class transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when label is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence sentence classification models.
MultipleChoiceModelOutput
class transformers.modeling_outputs.MultipleChoiceModelOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification loss.
- logits (torch.FloatTensor of shape (batch_size, num_choices)) — num_choices is the second dimension of the input tensors (see input_ids above). Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of multiple choice models.
TokenClassifierOutput
class transformers.modeling_outputs.TokenClassifierOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification loss.
- logits (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) — Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of token classification models.
QuestionAnsweringModelOutput
class transformers.modeling_outputs.QuestionAnsweringModelOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None start_logits: FloatTensor = None end_logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
- start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) — Span-start scores (before SoftMax).
- end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) — Span-end scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of question answering models.
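A minimal sketch of turning start_logits and end_logits into an answer span (the checkpoint name is illustrative):
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
tok = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
qa_model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
enc = tok("Who wrote Hamlet?", "Hamlet was written by William Shakespeare.", return_tensors="pt")
qa_out = qa_model(**enc)
# greedy span: most likely start and end positions
start = qa_out.start_logits.argmax(dim=-1).item()
end = qa_out.end_logits.argmax(dim=-1).item()
print(tok.decode(enc.input_ids[0, start : end + 1]))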
Seq2SeqQuestionAnsweringModelOutput
class transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None start_logits: FloatTensor = None end_logits: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
- start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) — Span-start scores (before SoftMax).
- end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) — Span-end scores (before SoftMax).
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence question answering models.
Seq2SeqSpectrogramOutput
class transformers.modeling_outputs.Seq2SeqSpectrogramOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None spectrogram: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Spectrogram generation loss.
- spectrogram (torch.FloatTensor of shape (batch_size, sequence_length, num_bins)) — The predicted spectrogram.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence spectrogram outputs.
SemanticSegmenterOutput
class transformers.modeling_outputs.SemanticSegmenterOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels, logits_height, logits_width)) — Classification scores for each pixel. The logits returned do not necessarily have the same size as the pixel_values passed as inputs. This is to avoid doing two interpolations and losing some quality when a user needs to resize the logits back to the original image size as post-processing. You should always check your logits shape and resize as needed.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, patch_size, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of semantic segmentation models.
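Because the returned logits may be smaller than the input image, post-processing typically upsamples them before taking the per-pixel argmax. A sketch, where image_height and image_width are assumed names standing for the original input size:
import torch.nn.functional as F
upsampled_logits = F.interpolate(
    outputs.logits,                    # (batch_size, num_labels, logits_height, logits_width)
    size=(image_height, image_width),
    mode="bilinear",
    align_corners=False,
)
segmentation_map = upsampled_logits.argmax(dim=1)  # (batch_size, image_height, image_width)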
ImageClassifierOutput
class transformers.modeling_outputs.ImageClassifierOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, sequence_length, hidden_size). Hidden-states (also called feature maps) of the model at the output of each stage.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of image classification models.
ImageClassifierOutputWithNoAttention
class transformers.modeling_outputs.ImageClassifierOutputWithNoAttention
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, num_channels, height, width). Hidden-states (also called feature maps) of the model at the output of each stage.
Base class for outputs of image classification models.
DepthEstimatorOutput
class transformers.modeling_outputs.DepthEstimatorOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None predicted_depth: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
- predicted_depth (torch.FloatTensor of shape (batch_size, height, width)) — Predicted depth for each pixel.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of depth estimation models.
Wav2Vec2BaseModelOutput
class transformers.modeling_outputs.Wav2Vec2BaseModelOutput
< source >( last_hidden_state: FloatTensor = None extract_features: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
- extract_features (torch.FloatTensor of shape (batch_size, sequence_length, conv_dim[-1])) — Sequence of extracted feature vectors of the last convolutional layer of the model.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for models that have been trained with the Wav2Vec2 loss objective.
XVectorOutput
class transformers.modeling_outputs.XVectorOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None logits: FloatTensor = None embeddings: FloatTensor = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification loss.
- logits (torch.FloatTensor of shape (batch_size, config.xvector_output_dim)) — Classification hidden states before AMSoftmax.
- embeddings (torch.FloatTensor of shape (batch_size, config.xvector_output_dim)) — Utterance embeddings used for vector similarity-based retrieval.
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Output type of Wav2Vec2ForXVector.
Seq2SeqTSModelOutput
class transformers.modeling_outputs.Seq2SeqTSModelOutput
< source >( last_hidden_state: FloatTensor = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None loc: typing.Optional[torch.FloatTensor] = None scale: typing.Optional[torch.FloatTensor] = None static_features: typing.Optional[torch.FloatTensor] = None )
Parameters
- last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the decoder of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- loc (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Shift values of each time series' context window, used to give the model inputs of the same magnitude and then used to shift back to the original magnitude.
- scale (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Scaling values of each time series' context window, used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude.
- static_features (torch.FloatTensor of shape (batch_size, feature size), optional) — Static features of each time series in a batch, which are copied to the covariates at inference time.
Base class for time series model's encoder outputs that also contains pre-computed hidden states that can speed up sequential decoding.
Seq2SeqTSPredictionOutput
class transformers.modeling_outputs.Seq2SeqTSPredictionOutput
< source >( loss: typing.Optional[torch.FloatTensor] = None params: typing.Optional[typing.Tuple[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None decoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None cross_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_last_hidden_state: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None encoder_attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None loc: typing.Optional[torch.FloatTensor] = None scale: typing.Optional[torch.FloatTensor] = None static_features: typing.Optional[torch.FloatTensor] = None )
Parameters
- loss (torch.FloatTensor of shape (1,), optional, returned when future_values is provided) — Distributional loss.
- params (torch.FloatTensor of shape (batch_size, num_samples, num_params)) — Parameters of the chosen distribution.
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
- decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- loc (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Shift values of each time series' context window, used to give the model inputs of the same magnitude and then used to shift back to the original magnitude.
- scale (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) — Scaling values of each time series' context window, used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude.
- static_features (torch.FloatTensor of shape (batch_size, feature size), optional) — Static features of each time series in a batch, which are copied to the covariates at inference time.
Base class for time series model's decoder outputs that also contain the loss as well as the parameters of the chosen distribution.
SampleTSPredictionOutput
class transformers.modeling_outputs.SampleTSPredictionOutput
< source >( sequences: FloatTensor = None )
Base class for time series model's predictions outputs that contains the sampled values from the chosen distribution.
TFBaseModelOutput
class transformers.modeling_tf_outputs.TFBaseModelOutput
< source >( last_hidden_state: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
- hidden_states (tuple(tf.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs, with potential hidden states and attentions.
TFBaseModelOutputWithPooling
class transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling
< source >( last_hidden_state: tf.Tensor = None pooler_output: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
参数
- last_hidden_state (
tf.Tensor
of shape(batch_size, sequence_length, hidden_size)
) — 模型最后一层输出的隐藏状态序列。 - pooler_output (
tf.Tensor
of shape(batch_size, hidden_size)
) — Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.这个输出通常不是输入语义内容的好摘要,通常更好的做法是对整个输入序列的隐藏状态序列进行平均或池化。
- hidden_states (
tuple(tf.Tensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.模型在每一层输出处的隐藏状态加上初始嵌入输出。
- attentions (
tuple(tf.Tensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftf.Tensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.注意力权重在注意力softmax之后,用于计算自注意力头中的加权平均值。
模型输出的基类,还包含最后隐藏状态的池化。
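A minimal mean-pooling sketch along those lines, masking out padding tokens; the `outputs` and `inputs` names assume an encoder call like the one shown above:

```python
import tensorflow as tf

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over the real tokens only.
    mask = tf.cast(attention_mask[..., tf.newaxis], last_hidden_state.dtype)
    summed = tf.reduce_sum(last_hidden_state * mask, axis=1)
    counts = tf.maximum(tf.reduce_sum(mask, axis=1), 1e-9)
    return summed / counts

sentence_embedding = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
```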
TFBaseModelOutputWithPoolingAndCrossAttentions
class transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions
< source >( last_hidden_state: tf.Tensor = None pooler_output: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None )
Parameters
- last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model.
- pooler_output (`tf.Tensor` of shape `(batch_size, hidden_size)`) — Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. This output is usually not a good summary of the semantic content of the input; you are often better off averaging or pooling the sequence of hidden-states for the whole input sequence.
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model's outputs that also contains a pooling of the last hidden states.
TFBaseModelOutputWithPast
class transformers.modeling_tf_outputs.TFBaseModelOutputWithPast
< source >( last_hidden_state: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
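To see the cache in action, here is a hedged two-step decoding sketch with a TF causal LM; `gpt2` is just an example checkpoint, and recent transformers releases name the argument `past_key_values` (older ones used `past`):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFGPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The quick brown", return_tensors="tf")
outputs = model(**inputs, use_cache=True)

# Greedily pick the next token, then feed only that token plus the cache;
# the cached keys/values stand in for the already-processed prefix.
next_token = tf.argmax(outputs.logits[:, -1, :], axis=-1)[:, tf.newaxis]
step = model(input_ids=next_token, past_key_values=outputs.past_key_values, use_cache=True)
```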
TFBaseModelOutputWithPastAndCrossAttentions
class transformers.modeling_tf_outputs.TFBaseModelOutputWithPastAndCrossAttentions
< source >( last_hidden_state: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None )
Parameters
- last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
TFSeq2SeqModelOutput
class transformers.modeling_tf_outputs.TFSeq2SeqModelOutput
< source >( last_hidden_state: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
- last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the decoder of the model. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model encoder's outputs that also contains pre-computed hidden states that can speed up sequential decoding.
TFCausalLMOutput
class transformers.modeling_tf_outputs.TFCausalLMOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(n,)`, optional, where n is the number of non-masked labels, returned when `labels` is provided) — Language modeling loss (for next-token prediction).
- logits (`tf.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
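The logits at the final position define the distribution over the next token. A minimal sketch, assuming `outputs` came from a TF causal LM call and `tokenizer` matches the model:

```python
import tensorflow as tf

next_token_logits = outputs.logits[:, -1, :]       # (batch_size, vocab_size)
probs = tf.nn.softmax(next_token_logits, axis=-1)  # normalized next-token probabilities
top_id = int(tf.argmax(probs, axis=-1)[0])
print(tokenizer.decode([top_id]))                  # most likely continuation
```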
TFCausalLMOutputWithCrossAttentions
class transformers.modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(n,)`, optional, where n is the number of non-masked labels, returned when `labels` is provided) — Language modeling loss (for next-token prediction).
- logits (`tf.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
Base class for causal language model (or autoregressive) outputs.
TFCausalLMOutputWithPast
class transformers.modeling_tf_outputs.TFCausalLMOutputWithPast
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(n,)`, optional, where n is the number of non-masked labels, returned when `labels` is provided) — Language modeling loss (for next-token prediction).
- logits (`tf.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
TFMaskedLMOutput
class transformers.modeling_tf_outputs.TFMaskedLMOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(n,)`, optional, where n is the number of non-masked labels, returned when `labels` is provided) — Masked language modeling (MLM) loss.
- logits (`tf.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for masked language models outputs.
TFSeq2SeqLMOutput
class transformers.modeling_tf_outputs.TFSeq2SeqLMOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(n,)`, optional, where n is the number of non-masked labels, returned when `labels` is provided) — Language modeling loss.
- logits (`tf.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence language models outputs.
TFNextSentencePredictorOutput
class transformers.modeling_tf_outputs.TFNextSentencePredictorOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(n,)`, optional, where n is the number of non-masked labels, returned when `next_sentence_label` is provided) — Next sentence prediction loss.
- logits (`tf.Tensor` of shape `(batch_size, 2)`) — Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of models predicting if two sentences are consecutive or not.
TFSequenceClassifierOutput
class transformers.modeling_tf_outputs.TFSequenceClassifierOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(batch_size, )`, optional, returned when `labels` is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (`tf.Tensor` of shape `(batch_size, config.num_labels)`) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sentence classification models.
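Turning those scores into class probabilities takes one softmax. A minimal sketch, assuming `outputs` came from a TF sequence-classification model whose config carries an `id2label` mapping:

```python
import tensorflow as tf

probs = tf.nn.softmax(outputs.logits, axis=-1)  # (batch_size, num_labels)
pred = int(tf.argmax(probs, axis=-1)[0])
print(model.config.id2label[pred], float(probs[0, pred]))
```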
TFSeq2SeqSequenceClassifierOutput
class transformers.modeling_tf_outputs.TFSeq2SeqSequenceClassifierOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None cross_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(1,)`, optional, returned when `label` is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (`tf.Tensor` of shape `(batch_size, config.num_labels)`) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence sentence classification models.
TFMultipleChoiceModelOutput
class transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(batch_size, )`, optional, returned when `labels` is provided) — Classification loss.
- logits (`tf.Tensor` of shape `(batch_size, num_choices)`) — num_choices is the second dimension of the input tensors (see input_ids above). Classification scores (before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of multiple choice models.
TFTokenClassifierOutput
class transformers.modeling_tf_outputs.TFTokenClassifierOutput
< source >( loss: tf.Tensor | None = None logits: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(n,)`, optional, where n is the number of unmasked labels, returned when `labels` is provided) — Classification loss.
- logits (`tf.Tensor` of shape `(batch_size, sequence_length, config.num_labels)`) — Classification scores (before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of token classification models.
TFQuestionAnsweringModelOutput
class transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput
< source >( loss: tf.Tensor | None = None start_logits: tf.Tensor = None end_logits: tf.Tensor = None hidden_states: Tuple[tf.Tensor] | None = None attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(batch_size, )`, optional, returned when `start_positions` and `end_positions` are provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
- start_logits (`tf.Tensor` of shape `(batch_size, sequence_length)`) — Span-start scores (before SoftMax).
- end_logits (`tf.Tensor` of shape `(batch_size, sequence_length)`) — Span-end scores (before SoftMax).
- hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of question answering models.
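The two logit tensors decode into an answer span by taking an argmax of each. A naive sketch, assuming `outputs` came from a TF question-answering model and `inputs` holds the tokenized question plus context:

```python
import tensorflow as tf

start = int(tf.argmax(outputs.start_logits, axis=-1)[0])
end = int(tf.argmax(outputs.end_logits, axis=-1)[0])

answer_ids = inputs["input_ids"][0, start : end + 1]
print(tokenizer.decode(answer_ids))  # real code should also check that start <= end
```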
TFSeq2SeqQuestionAnsweringModelOutput
class transformers.modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput
< source >( loss: tf.Tensor | None = None start_logits: tf.Tensor = None end_logits: tf.Tensor = None past_key_values: List[tf.Tensor] | None = None decoder_hidden_states: Tuple[tf.Tensor] | None = None decoder_attentions: Tuple[tf.Tensor] | None = None encoder_last_hidden_state: tf.Tensor | None = None encoder_hidden_states: Tuple[tf.Tensor] | None = None encoder_attentions: Tuple[tf.Tensor] | None = None )
Parameters
- loss (`tf.Tensor` of shape `(1,)`, optional, returned when `labels` is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
- start_logits (`tf.Tensor` of shape `(batch_size, sequence_length)`) — Span-start scores (before SoftMax).
- end_logits (`tf.Tensor` of shape `(batch_size, sequence_length)`) — Span-end scores (before SoftMax).
- past_key_values (`List[tf.Tensor]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — List of `tf.Tensor` of length `config.n_layers`, with each tensor of shape `(2, batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) of the decoder that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- encoder_last_hidden_state (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(tf.Tensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(tf.Tensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `tf.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence question answering models.
FlaxBaseModelOutput
class transformers.modeling_flax_outputs.FlaxBaseModelOutput
< source >( last_hidden_state: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model.
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs, with potential hidden states and attentions.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
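The Flax output classes are immutable dataclasses, so `replace()` is how you derive a modified copy. A minimal sketch constructing one by hand:

```python
import jax.numpy as jnp
from transformers.modeling_flax_outputs import FlaxBaseModelOutput

out = FlaxBaseModelOutput(last_hidden_state=jnp.zeros((1, 4, 8)))

# replace() leaves `out` untouched and returns a copy with the field swapped.
scaled = out.replace(last_hidden_state=out.last_hidden_state * 2.0)
print(scaled.last_hidden_state.shape)  # (1, 4, 8)
```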
FlaxBaseModelOutputWithPast
class transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPast
< source >( last_hidden_state: Array = None past_key_values: typing.Optional[typing.Dict[str, jax.Array]] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model.
- past_key_values (`Dict[str, jnp.ndarray]`) — Dictionary of pre-computed hidden-states (key and values in the attention blocks) that can be used for fast auto-regressive decoding. Pre-computed key and value hidden-states are of shape [batch_size, max_length].
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs, with potential hidden states and attentions.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxBaseModelOutputWithPooling
class transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling
< source >( last_hidden_state: Array = None pooler_output: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model.
- pooler_output (`jnp.ndarray` of shape `(batch_size, hidden_size)`) — Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs that also contains a pooling of the last hidden states.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxBaseModelOutputWithPastAndCrossAttentions
class transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions
< source >( last_hidden_state: Array = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
- past_key_values (`tuple(tuple(jnp.ndarray))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — Tuple of `tuple(jnp.ndarray)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and, optionally if `config.is_encoder_decoder=True`, 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and, if `config.is_encoder_decoder=True`, in the cross-attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` and `config.add_cross_attention=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqModelOutput
class transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput
< source >( last_hidden_state: Array = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the decoder of the model. If `past_key_values` is used, only the last hidden-state of the sequences, of shape `(batch_size, 1, hidden_size)`, is output.
- past_key_values (`tuple(tuple(jnp.ndarray))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — Tuple of `tuple(jnp.ndarray)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model encoder's outputs that also contains pre-computed hidden states that can speed up sequential decoding.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxCausalLMOutputWithCrossAttentions
class transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions
< source >( logits: Array = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Cross attentions weights, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- past_key_values (`tuple(tuple(jnp.ndarray))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — Tuple of `jnp.ndarray` tuples of length `config.n_layers`, with each tuple containing the cached key, value states of the self-attention and the cross-attention layers if the model is used in an encoder-decoder setting. Only relevant if `config.is_decoder = True`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
Base class for causal language model (or autoregressive) outputs.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxMaskedLMOutput
class transformers.modeling_flax_outputs.FlaxMaskedLMOutput
< source >( logits: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for masked language models outputs.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqLMOutput
class transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput
< source >( logits: Array = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- past_key_values (`tuple(tuple(jnp.ndarray))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — Tuple of `tuple(jnp.ndarray)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence language models outputs.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxNextSentencePredictorOutput
class transformers.modeling_flax_outputs.FlaxNextSentencePredictorOutput
< source >( logits: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, 2)`) — Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of models predicting if two sentences are consecutive or not.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxSequenceClassifierOutput
class transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput
< source >( logits: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, config.num_labels)`) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sentence classification models.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqSequenceClassifierOutput
class transformers.modeling_flax_outputs.FlaxSeq2SeqSequenceClassifierOutput
< source >( logits: Array = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, config.num_labels)`) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- past_key_values (`tuple(tuple(jnp.ndarray))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — Tuple of `tuple(jnp.ndarray)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence sentence classification models.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxMultipleChoiceModelOutput
class transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput
< source >( logits: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, num_choices)`) — num_choices is the second dimension of the input tensors (see input_ids above). Classification scores (before SoftMax).
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of multiple choice models.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxTokenClassifierOutput
class transformers.modeling_flax_outputs.FlaxTokenClassifierOutput
< source >( logits: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- logits (`jnp.ndarray` of shape `(batch_size, sequence_length, config.num_labels)`) — Classification scores (before SoftMax).
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of token classification models.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxQuestionAnsweringModelOutput
class transformers.modeling_flax_outputs.FlaxQuestionAnsweringModelOutput
< source >( start_logits: Array = None end_logits: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- start_logits (`jnp.ndarray` of shape `(batch_size, sequence_length)`) — Span-start scores (before SoftMax).
- end_logits (`jnp.ndarray` of shape `(batch_size, sequence_length)`) — Span-end scores (before SoftMax).
- hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of question answering models.
replace( **updates ) — Returns a new object replacing the specified fields with new values.
FlaxSeq2SeqQuestionAnsweringModelOutput
class transformers.modeling_flax_outputs.FlaxSeq2SeqQuestionAnsweringModelOutput
< source >( start_logits: Array = None end_logits: Array = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[jax.Array]]] = None decoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None decoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None cross_attentions: typing.Optional[typing.Tuple[jax.Array]] = None encoder_last_hidden_state: typing.Optional[jax.Array] = None encoder_hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None encoder_attentions: typing.Optional[typing.Tuple[jax.Array]] = None )
Parameters
- start_logits (`jnp.ndarray` of shape `(batch_size, sequence_length)`) — Span-start scores (before SoftMax).
- end_logits (`jnp.ndarray` of shape `(batch_size, sequence_length)`) — Span-end scores (before SoftMax).
- past_key_values (`tuple(tuple(jnp.ndarray))`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — Tuple of `tuple(jnp.ndarray)` of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the `past_key_values` input) to speed up sequential decoding.
- decoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
- decoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
- cross_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
- encoder_last_hidden_state (`jnp.ndarray` of shape `(batch_size, sequence_length, hidden_size)`, optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
- encoder_hidden_states (`tuple(jnp.ndarray)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
- encoder_attentions (`tuple(jnp.ndarray)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence question answering models.
replace( **updates ) — Returns a new object replacing the specified fields with new values.