Pegasus

Overview

The Pegasus model was proposed in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.

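According to the abstract,

Pegasus' pretraining task is intentionally similar to summarization: important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary.
Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human evaluation.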

This model was contributed by sshleifer. The authors' code can be found here.

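Usage tips

Pegasus is a sequence-to-sequence model with the same encoder-decoder architecture as BART. It is pre-trained jointly on two self-supervised objectives: Masked Language Modeling (MLM) and a novel summarization-specific pretraining objective called Gap Sentence Generation (GSG).
- MLM: encoder input tokens are randomly replaced by a mask token and have to be predicted by the encoder (as in BERT).
- GSG: whole encoder input sentences are replaced by a second mask token and fed to the decoder, which uses a causal mask to hide future words like a regular auto-regressive transformer decoder.
FP16 is not supported (help/ideas on this appreciated!).
The adafactor optimizer is recommended for pegasus fine-tuning.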
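
To make the GSG objective concrete, here is a minimal pure-Python sketch of how one training pair could be built. It assumes the sentences to mask have already been chosen (the paper selects them by importance; that scoring step is left out here). The helper name `make_gsg_example` and the literal `<mask_1>` string are illustrative assumptions, not library API.

```python
from typing import List, Sequence, Tuple

MASK_SENT = "<mask_1>"  # assumed sentence-mask string, for illustration only


def make_gsg_example(sentences: Sequence[str], gap_indices: Sequence[int]) -> Tuple[str, str]:
    """Return (encoder_input, decoder_target) for one gap-sentence example.

    Sentences at `gap_indices` are replaced by MASK_SENT in the encoder input
    and concatenated, in document order, to form the decoder target.
    """
    gaps = set(gap_indices)
    encoder_parts: List[str] = []
    target_parts: List[str] = []
    for i, sentence in enumerate(sentences):
        if i in gaps:
            encoder_parts.append(MASK_SENT)
            target_parts.append(sentence)
        else:
            encoder_parts.append(sentence)
    return " ".join(encoder_parts), " ".join(target_parts)


doc = ["Pegasus is a seq2seq model.", "It is pretrained with GSG.", "It targets summarization."]
source, target = make_gsg_example(doc, gap_indices=[1])
print(source)  # Pegasus is a seq2seq model. <mask_1> It targets summarization.
print(target)  # It is pretrained with GSG.
```

The decoder then learns to produce `target` token by token, which is why the objective resembles abstractive summarization.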

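Checkpoints

All checkpoints are fine-tuned for summarization except pegasus-large, from which the other checkpoints were fine-tuned:

Each checkpoint is 2.2 GB on disk and has 568M parameters.
FP16 is not supported (help/ideas on this appreciated!).
Summarizing xsum in fp32 takes about 400ms/sample, with default parameters on a v100 GPU.
Full replication results and correctly pre-processed data can be found in this Issue.
Distilled checkpoints are described in this paper.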
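
As a rough sanity check, the quoted disk size is about what 568M parameters occupy when stored in fp32. The arithmetic below ignores file-format overhead and any non-parameter buffers, so treat it as an estimate only.

```python
num_parameters = 568_000_000  # parameter count quoted above
bytes_per_fp32 = 4            # one fp32 value occupies 4 bytes

size_bytes = num_parameters * bytes_per_fp32
size_gb = size_bytes / 1e9     # decimal gigabytes
size_gib = size_bytes / 2**30  # binary gibibytes

print(f"{size_gb:.2f} GB, {size_gib:.2f} GiB")  # 2.27 GB, 2.12 GiB
```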

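Implementation Notes

All models are transformer encoder-decoders with 16 layers in each component.
The implementation is completely inherited from BartForConditionalGeneration.
Some key configuration differences:
- static, sinusoidal position embeddings
- the model starts generating with pad_token_id (which has 0 token_embedding) as the prefix.
- more beams are used (num_beams=8)
All pretrained pegasus checkpoints are the same besides three attributes: tokenizer.model_max_length (maximum input size), max_length (the maximum number of tokens to generate) and length_penalty.
The code to convert checkpoints trained in the authors' repo can be found in convert_pegasus_tf_to_pytorch.py.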
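
For readers unfamiliar with static sinusoidal position embeddings, the sketch below builds such a table in plain Python using the classic formulation from the original Transformer paper. It is an illustration under that assumption: the column layout used inside the library may differ (for example, sine and cosine halves stored contiguously), and `sinusoidal_positions` is a hypothetical helper name.

```python
import math
from typing import List


def sinusoidal_positions(num_positions: int, dim: int) -> List[List[float]]:
    """Fixed sinusoidal table: sin on even columns, cos on odd columns."""
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(dim):
            # Columns 2k and 2k+1 share the same frequency.
            angle = pos / (10000 ** (2 * (j // 2) / dim))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


table = sinusoidal_positions(num_positions=4, dim=8)
print(table[0][:4])  # [0.0, 1.0, 0.0, 1.0]
```

Because the table is computed rather than learned, it adds no trainable parameters.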

Usage Example

>>> from transformers import PegasusForConditionalGeneration, PegasusTokenizer
>>> import torch

>>> src_text = [
...     """ PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
... ]

>>> model_name = "google/pegasus-xsum"
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> tokenizer = PegasusTokenizer.from_pretrained(model_name)
>>> model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)
>>> batch = tokenizer(src_text, truncation=True, padding="longest", return_tensors="pt").to(device)
>>> translated = model.generate(**batch)
>>> tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
>>> assert (
...     tgt_text[0]
...     == "California's largest electricity provider has turned off power to hundreds of thousands of customers."
... )

Resources

Script to fine-tune pegasus on the XSUM dataset. Data download instructions are at examples/pytorch/summarization/.
Causal language modeling task guide
Translation task guide
Summarization task guide

PegasusConfig

class transformers.PegasusConfig

< source >

( vocab_size = 50265 max_position_embeddings = 1024 encoder_layers = 12 encoder_ffn_dim = 4096 encoder_attention_heads = 16 decoder_layers = 12 decoder_ffn_dim = 4096 decoder_attention_heads = 16 encoder_layerdrop = 0.0 decoder_layerdrop = 0.0 use_cache = True is_encoder_decoder = True activation_function = 'gelu' d_model = 1024 dropout = 0.1 attention_dropout = 0.0 activation_dropout = 0.0 init_std = 0.02 decoder_start_token_id = 0 scale_embedding = False pad_token_id = 0 eos_token_id = 1 forced_eos_token_id = 1 **kwargs )

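Parameters

vocab_size (int, optional, defaults to 50265) — Vocabulary size of the PEGASUS model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling PegasusModel or TFPegasusModel.
d_model (int, optional, defaults to 1024) — Dimensionality of the layers and the pooler layer.
encoder_layers (int, optional, defaults to 12) — Number of encoder layers.
decoder_layers (int, optional, defaults to 12) — Number of decoder layers.
encoder_attention_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer encoder.
decoder_attention_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer decoder.
decoder_ffn_dim (int, optional, defaults to 4096) — Dimensionality of the "intermediate" (often named feed-forward) layer in the decoder.
encoder_ffn_dim (int, optional, defaults to 4096) — Dimensionality of the "intermediate" (often named feed-forward) layer in the encoder.
activation_function (str or function, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (float, optional, defaults to 0.0) — The dropout ratio for the attention probabilities.
activation_dropout (float, optional, defaults to 0.0) — The dropout ratio for activations inside the fully connected layer.
max_position_embeddings (int, optional, defaults to 1024) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
init_std (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
encoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the encoder. See the LayerDrop paper (https://arxiv.org/abs/1909.11556) for more details.
decoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the decoder. See the LayerDrop paper (https://arxiv.org/abs/1909.11556) for more details.
scale_embedding (bool, optional, defaults to False) — Whether to scale the embeddings by a factor of sqrt(d_model).
use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models).
forced_eos_token_id (int, optional, defaults to 1) — The id of the token to force as the last generated token when max_length is reached. Usually set to eos_token_id.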
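
The encoder_layerdrop and decoder_layerdrop parameters refer to LayerDrop, where whole layers are randomly skipped during training. The sketch below shows the idea with toy "layers" that each add 1.0; `run_layers` is a hypothetical helper written for this illustration, not the library's implementation.

```python
import random
from typing import Callable, List


def run_layers(x: float, layers: List[Callable[[float], float]], layerdrop: float,
               training: bool, rng: random.Random) -> float:
    """Apply layers in order, skipping each with probability `layerdrop` while training."""
    for layer in layers:
        if training and rng.random() < layerdrop:
            continue  # the whole layer is skipped for this forward pass
        x = layer(x)
    return x


layers = [lambda v: v + 1.0 for _ in range(12)]
rng = random.Random(0)
print(run_layers(0.0, layers, layerdrop=0.0, training=True, rng=rng))   # 12.0, nothing is dropped
print(run_layers(0.0, layers, layerdrop=0.5, training=False, rng=rng))  # 12.0, no dropping at inference
```

With the default value of 0.0, every layer always runs, so the setting only matters when it is raised explicitly.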

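This is the configuration class to store the configuration of a PegasusModel. It is used to instantiate a PEGASUS model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the PEGASUS google/pegasus-large architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example: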

>>> from transformers import PegasusConfig, PegasusModel

>>> # Initializing a PEGASUS google/pegasus-large style configuration
>>> configuration = PegasusConfig()

>>> # Initializing a model (with random weights) from the google/pegasus-large style configuration
>>> model = PegasusModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

PegasusTokenizer

Warning: add_tokens does not work at the moment.

class transformers.PegasusTokenizer

< source >

( vocab_file pad_token = '<pad>' eos_token = '</s>' unk_token = '<unk>' mask_token = '<mask_2>' mask_token_sent = '<mask_1>' additional_special_tokens = None offset = 103 sp_model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None **kwargs )

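Parameters

vocab_file (str) — SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer.
pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
eos_token (str, optional, defaults to "</s>") — The end of sequence token.

When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the sep_token.
unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
mask_token (str, optional, defaults to "<mask_2>") — The token used for masking single token values. This is the token used when training this model with masked language modeling (MLM). This is the token that the PEGASUS encoder will try to predict during pretraining. It corresponds to [MASK2] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.
mask_token_sent (str, optional, defaults to "<mask_1>") — The token used for masking whole target sentences. This is the token used when training this model with gap sentences generation (GSG). This is the sentence that the PEGASUS decoder will try to predict during pretraining. It corresponds to [MASK1] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.
additional_special_tokens (List[str], optional) — Additional special tokens used by the tokenizer. If no additional_special_tokens are provided, <mask_2> and <unk_2, …, unk_102> are used as additional special tokens, corresponding to the original PEGASUS tokenizer that uses the tokens 2 - 104 only for pretraining.
sp_model_kwargs (dict, optional) — Will be passed to the SentencePieceProcessor.__init__() method. The Python wrapper for SentencePiece can be used, among other things, to set:
- enable_sampling: Enable subword regularization.
- nbest_size: Sampling parameters for unigram. Invalid for BPE-Dropout.
  - nbest_size = {0,1}: No sampling is performed.
  - nbest_size > 1: samples from the nbest_size results.
  - nbest_size < 0: assumes that nbest_size is infinite and samples from all hypotheses (lattice) using the forward-filtering-and-backward-sampling algorithm.
- alpha: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.

Construct a PEGASUS tokenizer. Based on SentencePiece.

This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.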

build_inputs_with_special_tokens

< source >

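( token_ids_0 token_ids_1 = None ) → List[int]

Parameters

token_ids_0 (List[int]) — List of IDs to which the special tokens will be added.
token_ids_1 (List[int], optional) — Optional second list of IDs for sequence pairs.

List[int]

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A PEGASUS sequence has the following format, where X represents the sequence:

single sequence: X </s>
pair of sequences: A B </s> (not intended use)

BOS is never used. Pairs of sequences are not the expected use case, but they will be handled without a separator.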
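
The behavior described here can be sketched in a few lines of plain Python, assuming a single EOS id is appended and nothing is prepended. `build_inputs` and the constant are illustrative names for this sketch; the id value 1 is taken from the eos_token_id default shown for PegasusConfig.

```python
from typing import List, Optional

EOS_TOKEN_ID = 1  # matches the eos_token_id default shown for PegasusConfig


def build_inputs(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Append EOS to a single sequence, or to the plain concatenation of a pair."""
    if token_ids_1 is None:
        return token_ids_0 + [EOS_TOKEN_ID]
    # No separator is inserted between the two sequences.
    return token_ids_0 + token_ids_1 + [EOS_TOKEN_ID]


print(build_inputs([110, 220, 330]))         # [110, 220, 330, 1]
print(build_inputs([110, 220], [330, 440]))  # [110, 220, 330, 440, 1]
```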

convert_tokens_to_string

< source >

( tokens )

Converts a sequence of tokens (strings) into a single string.

get_special_tokens_mask

< source >

( token_ids_0: typing.List token_ids_1: typing.Optional[typing.List] = None already_has_special_tokens: bool = False )

Get a list where entries are 1 if a token is [eos] or [pad], else 0.
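
A simplified sketch of such a mask, assuming only the pad and eos ids count as special and that the sequence already contains them. The function name and constants are illustrative; the id values come from the pad_token_id and eos_token_id defaults shown for PegasusConfig.

```python
from typing import List

PAD_TOKEN_ID = 0  # pad_token_id default shown for PegasusConfig
EOS_TOKEN_ID = 1  # eos_token_id default shown for PegasusConfig


def special_tokens_mask(token_ids: List[int]) -> List[int]:
    """Return 1 for pad/eos positions and 0 for every other token."""
    special = {PAD_TOKEN_ID, EOS_TOKEN_ID}
    return [1 if token_id in special else 0 for token_id in token_ids]


print(special_tokens_mask([110, 220, 1, 0, 0]))  # [0, 0, 1, 1, 1]
```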

num_special_tokens_to_add

< source >

( pair = False )

Just EOS

PegasusTokenizerFast

class transformers.PegasusTokenizerFast

< source >

( vocab_file = None tokenizer_file = None pad_token = '<pad>' eos_token = '</s>' unk_token = '<unk>' mask_token = '<mask_2>' mask_token_sent = '<mask_1>' additional_special_tokens = None offset = 103 **kwargs )

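Parameters

vocab_file (str) — SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer.
pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
eos_token (str, optional, defaults to "</s>") — The end of sequence token.

When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the sep_token.
unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
mask_token (str, optional, defaults to "<mask_2>") — The token used for masking single token values. This is the token used when training this model with masked language modeling (MLM). This is the token that the PEGASUS encoder will try to predict during pretraining. It corresponds to [MASK2] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.
mask_token_sent (str, optional, defaults to "<mask_1>") — The token used for masking whole target sentences. This is the token used when training this model with gap sentences generation (GSG). This is the sentence that the PEGASUS decoder will try to predict during pretraining. It corresponds to [MASK1] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.
additional_special_tokens (List[str], optional) — Additional special tokens used by the tokenizer. If no additional_special_tokens are provided, <mask_2> and <unk_2, …, unk_102> are used as additional special tokens, corresponding to the original PEGASUS tokenizer that uses the tokens 2 - 104 only for pretraining.

Construct a "fast" PEGASUS tokenizer (backed by HuggingFace's tokenizers library). Based on Unigram.

This tokenizer inherits from PreTrainedTokenizerFast which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.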

build_inputs_with_special_tokens

< source >

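( token_ids_0 token_ids_1 = None ) → List[int]

Parameters

token_ids_0 (List[int]) — List of IDs to which the special tokens will be added.
token_ids_1 (List[int], optional) — Optional second list of IDs for sequence pairs.

List[int]

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence by adding eos to the end. No bos token is added to the front.

single sequence: X </s>
pair of sequences: A B </s> (not intended use)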

get_special_tokens_mask

< source >

( token_ids_0: typing.List token_ids_1: typing.Optional[typing.List] = None already_has_special_tokens: bool = False )

Get a list where entries are 1 if a token is [eos] or [pad], else 0.

Pytorch

PegasusModel

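class transformers.PegasusModel

< source >

( config: PegasusConfig )

Parameters

config (PegasusConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The bare PEGASUS Model outputting raw hidden-states without any specific head on top. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.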

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.Tensor] = None decoder_attention_mask: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.Tuple[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.Tuple[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.Tensor] = None decoder_inputs_embeds: typing.Optional[torch.Tensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?

Pegasus uses the pad_token_id as the starting token for decoder_input_ids generation. If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).
decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
head_mask (torch.Tensor of shape (encoder_layers, encoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
decoder_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the cross-attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) — Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional) — Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. If past_key_values is used, optionally only the last decoder_inputs_embeds have to be input (see past_key_values). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix.
If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
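
To illustrate how input_ids and attention_mask relate, the sketch below right-pads a batch with the pad id and marks real tokens with 1 and padding with 0. It is a plain-Python illustration of the convention described in the parameter list; `pad_batch` is a hypothetical helper, and in practice the tokenizer produces both tensors.

```python
from typing import List, Tuple

PAD_TOKEN_ID = 0  # pad_token_id default shown for PegasusConfig


def pad_batch(sequences: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad to the longest sequence and build the matching attention mask."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = longest - len(seq)
        input_ids.append(seq + [PAD_TOKEN_ID] * padding)
        attention_mask.append([1] * len(seq) + [0] * padding)
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch([[110, 220, 1], [330, 1]])
print(input_ids)       # [[110, 220, 1], [330, 1, 0]]
print(attention_mask)  # [[1, 1, 1], [1, 1, 0]]
```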

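transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.Seq2SeqModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the decoder of the model.

If past_key_values is used, only the last hidden-state of the sequences, of shape (batch_size, 1, hidden_size), is output.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

The PegasusModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: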

>>> from transformers import AutoTokenizer, PegasusModel

>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
>>> model = PegasusModel.from_pretrained("google/pegasus-large")

>>> inputs = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt")
>>> decoder_inputs = tokenizer("Studies show that", return_tensors="pt")
>>> outputs = model(input_ids=inputs.input_ids, decoder_input_ids=decoder_inputs.input_ids)

>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 4, 1024]

PegasusForConditionalGeneration

class transformers.PegasusForConditionalGeneration

< source >

( config: PegasusConfig )

Parameters

config (PegasusConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The PEGASUS Model with a language modeling head. Can be used for summarization. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.Tensor] = None decoder_attention_mask: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.Tuple[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.Tuple[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.Tensor] = None decoder_inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?

Pegasus uses the pad_token_id as the starting token for decoder_input_ids generation. If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).
decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
head_mask (torch.Tensor of shape (encoder_layers, encoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
decoder_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the cross-attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) — Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional) — Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. If past_key_values is used, optionally only the last decoder_inputs_embeds have to be input (see past_key_values). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix.
If decoder_input_ids and decoder_inputs_embeds are both unset, decoder_inputs_embeds takes the value of inputs_embeds.
use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
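
When only labels are supplied, encoder-decoder models of this family typically derive decoder_input_ids by shifting the labels one position to the right and starting with the decoder start token, which for Pegasus is the pad id. The plain-Python sketch below illustrates that shift under this assumption; `shift_right` is a hypothetical helper, not the library function.

```python
from typing import List

PAD_TOKEN_ID = 0            # pad_token_id default shown for PegasusConfig
DECODER_START_TOKEN_ID = 0  # decoder_start_token_id default shown for PegasusConfig
IGNORE_INDEX = -100         # label value excluded from the loss


def shift_right(labels: List[int]) -> List[int]:
    """Prepend the start token, drop the last label, and replace -100 with pad."""
    shifted = [DECODER_START_TOKEN_ID] + labels[:-1]
    return [PAD_TOKEN_ID if token == IGNORE_INDEX else token for token in shifted]


print(shift_right([110, 220, 1, -100]))  # [0, 110, 220, 1]
print(shift_right([110, -100, 1]))       # [0, 110, 0]
```

Replacing -100 with the pad id keeps the decoder inputs valid vocabulary indices, while the labels themselves still carry -100 so those positions are skipped by the loss.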

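transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.Seq2SeqLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

The PegasusForConditionalGeneration forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Summarization example: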

>>> from transformers import AutoTokenizer, PegasusForConditionalGeneration

>>> model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")

>>> ARTICLE_TO_SUMMARIZE = (
...     "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
...     "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
...     "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
... )
>>> inputs = tokenizer(ARTICLE_TO_SUMMARIZE, max_length=1024, return_tensors="pt")

>>> # Generate Summary
>>> summary_ids = model.generate(inputs["input_ids"])
>>> tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
"California's largest electricity provider has turned off power to hundreds of thousands of customers."

PegasusForCausalLM

class transformers.PegasusForCausalLM

< source >

( config )

forward

< source >

( input_ids: LongTensor = None attention_mask: typing.Optional[torch.Tensor] = None encoder_hidden_states: typing.Optional[torch.FloatTensor] = None encoder_attention_mask: typing.Optional[torch.FloatTensor] = None head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
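Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). The two additional tensors are only required when the model is used as a decoder in a Sequence to Sequence model.
Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
use_cache (bool, optional) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.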
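
The effect of the -100 label value can be illustrated with a small cross-entropy computation that skips those positions. This is a plain-Python sketch of the idea, not the library's loss code; `masked_cross_entropy` is a hypothetical name, and returning 0.0 when every label is ignored is a choice made for this sketch.

```python
import math
from typing import List

IGNORE_INDEX = -100  # label value excluded from the loss


def masked_cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not -100."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # ignored positions contribute nothing to the loss
        log_norm = math.log(sum(math.exp(score) for score in row))
        total += log_norm - row[label]
        count += 1
    return total / count if count else 0.0


logits = [[0.0, 0.0], [2.0, 0.0], [5.0, -5.0]]
labels = [0, 0, -100]
loss = masked_cross_entropy(logits, labels)
print(round(loss, 4))
```

Changing the logits at the third position leaves the loss unchanged, because its label is -100.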

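transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

A transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Language modeling loss (for next-token prediction).
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Cross-attention weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of torch.FloatTensor tuples of length config.n_layers, with each tuple containing the cached key and value states of the self-attention layers and, if the model is used in an encoder-decoder setting, of the cross-attention layers. Only relevant if config.is_decoder = True.

Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

Example: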

>>> from transformers import AutoTokenizer, PegasusForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
>>> model = PegasusForCausalLM.from_pretrained("google/pegasus-large", add_cross_attention=False)
>>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> logits = outputs.logits
>>> expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
>>> list(logits.shape) == expected_shape
True

TensorFlow

TFPegasusModel

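class transformers.TFPegasusModel

< source >

( config: PegasusConfig *inputs **kwargs )

Parameters

config (PegasusConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The bare PEGASUS Model outputting raw hidden-states without any specific head on top. This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.

The second format is supported because Keras methods prefer it when passing inputs to models and layers. Because of this support, things should "just work" when using methods like model.fit(): just pass your inputs and labels in any format that model.fit() supports. If, however, you want to use the second format outside of Keras methods, such as when creating your own layers or models with the Keras Functional API, there are three ways to gather all the input tensors in the first positional argument:

a single tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input tensors in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input tensors associated with the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can pass inputs just as you would to any other Python function.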
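
The three ways of packing inputs into the first positional argument can be pictured as one normalization step that maps each form onto named inputs. The sketch below is a pure-Python illustration with placeholder objects standing in for tensors; `gather_inputs` and `INPUT_ORDER` are hypothetical names, and the library's own input handling is more involved.

```python
from typing import Any, Dict

# Assumed positional order, for illustration only; it follows the first
# three arguments of the call signature documented for TFPegasusModel.
INPUT_ORDER = ("input_ids", "attention_mask", "decoder_input_ids")


def gather_inputs(first_arg: Any) -> Dict[str, Any]:
    """Turn a dict, a list/tuple, or a single tensor-like object into named inputs."""
    if isinstance(first_arg, dict):
        return dict(first_arg)
    if isinstance(first_arg, (list, tuple)):
        return dict(zip(INPUT_ORDER, first_arg))
    return {"input_ids": first_arg}


ids, mask = object(), object()  # stand-ins for tensors
print(sorted(gather_inputs(ids)))                 # ['input_ids']
print(sorted(gather_inputs([ids, mask])))         # ['attention_mask', 'input_ids']
print(sorted(gather_inputs({"input_ids": ids})))  # ['input_ids']
```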

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None decoder_input_ids: np.ndarray | tf.Tensor | None = None decoder_attention_mask: np.ndarray | tf.Tensor | None = None decoder_position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None decoder_head_mask: np.ndarray | tf.Tensor | None = None cross_attn_head_mask: np.ndarray | tf.Tensor | None = None encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None inputs_embeds: np.ndarray | tf.Tensor | None = None decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None training: bool = False **kwargs ) → transformers.modeling_tf_outputs.TFSeq2SeqModelOutput or tuple(tf.Tensor)

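Parameters

input_ids (tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_input_ids (tf.Tensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?

Pegasus uses the pad_token_id as the starting token for decoder_input_ids generation. If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).
decoder_attention_mask (tf.Tensor of shape (batch_size, target_sequence_length), optional) — Generated by default and ignores pad tokens. Setting this is not recommended for most use cases.
decoder_position_ids (tf.Tensor of shape (batch_size, sequence_length), optional) — Indices of positions of each decoder input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
head_mask (tf.Tensor of shape (encoder_layers, encoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
decoder_head_mask (tf.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
cross_attn_head_mask (tf.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
encoder_outputs (tf.FloatTensor, optional) — Sequence of hidden states at the output of the last layer of the encoder, of shape (batch_size, sequence_length, hidden_size). Used in the cross-attention of the decoder.
past_key_values (Tuple[Tuple[tf.Tensor]] of length config.n_layers) — Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
use_cache (bool, optional, defaults to True) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values). Set to False during training and True during generation.
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config is used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config is used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value is always set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules, such as dropout, behave differently between training and evaluation).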
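The note that Pegasus starts decoder_input_ids with pad_token_id can be made concrete with a minimal sketch of the usual "shift right" construction. The helper name `shift_right`, the token ids and `PAD_TOKEN_ID = 0` are assumptions for illustration; in practice the pad id should be read from the tokenizer or config.

```python
from typing import List

PAD_TOKEN_ID = 0  # assumed value for this sketch; read it from the tokenizer in practice


def shift_right(target_ids: List[List[int]], start_token_id: int = PAD_TOKEN_ID) -> List[List[int]]:
    """Build decoder inputs by prepending the start token and dropping the last target token."""
    return [[start_token_id] + row[:-1] for row in target_ids]


# Two made-up target sequences, already padded to the same length with PAD_TOKEN_ID.
target_ids = [[57, 12, 9, 1], [33, 1, 0, 0]]
decoder_input_ids = shift_right(target_ids)
print(decoder_input_ids)  # [[0, 57, 12, 9], [0, 33, 1, 0]]
```

At each position the decoder then sees the previous target token and is trained to predict the current one.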

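transformers.modeling_tf_outputs.TFSeq2SeqModelOutput or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFSeq2SeqModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden states at the output of the last layer of the decoder of the model.

If past_key_values is used, only the last hidden state of the sequences, of shape (batch_size, 1, hidden_size), is output.
past_key_values (List[tf.Tensor], optional, returned when use_cache=True is passed or when config.use_cache=True) — List of tf.Tensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head).

Contains precomputed hidden states (keys and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.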
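Because the returned elements depend on the configuration and inputs, the tuple form (return_dict=False) has no fixed layout, while the named form does. The sketch below illustrates this with a hypothetical `ToySeq2SeqOutput` dataclass that holds strings instead of tensors; it is a stand-in for illustration, not the library's output class.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class ToySeq2SeqOutput:
    """Hypothetical stand-in for a model output; real outputs hold tensors."""

    last_hidden_state: str
    past_key_values: Optional[str] = None
    decoder_hidden_states: Optional[str] = None
    encoder_last_hidden_state: Optional[str] = None

    def to_tuple(self) -> Tuple[str, ...]:
        # Elements that were not produced are None and are skipped in the tuple form.
        values = (getattr(self, f.name) for f in fields(self))
        return tuple(v for v in values if v is not None)


out = ToySeq2SeqOutput(last_hidden_state="dec_last", encoder_last_hidden_state="enc_last")
as_tuple = out.to_tuple()
print(out.encoder_last_hidden_state)  # access by name is stable
print(as_tuple)  # ('dec_last', 'enc_last')
```

In the tuple form, the index of a given element shifts with the set of outputs requested, which is one reason to prefer attribute access.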

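The TFPegasusModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: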

>>> from transformers import AutoTokenizer, TFPegasusModel
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
>>> model = TFPegasusModel.from_pretrained("google/pegasus-large")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)

>>> last_hidden_states = outputs.last_hidden_state

TFPegasusForConditionalGeneration

class transformers.TFPegasusForConditionalGeneration

( config *inputs **kwargs )

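Parameters

config (PegasusConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The PEGASUS model with a language modeling head. Can be used for summarization. This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

having all inputs as keyword arguments (like PyTorch models), or
having all inputs as a list, tuple or dict in the first positional argument.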

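The second format is supported because Keras methods prefer it when passing inputs to models and layers. To use it outside of Keras methods, there are three ways to gather all the input tensors in the first positional argument:

a single tensor with input_ids only and nothing else: model(input_ids)
a list of varying length with one or several input tensors, in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
a dictionary with one or several input tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can pass inputs just as you would to any other Python function.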

call

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None decoder_input_ids: np.ndarray | tf.Tensor | None = None decoder_attention_mask: np.ndarray | tf.Tensor | None = None decoder_position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None decoder_head_mask: np.ndarray | tf.Tensor | None = None cross_attn_head_mask: np.ndarray | tf.Tensor | None = None encoder_outputs: Optional[TFBaseModelOutput] = None past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None inputs_embeds: np.ndarray | tf.Tensor | None = None decoder_inputs_embeds: np.ndarray | tf.Tensor | None = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: bool = False ) → transformers.modeling_tf_outputs.TFSeq2SeqLMOutput or tuple(tf.Tensor)

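Parameters

input_ids (tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_input_ids (tf.Tensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?

Pegasus uses the pad_token_id as the starting token for decoder_input_ids generation. If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).
decoder_attention_mask (tf.Tensor of shape (batch_size, target_sequence_length), optional) — Generated by default and ignores pad tokens. Setting this is not recommended for most use cases.
decoder_position_ids (tf.Tensor of shape (batch_size, sequence_length), optional) — Indices of positions of each decoder input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
head_mask (tf.Tensor of shape (encoder_layers, encoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
decoder_head_mask (tf.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
cross_attn_head_mask (tf.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
encoder_outputs (tf.FloatTensor, optional) — Sequence of hidden states at the output of the last layer of the encoder, of shape (batch_size, sequence_length, hidden_size). Used in the cross-attention of the decoder.
past_key_values (Tuple[Tuple[tf.Tensor]] of length config.n_layers) — Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
use_cache (bool, optional, defaults to True) — If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values). Set to False during training and True during generation.
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config is used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config is used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value is always set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules, such as dropout, behave differently between training and evaluation).
labels (tf.Tensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size - 1] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size - 1].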
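The effect of the -100 label value can be sketched in plain Python: positions labelled -100 are skipped, so the loss has one entry per non-masked label. The function name `masked_cross_entropy` and the toy logits are assumptions for illustration; this is not the library's loss implementation.

```python
import math
from typing import List

IGNORE_INDEX = -100


def masked_cross_entropy(logits: List[List[float]], labels: List[int]) -> List[float]:
    """Return one cross-entropy value per label that is not IGNORE_INDEX."""
    losses = []
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # ignored positions contribute nothing to the loss
        # Numerically stable log of the softmax normaliser for this position.
        top = max(row)
        log_norm = top + math.log(sum(math.exp(x - top) for x in row))
        losses.append(log_norm - row[label])
    return losses


# Three positions over a made-up vocabulary of size 3; the last position is ignored.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
labels = [0, 2, IGNORE_INDEX]
losses = masked_cross_entropy(logits, labels)
print(len(losses))  # 2
```

The second position has uniform logits, so its loss equals the log of the vocabulary size.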

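transformers.modeling_tf_outputs.TFSeq2SeqLMOutput or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFSeq2SeqLMOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

loss (tf.Tensor of shape (n,), optional, where n is the number of non-masked labels, returned when labels is provided) — Language modeling loss.
logits (tf.Tensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (List[tf.Tensor], optional, returned when use_cache=True is passed or when config.use_cache=True) — List of tf.Tensor of length config.n_layers, with each tensor of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head).

Contains precomputed hidden states (keys and values in the attention blocks) of the decoder that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.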

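The TFPegasusForConditionalGeneration forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Summarization example: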

>>> from transformers import AutoTokenizer, TFPegasusForConditionalGeneration

>>> model = TFPegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")

>>> ARTICLE_TO_SUMMARIZE = (
...     "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
...     "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
...     "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
... )
>>> inputs = tokenizer(ARTICLE_TO_SUMMARIZE, max_length=1024, return_tensors="tf")

>>> # Generate Summary
>>> summary_ids = model.generate(inputs["input_ids"])
>>> print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))

JAX


FlaxPegasusModel

class transformers.FlaxPegasusModel

( config: PegasusConfig input_shape: typing.Tuple[int] = (1, 1) seed: int = 0 dtype: dtype = _do_init: bool = True **kwargs )

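Parameters

config (PegasusConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified, all the computation will be performed with the given dtype.

Note that this only specifies the dtype of the computation and does not influence the dtype of the model parameters.

If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().

The bare Pegasus model transformer outputting raw hidden-states without any specific head on top. This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a Flax Linen flax.nn.Module subclass. Use it as a regular Flax module and refer to the Flax documentation for all matters related to general usage and behavior.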

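Finally, this model supports inherent JAX features such as:

Just-In-Time (JIT) compilation
Automatic Differentiation
Vectorization
Parallelization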

call


( input_ids: Array attention_mask: typing.Optional[jax.Array] = None decoder_input_ids: typing.Optional[jax.Array] = None decoder_attention_mask: typing.Optional[jax.Array] = None position_ids: typing.Optional[jax.Array] = None decoder_position_ids: typing.Optional[jax.Array] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None train: bool = False params: dict = None dropout_rng: PRNGKey = None ) → transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput or tuple(jnp.ndarray)

Parameters

input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_input_ids (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?
decoder_attention_mask (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
If you want to change the padding behavior, modify it to your needs. See diagram 1 in the paper for more information on the default strategy.
position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
decoder_position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each decoder input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
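As a rough illustration of what attention_mask and position_ids contain, the sketch below builds both from a padded batch of token ids. The helper `build_masks`, the pad id of 0 and the limit of 1024 positions are assumptions chosen for this example; read the real values from the tokenizer and config.

```python
from typing import List, Tuple

PAD_TOKEN_ID = 0  # assumed for this sketch
MAX_POSITION_EMBEDDINGS = 1024  # assumed for this sketch


def build_masks(
    input_ids: List[List[int]],
    pad_token_id: int = PAD_TOKEN_ID,
    max_positions: int = MAX_POSITION_EMBEDDINGS,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (attention_mask, position_ids) for a padded batch of token ids."""
    attention_mask, position_ids = [], []
    for row in input_ids:
        if len(row) > max_positions:
            # Position indices must stay within [0, max_positions - 1].
            raise ValueError(f"Sequence of length {len(row)} exceeds {max_positions} positions")
        attention_mask.append([0 if token == pad_token_id else 1 for token in row])
        position_ids.append(list(range(len(row))))
    return attention_mask, position_ids


batch = [[57, 12, 9, 1], [33, 1, 0, 0]]
attention_mask, position_ids = build_masks(batch)
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
print(position_ids)  # [[0, 1, 2, 3], [0, 1, 2, 3]]
```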

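transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput or tuple(jnp.ndarray)

A transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput or a tuple of jnp.ndarray (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

last_hidden_state (jnp.ndarray of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden states at the output of the last layer of the decoder of the model.

If past_key_values is used, only the last hidden state of the sequences, of shape (batch_size, 1, hidden_size), is output.
past_key_values (tuple(tuple(jnp.ndarray)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(jnp.ndarray) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains precomputed hidden states (keys and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (jnp.ndarray of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.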

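The FlaxPegasusPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: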

>>> from transformers import AutoTokenizer, FlaxPegasusModel

>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
>>> model = FlaxPegasusModel.from_pretrained("google/pegasus-large")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state

encode

( input_ids: Array attention_mask: typing.Optional[jax.Array] = None position_ids: typing.Optional[jax.Array] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None train: bool = False params: dict = None dropout_rng: PRNGKey = None ) → transformers.modeling_flax_outputs.FlaxBaseModelOutput or tuple(jnp.ndarray)

Parameters

input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.

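transformers.modeling_flax_outputs.FlaxBaseModelOutput or tuple(jnp.ndarray)

A transformers.modeling_flax_outputs.FlaxBaseModelOutput or a tuple of jnp.ndarray (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

last_hidden_state (jnp.ndarray of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden states at the output of the last layer of the model.
hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.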
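The description of hidden_states as "one for the output of the embeddings + one for the output of each layer" implies a tuple of length num_layers + 1. A toy sketch, assuming made-up layers that act on plain lists of floats rather than on real encoder states:

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[List[float]], List[float]]


def run_layers(
    embedding_output: List[float], layers: Sequence[Layer]
) -> Tuple[List[float], Tuple[List[float], ...]]:
    """Apply each layer in turn, recording the embedding output and every layer output."""
    hidden_states: Tuple[List[float], ...] = (embedding_output,)
    hidden = embedding_output
    for layer in layers:
        hidden = layer(hidden)
        hidden_states += (hidden,)
    return hidden, hidden_states


# Two made-up "layers" standing in for encoder blocks.
toy_layers = [lambda h: [x + 1.0 for x in h], lambda h: [x * 2.0 for x in h]]
last_output, hidden_states = run_layers([0.5, -0.5], toy_layers)
print(len(hidden_states))  # 3: the embedding output plus one entry per layer
print(last_output)  # [3.0, 1.0]
```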

Example:

>>> from transformers import AutoTokenizer, FlaxPegasusForConditionalGeneration

>>> model = FlaxPegasusForConditionalGeneration.from_pretrained("google/pegasus-large")
>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

decode

( decoder_input_ids encoder_outputs encoder_attention_mask: typing.Optional[jax.Array] = None decoder_attention_mask: typing.Optional[jax.Array] = None decoder_position_ids: typing.Optional[jax.Array] = None past_key_values: dict = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None train: bool = False params: dict = None dropout_rng: = None ) → transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions or tuple(jnp.ndarray)

Parameters

decoder_input_ids (jnp.ndarray of shape (batch_size, target_sequence_length)) — Indices of decoder input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?
encoder_outputs (tuple(tuple(jnp.ndarray))) — Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is the sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
encoder_attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_attention_mask (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
If you want to change the padding behavior, modify it to your needs. See diagram 1 in the paper for more information on the default strategy.
decoder_position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each decoder input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
past_key_values (Dict[str, np.ndarray], optional, returned by init_cache or when passing previous past_key_values) — Dictionary of precomputed hidden states (keys and values in the attention blocks) that can be used for fast auto-regressive decoding. Precomputed key and value hidden states are of shape [batch_size, max_length].
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
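Why cached keys and values speed up auto-regressive decoding can be shown with a deliberately tiny, library-free sketch. It uses scalar "projections" with made-up weights and a plain dict of lists as the cache; the function names and the cache layout are illustrative assumptions and do not mirror the structure returned by init_cache.

```python
import math
from typing import Dict, List, Tuple


def project(token_id: int) -> Tuple[float, float, float]:
    """Toy query/key/value projections (made-up weights, scalars instead of vectors)."""
    return 0.1 * token_id, 0.2 * token_id, 0.3 * token_id


def attend(query: float, keys: List[float], values: List[float]) -> float:
    """Single-head softmax attention from one query over all given positions."""
    scores = [query * k for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def decode_without_cache(token_ids: List[int]) -> float:
    """Recompute keys and values for every position, then attend from the last one."""
    keys = [project(t)[1] for t in token_ids]
    values = [project(t)[2] for t in token_ids]
    return attend(project(token_ids[-1])[0], keys, values)


def decode_with_cache(token_id: int, cache: Dict[str, List[float]]) -> float:
    """Feed only the newest token; earlier keys and values come from the cache."""
    query, key, value = project(token_id)
    cache["keys"].append(key)
    cache["values"].append(value)
    return attend(query, cache["keys"], cache["values"])


tokens = [3, 1, 4]
cache: Dict[str, List[float]] = {"keys": [], "values": []}
for token in tokens:
    cached_output = decode_with_cache(token, cache)

full_output = decode_without_cache(tokens)
print(math.isclose(cached_output, full_output))  # True
```

Both paths give the same result for the last position, but the cached path projects each token only once instead of reprocessing the whole prefix at every step.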

transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions or tuple(jnp.ndarray)

A transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions or a tuple of jnp.ndarray (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

last_hidden_state (jnp.ndarray of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden states at the output of the last layer of the model.

If past_key_values is used, only the last hidden state of the sequences, of shape (batch_size, 1, hidden_size), is output.
past_key_values (tuple(tuple(jnp.ndarray)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(jnp.ndarray) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and, if config.is_encoder_decoder=True, 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains precomputed hidden states (keys and values in the self-attention blocks and, if config.is_encoder_decoder=True, in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

Example:

>>> import jax.numpy as jnp
>>> from transformers import AutoTokenizer, FlaxPegasusForConditionalGeneration

>>> model = FlaxPegasusForConditionalGeneration.from_pretrained("google/pegasus-large")
>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id

>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> last_decoder_hidden_states = outputs.last_hidden_state

FlaxPegasusForConditionalGeneration

class transformers.FlaxPegasusForConditionalGeneration

( config: PegasusConfig input_shape: typing.Tuple[int] = (1, 1) seed: int = 0 dtype: dtype = _do_init: bool = True **kwargs )

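Parameters

config (PegasusConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified, all the computation will be performed with the given dtype.

Note that this only specifies the dtype of the computation and does not influence the dtype of the model parameters.

If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().

The PEGASUS model with a language modeling head. Can be used for summarization. This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a Flax Linen flax.nn.Module subclass. Use it as a regular Flax module and refer to the Flax documentation for all matters related to general usage and behavior.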

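Finally, this model supports inherent JAX features such as:

Just-In-Time (JIT) compilation
Automatic Differentiation
Vectorization
Parallelization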

call


( input_ids: Array attention_mask: typing.Optional[jax.Array] = None decoder_input_ids: typing.Optional[jax.Array] = None decoder_attention_mask: typing.Optional[jax.Array] = None position_ids: typing.Optional[jax.Array] = None decoder_position_ids: typing.Optional[jax.Array] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None train: bool = False params: dict = None dropout_rng: = None ) → transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput or tuple(jnp.ndarray)

Parameters

input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_input_ids (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?
decoder_attention_mask (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
If you want to change the padding behavior, modify it to your needs. See diagram 1 in the paper for more information on the default strategy.
position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
decoder_position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each decoder input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.

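transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput or tuple(jnp.ndarray)

A transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput or a tuple of jnp.ndarray (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PegasusConfig) and inputs.

logits (jnp.ndarray of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (tuple(tuple(jnp.ndarray)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(jnp.ndarray) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains precomputed hidden states (keys and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (jnp.ndarray of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.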

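The FlaxPegasusPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Summarization example: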

>>> from transformers import AutoTokenizer, FlaxPegasusForConditionalGeneration

>>> model = FlaxPegasusForConditionalGeneration.from_pretrained('google/pegasus-large')
>>> tokenizer = AutoTokenizer.from_pretrained('google/pegasus-large')

>>> ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='np')

>>> # Generate Summary
>>> summary_ids = model.generate(inputs['input_ids']).sequences
>>> print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))

Mask filling example:

>>> import jax
>>> from transformers import AutoTokenizer, FlaxPegasusForConditionalGeneration

>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
>>> # Use the tokenizer's own mask token so that the masked position can be located
>>> TXT = f"My friends are {tokenizer.mask_token} but they eat too many carbs."

>>> model = FlaxPegasusForConditionalGeneration.from_pretrained("google/pegasus-large")
>>> input_ids = tokenizer([TXT], return_tensors="np")["input_ids"]
>>> logits = model(input_ids).logits

>>> # nonzero() returns a tuple of index arrays; take the position of the single mask token
>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
>>> probs = jax.nn.softmax(logits[0, masked_index], axis=0)
>>> # top_k needs the number of candidates to keep
>>> values, predictions = jax.lax.top_k(probs, k=5)

>>> tokenizer.decode(predictions).split()
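The post-processing at the end of the mask filling example (softmax over the vocabulary scores at the masked position, then keeping the k most probable tokens) can be sketched without JAX or a model download. The vocabulary, the logits and the helper names below are made up for illustration.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: List[float], k: int) -> Tuple[List[float], List[int]]:
    """Return the k largest probabilities and their indices, most probable first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] for i in order], order


vocab = ["cool", "nice", "loud", "tall"]  # made-up vocabulary
masked_position_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores at the masked position
values, predictions = top_k(softmax(masked_position_logits), k=2)
print([vocab[i] for i in predictions])  # ['cool', 'nice']
```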

encode

参数

input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
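position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.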
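To make the attention_mask and position_ids conventions concrete, the following sketch builds both arrays by hand for a small padded batch. It uses NumPy for brevity, and the token IDs and `pad_token_id = 0` are assumptions chosen for illustration. In practice the tokenizer returns attention_mask for you when padding is enabled.

```python
import numpy as np

pad_token_id = 0  # assumed padding id for this illustration

# Two sequences padded on the right to the same length (made-up token ids).
input_ids = np.array([[5, 8, 2, 1, 0, 0],
                      [7, 3, 1, 0, 0, 0]], dtype="i4")

# 1 for tokens that are not masked, 0 for padding tokens.
attention_mask = (input_ids != pad_token_id).astype("i4")

# Position indices 0..sequence_length-1, repeated for every row of the batch.
batch_size, sequence_length = input_ids.shape
position_ids = np.broadcast_to(
    np.arange(sequence_length, dtype="i4"), (batch_size, sequence_length)
)
```

Both arrays have shape (batch_size, sequence_length), matching the shapes listed in the parameter descriptions above.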

transformers.modeling_flax_outputs.FlaxBaseModelOutput or tuple(torch.FloatTensor)

A transformers.modeling_flax_outputs.FlaxBaseModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration and inputs.

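last_hidden_state (jnp.ndarray of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden states at the output of the last layer of the model.
hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings and one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.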
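The phrase "attention weights after the attention softmax, used to compute the weighted average" can be sketched with plain scaled dot-product attention. This is an illustrative NumPy sketch under assumed toy dimensions and random inputs; it is not the library's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_heads, seq_len, head_dim = 1, 2, 4, 3  # assumed toy sizes

# Random queries, keys and values standing in for projected hidden states.
q = rng.normal(size=(batch_size, num_heads, seq_len, head_dim))
k = rng.normal(size=(batch_size, num_heads, seq_len, head_dim))
v = rng.normal(size=(batch_size, num_heads, seq_len, head_dim))

# Scaled dot-product scores: (batch_size, num_heads, seq_len, seq_len).
scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)

# Softmax over the last axis (shifted by the max for numerical stability).
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)

# Each output position is a weighted average of the value vectors.
context = weights @ v
```

Here `weights` has the same shape as one entry of the attentions tuple, and each of its rows sums to 1.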

Example:

>>> from transformers import AutoTokenizer, FlaxPegasusForConditionalGeneration

>>> model = FlaxPegasusForConditionalGeneration.from_pretrained("google/pegasus-large")
>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

decode


( decoder_input_ids encoder_outputs encoder_attention_mask: typing.Optional[jax.Array] = None decoder_attention_mask: typing.Optional[jax.Array] = None decoder_position_ids: typing.Optional[jax.Array] = None past_key_values: dict = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None deterministic: bool = True params: dict = None dropout_rng: = None ) → transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Parameters

decoder_input_ids (jnp.ndarray of shape (batch_size, target_sequence_length)) — Indices of decoder input sequence tokens in the vocabulary.
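Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are decoder input IDs?
encoder_outputs (tuple(tuple(jnp.ndarray))) — Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is the sequence of hidden states at the output of the last layer of the encoder. It is used in the cross-attention of the decoder.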
encoder_attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
decoder_attention_mask (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
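If you want to change the padding behavior, you should modify it to suit your needs. See diagram 1 in the paper for more information on the default strategy.
decoder_position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — Indices of positions of each decoder input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
past_key_values (Dict[str, np.ndarray], optional, returned by init_cache or when passing previous past_key_values) — Dictionary of pre-computed hidden states (keys and values in the attention blocks) that can be used for fast auto-regressive decoding. Pre-computed key and value hidden states are of shape [batch_size, max_length].
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.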
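The default decoder masking described above (ignore pad tokens and apply a causal mask) can be pictured as the product of two masks. The sketch below is a conceptual NumPy illustration with made-up values; it is not how the library constructs its masks internally.

```python
import numpy as np

# 1 = real token, 0 = padding: one sequence of length 4 with one pad at the end.
decoder_attention_mask = np.array([[1, 1, 1, 0]], dtype="i4")
target_len = decoder_attention_mask.shape[1]

# Lower-triangular causal mask: position i may attend only to positions <= i.
causal_mask = np.tril(np.ones((target_len, target_len), dtype="i4"))

# Broadcast the padding mask over query positions and combine with the causal mask.
# Result shape: (batch_size, target_len, target_len).
combined = causal_mask[None, :, :] * decoder_attention_mask[:, None, :]
```

Row i of `combined[0]` shows which key positions the decoder token at position i may attend to: never a future position and never the padded one.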

transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

A transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration and inputs.

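logits (jnp.ndarray of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(jnp.ndarray), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of jnp.ndarray (one for the output of the embeddings and one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(jnp.ndarray), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of jnp.ndarray (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Cross-attention weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
past_key_values (tuple(tuple(jnp.ndarray)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of jnp.ndarray tuples of length config.n_layers, with each tuple containing the cached key and value states of the self-attention and cross-attention layers if the model is used in an encoder-decoder setting. Only relevant if config.is_decoder = True.

Contains pre-computed hidden states (keys and values in the attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.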

Example:

>>> import jax.numpy as jnp
>>> from transformers import AutoTokenizer, FlaxPegasusForConditionalGeneration

>>> model = FlaxPegasusForConditionalGeneration.from_pretrained("google/pegasus-large")
>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id

>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits
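The single decode call above yields logits for one step. Repeating that step gives a simple greedy decoding loop, sketched below. The helper `greedy_decode` and the toy `toy_decode_fn` are hypothetical names written for illustration; they are not part of Transformers, and the loop recomputes all positions at every step instead of using past_key_values.

```python
import numpy as np


def greedy_decode(decode_fn, start_token_id, eos_token_id, max_new_tokens):
    """Greedy decoding for a single sequence.

    decode_fn is a hypothetical callable mapping decoder_input_ids of shape
    (1, cur_len) to logits of shape (1, cur_len, vocab_size).
    """
    decoder_input_ids = np.full((1, 1), start_token_id, dtype="i4")
    for _ in range(max_new_tokens):
        logits = decode_fn(decoder_input_ids)
        # Pick the highest-scoring token at the last position.
        next_id = int(np.argmax(logits[0, -1]))
        next_col = np.array([[next_id]], dtype="i4")
        decoder_input_ids = np.concatenate([decoder_input_ids, next_col], axis=1)
        if next_id == eos_token_id:
            break
    return decoder_input_ids


def toy_decode_fn(ids, vocab_size=6):
    # Toy stand-in for a model: always favours "previous token + 1".
    logits = np.zeros((1, ids.shape[1], vocab_size))
    for pos in range(ids.shape[1]):
        logits[0, pos, (ids[0, pos] + 1) % vocab_size] = 1.0
    return logits


result = greedy_decode(toy_decode_fn, start_token_id=0, eos_token_id=3, max_new_tokens=10)
```

With the real model, `decode_fn` could plausibly be `lambda ids: model.decode(ids, encoder_outputs).logits`, with `start_token_id` taken from `model.config.decoder_start_token_id`. For actual summarization, `model.generate` remains the more practical choice.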
