Transformers 文档

Blenderbot 小型

Transformers

Blenderbot 小型

请注意，BlenderbotSmallModel 和 BlenderbotSmallForConditionalGeneration 仅与检查点 facebook/blenderbot-90M 结合使用。较大的 Blenderbot 检查点应与 BlenderbotModel 和 BlenderbotForConditionalGeneration 一起使用。

概述

Blender聊天机器人模型是在2020年4月30日由Stephen Roller、Emily Dinan、Naman Goyal、Da Ju、Mary Williamson、Yinhan Liu、Jing Xu、Myle Ott、Kurt Shuster、Eric M. Smith、Y-Lan Boureau、Jason Weston在构建开放域聊天机器人的配方中提出的。

论文的摘要如下：

构建开放领域的聊天机器人是机器学习研究中的一个具有挑战性的领域。虽然之前的工作表明，通过增加神经模型的参数数量和训练数据的规模可以改善结果，但我们表明，其他因素对于高性能聊天机器人同样重要。良好的对话需要多种技能，这些技能由专家对话者无缝地融合在一起：提供引人入胜的谈话要点并倾听他们的伙伴，适当地展示知识、同理心和个性，同时保持一致的角色。我们表明，当提供适当的训练数据和生成策略选择时，大规模模型可以学习这些技能。我们构建了这些方法的变体，包括90M、2.7B和9.4B参数的模型，并将我们的模型和代码公开。人类评估显示，我们的最佳模型在多轮对话中的吸引力和人性化测量方面优于现有方法。然后，我们通过分析模型的失败案例来讨论这项工作的局限性。

该模型由patrickvonplaten贡献。作者的代码可以在这里找到。

使用提示

Blenderbot Small 是一个具有绝对位置嵌入的模型，因此通常建议在右侧而不是左侧填充输入。

资源

BlenderbotSmallConfig

类 transformers.BlenderbotSmallConfig

( vocab_size = 50265 max_position_embeddings = 512 encoder_layers = 8 encoder_ffn_dim = 2048 encoder_attention_heads = 16 decoder_layers = 8 decoder_ffn_dim = 2048 decoder_attention_heads = 16 encoder_layerdrop = 0.0 decoder_layerdrop = 0.0 use_cache = True is_encoder_decoder = True activation_function = 'gelu' d_model = 512 dropout = 0.1 attention_dropout = 0.0 activation_dropout = 0.0 init_std = 0.02 decoder_start_token_id = 1 scale_embedding = False pad_token_id = 0 bos_token_id = 1 eos_token_id = 2 forced_eos_token_id = 2 **kwargs )

参数

vocab_size (int, 可选, 默认为 50265) — BlenderbotSmall 模型的词汇量大小。定义了调用 BlenderbotSmallModel 或 TFBlenderbotSmallModel 时传递的 inputs_ids 可以表示的不同标记的数量。
d_model (int, optional, defaults to 512) — 层的维度和池化层的维度。
encoder_layers (int, optional, 默认为 8) — 编码器层数。
decoder_layers (int, optional, defaults to 8) — 解码器层数.
encoder_attention_heads (int, optional, 默认为 16) — Transformer 编码器中每个注意力层的注意力头数。
decoder_attention_heads (int, optional, defaults to 16) — Transformer解码器中每个注意力层的注意力头数。
decoder_ffn_dim (int, optional, defaults to 2048) — 解码器中“中间”（通常称为前馈）层的维度。
encoder_ffn_dim (int, optional, defaults to 2048) — 解码器中“中间”（通常称为前馈）层的维度。
activation_function (str 或 function, 可选, 默认为 "gelu") — 编码器和池化器中的非线性激活函数（函数或字符串）。如果是字符串，支持 "gelu"、 "relu"、"silu" 和 "gelu_new"。
dropout (float, optional, defaults to 0.1) — 嵌入层、编码器和池化器中所有全连接层的dropout概率。
attention_dropout (float, optional, 默认为 0.0) — 注意力概率的丢弃比率。
activation_dropout (float, optional, 默认为 0.0) — 全连接层内激活函数的丢弃比例。
max_position_embeddings (int, 可选, 默认为 512) — 此模型可能使用的最大序列长度。通常将其设置为较大的值以防万一（例如，512、1024 或 2048）。
init_std (float, optional, 默认为 0.02) — 用于初始化所有权重矩阵的 truncated_normal_initializer 的标准差。
encoder_layerdrop (float, optional, 默认为 0.0) — 编码器的LayerDrop概率。有关更多详细信息，请参阅[LayerDrop论文](see https://arxiv.org/abs/1909.11556)。
decoder_layerdrop (float, 可选, 默认为 0.0) — 解码器的LayerDrop概率。有关更多详细信息，请参阅[LayerDrop论文](见 https://arxiv.org/abs/1909.11556)。
scale_embedding (bool, optional, defaults to False) — 通过除以 sqrt(d_model) 来缩放嵌入向量。
use_cache (bool, 可选, 默认为 True) — 模型是否应返回最后的键/值注意力（并非所有模型都使用）
forced_eos_token_id (int, 可选, 默认为 2) — 当达到max_length时，强制作为最后生成的令牌的ID。通常设置为eos_token_id.

这是用于存储BlenderbotSmallModel配置的配置类。它用于根据指定的参数实例化一个BlenderbotSmall模型，定义模型架构。使用默认值实例化配置将产生与BlenderbotSmall facebook/blenderbot_small-90M架构类似的配置。

配置对象继承自PretrainedConfig，可用于控制模型输出。阅读PretrainedConfig的文档以获取更多信息。

示例：

>>> from transformers import BlenderbotSmallConfig, BlenderbotSmallModel

>>> # Initializing a BlenderbotSmall facebook/blenderbot_small-90M style configuration
>>> configuration = BlenderbotSmallConfig()

>>> # Initializing a model (with random weights) from the facebook/blenderbot_small-90M style configuration
>>> model = BlenderbotSmallModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

BlenderbotSmallTokenizer

类 transformers.BlenderbotSmallTokenizer

( vocab_file merges_file bos_token = '__start__' eos_token = '__end__' unk_token = '__unk__' pad_token = '__null__' **kwargs )

参数

vocab_file (str) — 包含词汇表的文件。
merges_file (str) — 合并文件的路径。
bos_token (str, optional, defaults to "__start__") — 句子的开始标记。
eos_token (str, optional, defaults to "__end__") — 句子的结束标记。
unk_token (str, optional, defaults to "__unk__") — 未知的标记。不在词汇表中的标记无法转换为ID，而是设置为这个标记。
pad_token (str, optional, defaults to "__null__") — 用于填充的标记，例如在对不同长度的序列进行批处理时使用。
kwargs (可选) — 传递给 PreTrainedTokenizer 的额外关键字参数

基于BPE（字节对编码）构建一个Blenderbot-90M分词器

这个分词器继承自PreTrainedTokenizer，其中包含了大部分主要方法。用户应参考超类以获取有关方法的更多信息。

build_inputs_with_special_tokens

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

参数

token_ids_0 (List[int]) — 第一个分词序列.
token_ids_1 (List[int], 可选) — 第二个标记化序列.

返回

List[int]

带有特殊标记的模型输入。

通过连接和添加特殊标记，从序列或序列对构建序列分类任务的模型输入。

此实现不添加特殊标记，此方法应在子类中被重写。

get_special_tokens_mask

( token_ids_0: typing.List token_ids_1: typing.Optional[typing.List] = None already_has_special_tokens: bool = False ) → 一个在范围 [0, 1] 内的整数列表

参数

token_ids_0 (List[int]) — 第一个序列的ID列表。
token_ids_1 (List[int], 可选) — 第二个序列的ID列表。
already_has_special_tokens (bool, optional, defaults to False) — 是否已经为模型格式化了带有特殊标记的标记列表。

返回

一个在范围 [0, 1] 内的整数列表

1 表示特殊标记，0 表示序列标记。

从没有添加特殊标记的标记列表中检索序列ID。当使用标记器的prepare_for_model或encode_plus方法添加特殊标记时，会调用此方法。

create_token_type_ids_from_sequences

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

参数

token_ids_0 (List[int]) — 第一个标记化的序列.
token_ids_1 (List[int], optional) — 第二个标记化序列.

返回

List[int]

令牌类型ID。

创建与传递的序列相对应的令牌类型ID。什么是令牌类型ID？

如果模型有特殊的构建方式，应该在子类中重写。

保存词汇表

( 保存目录: str 文件名前缀: typing.Optional[str] = None )

BlenderbotSmallTokenizerFast

类 transformers.BlenderbotSmallTokenizerFast

( vocab_file = None merges_file = None unk_token = '<|endoftext|>' bos_token = '<|endoftext|>' eos_token = '<|endoftext|>' add_prefix_space = False trim_offsets = True **kwargs )

参数

vocab_file (str) — 词汇表文件的路径。

构建一个“快速”的BlenderbotSmall分词器（由HuggingFace的tokenizers库支持）。

create_token_type_ids_from_sequences

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

参数

token_ids_0 (List[int]) — ID列表.
token_ids_1 (List[int], optional) — 可选的第二个序列对的ID列表。

返回

List[int]

零的列表。

从传递给序列对分类任务的两个序列中创建一个掩码。BlenderbotSmall 不使用令牌类型ID，因此返回一个零列表。

Pytorch

Hide Pytorch content

BlenderbotSmallModel

类 transformers.BlenderbotSmallModel

( config: BlenderbotSmallConfig )

参数

config (BlenderbotSmallConfig) — 模型配置类，包含模型的所有参数。使用配置文件初始化时不会加载与模型相关的权重，仅加载配置。查看 from_pretrained() 方法以加载模型权重。

裸的BlenderbotSmall模型输出原始隐藏状态，没有任何特定的头部。此模型继承自PreTrainedModel。请查看超类文档以了解库为其所有模型实现的通用方法（如下载或保存、调整输入嵌入大小、修剪头部等）。

该模型也是一个PyTorch torch.nn.Module 子类。将其作为常规的PyTorch模块使用，并参考PyTorch文档以获取与一般使用和行为相关的所有信息。

前进

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Union[typing.Tuple, transformers.modeling_outputs.BaseModelOutput, NoneType] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.Tensor] = None decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqModelOutput 或 tuple(torch.FloatTensor)

参数

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是解码器输入ID？

BlenderbotSmall 使用 bos_token_id 作为 decoder_input_ids 生成的起始标记。如果使用了 past_key_values，则可以选择只输入最后一个 decoder_input_ids（参见 past_key_values）。
decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — 默认行为：生成一个忽略decoder_input_ids中填充标记的张量。默认情况下也会使用因果掩码。
head_mask (torch.Tensor 形状为 (encoder_layers, encoder_attention_heads), 可选) — 用于在编码器中屏蔽注意力模块中选定的头。掩码值在 [0, 1] 中选择：
- 1 表示头 未被屏蔽,
- 0 表示头 被屏蔽.
decoder_head_mask (torch.Tensor 形状为 (decoder_layers, decoder_attention_heads), 可选) — 用于在解码器中取消选择注意力模块的头部。选择的掩码值在 [0, 1] 中：
- 1 表示头部 未被掩码,
- 0 表示头部 被掩码.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于在解码器中取消选择交叉注意力模块中的特定头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部未被掩码,
- 0 表示头部被掩码.
encoder_outputs (tuple(tuple(torch.FloatTensor), 可选) — 元组由 (last_hidden_state, 可选: hidden_states, 可选: attentions) 组成 last_hidden_state 的形状为 (batch_size, sequence_length, hidden_size), 可选) 是编码器最后一层输出的隐藏状态序列。用于解码器的交叉注意力机制中。
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
包含预先计算的隐藏状态（自注意力块和交叉注意力块中的键和值），这些状态可用于（参见past_key_values输入）以加速顺序解码。

如果使用了past_key_values，用户可以选择只输入形状为(batch_size, 1)的最后一个decoder_input_ids（那些没有将其过去键值状态提供给此模型的），而不是形状为(batch_size, sequence_length)的所有decoder_input_ids。
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — 可选地，您可以选择直接传递嵌入表示，而不是传递 input_ids。如果您希望对如何将 input_ids 索引转换为相关向量有更多控制，而不是使用模型的内部嵌入查找矩阵，这将非常有用。
decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional) — Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. If past_key_values is used, optionally only the last decoder_inputs_embeds have to be input (see past_key_values). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
如果decoder_input_ids和decoder_inputs_embeds都未设置，decoder_inputs_embeds将取inputs_embeds的值。
use_cache (bool, 可选) — 如果设置为 True，past_key_values 键值状态将被返回，并可用于加速解码（参见 past_key_values）。
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量下的attentions。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的hidden_states。
return_dict (bool, 可选) — 是否返回一个ModelOutput而不是一个普通的元组。

返回

transformers.modeling_outputs.Seq2SeqModelOutput 或 tuple(torch.FloatTensor)

一个 transformers.modeling_outputs.Seq2SeqModelOutput 或一个由 torch.FloatTensor 组成的元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），包含各种元素，具体取决于配置（BlenderbotSmallConfig）和输入。

last_hidden_state (torch.FloatTensor 形状为 (batch_size, sequence_length, hidden_size)) — 模型解码器最后一层输出的隐藏状态序列。

如果使用了 past_key_values，则只输出形状为 (batch_size, 1, hidden_size) 的序列的最后一个隐藏状态。
past_key_values (tuple(tuple(torch.FloatTensor)), 可选, 当传递了 use_cache=True 或当 config.use_cache=True 时返回) — 长度为 config.n_layers 的 tuple(torch.FloatTensor) 元组，每个元组包含 2 个形状为 (batch_size, num_heads, sequence_length, embed_size_per_head) 的张量和 2 个形状为 (batch_size, num_heads, encoder_sequence_length, embed_size_per_head) 的额外张量。

包含预先计算的隐藏状态（自注意力块和交叉注意力块中的键和值），可用于（参见 past_key_values 输入）加速顺序解码。
decoder_hidden_states (tuple(torch.FloatTensor), 可选, 当传递了 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 由 torch.FloatTensor 组成的元组（一个用于嵌入层的输出，如果模型有嵌入层，+ 一个用于每层的输出）形状为 (batch_size, sequence_length, hidden_size)。

解码器每层输出的隐藏状态加上可选的初始嵌入输出。
decoder_attentions (tuple(torch.FloatTensor), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 torch.FloatTensor 组成的元组（每层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的注意力权重，经过注意力 softmax 后，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(torch.FloatTensor), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 torch.FloatTensor 组成的元组（每层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器交叉注意力层的注意力权重，经过注意力 softmax 后，用于计算交叉注意力头中的加权平均值。
encoder_last_hidden_state (torch.FloatTensor 形状为 (batch_size, sequence_length, hidden_size), 可选) — 模型编码器最后一层输出的隐藏状态序列。
encoder_hidden_states (tuple(torch.FloatTensor), 可选, 当传递了 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 由 torch.FloatTensor 组成的元组（一个用于嵌入层的输出，如果模型有嵌入层，+ 一个用于每层的输出）形状为 (batch_size, sequence_length, hidden_size)。

编码器每层输出的隐藏状态加上可选的初始嵌入输出。
encoder_attentions (tuple(torch.FloatTensor), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 torch.FloatTensor 组成的元组（每层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

编码器的注意力权重，经过注意力 softmax 后，用于计算自注意力头中的加权平均值。

BlenderbotSmallModel 的前向方法，重写了 __call__ 特殊方法。

尽管前向传递的配方需要在此函数内定义，但之后应该调用Module实例而不是这个，因为前者负责运行预处理和后处理步骤，而后者会默默地忽略它们。

示例：

>>> from transformers import AutoTokenizer, BlenderbotSmallModel

>>> model = BlenderbotSmallModel.from_pretrained("facebook/blenderbot_small-90M")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")

>>> inputs = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt")
>>> decoder_inputs = tokenizer("Studies show that", return_tensors="pt")  # Batch size 1
>>> outputs = model(input_ids=inputs.input_ids, decoder_input_ids=decoder_inputs.input_ids)

>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 3, 512]

BlenderbotSmallForConditionalGeneration

类 transformers.BlenderbotSmallForConditionalGeneration

( config: BlenderbotSmallConfig )

参数

config (BlenderbotSmallConfig) — 模型配置类，包含模型的所有参数。使用配置文件初始化不会加载与模型相关的权重，只会加载配置。查看 from_pretrained() 方法以加载模型权重。

带有语言建模头的BlenderbotSmall模型。可用于摘要生成。该模型继承自PreTrainedModel。请查看超类文档以了解库为其所有模型实现的通用方法（如下载或保存、调整输入嵌入大小、修剪头等）。

该模型也是一个PyTorch torch.nn.Module 子类。将其作为常规的PyTorch模块使用，并参考PyTorch文档以获取与一般使用和行为相关的所有信息。

前进

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Union[typing.Tuple, transformers.modeling_outputs.BaseModelOutput, NoneType] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.Tensor] = None decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqLMOutput 或 tuple(torch.FloatTensor)

参数

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是解码器输入ID？

BlenderbotSmall 使用 bos_token_id 作为 decoder_input_ids 生成的起始标记。如果使用了 past_key_values，则可以选择只输入最后一个 decoder_input_ids（参见 past_key_values）。
decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), 可选) — 默认行为：生成一个忽略decoder_input_ids中填充标记的张量。默认情况下也会使用因果掩码。
head_mask (torch.Tensor of shape (encoder_layers, encoder_attention_heads), optional) — 用于在编码器中屏蔽注意力模块中选定的头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被屏蔽,
- 0 表示头部 被屏蔽.
decoder_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于在解码器中取消选择注意力模块的特定头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被掩码,
- 0 表示头部 被掩码.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于在解码器中取消选择交叉注意力模块中的特定头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被掩码,
- 0 表示头部 被掩码.
encoder_outputs (tuple(tuple(torch.FloatTensor), 可选) — 元组由 (last_hidden_state, 可选: hidden_states, 可选: attentions) 组成 last_hidden_state 的形状为 (batch_size, sequence_length, hidden_size), 可选) 是编码器最后一层的输出隐藏状态序列。用于解码器的交叉注意力中。
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
包含预先计算的隐藏状态（自注意力块和交叉注意力块中的键和值），这些状态可用于（参见past_key_values输入）以加速顺序解码。

如果使用了past_key_values，用户可以选择只输入形状为(batch_size, 1)的最后一个decoder_input_ids（那些没有将其过去键值状态提供给此模型的），而不是形状为(batch_size, sequence_length)的所有decoder_input_ids。
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — 可选地，您可以选择直接传递嵌入表示，而不是传递input_ids。如果您希望对如何将input_ids索引转换为相关向量有更多控制，而不是使用模型的内部嵌入查找矩阵，这将非常有用。
decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional) — Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. If past_key_values is used, optionally only the last decoder_inputs_embeds have to be input (see past_key_values). This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
如果decoder_input_ids和decoder_inputs_embeds都未设置，decoder_inputs_embeds将取inputs_embeds的值。
use_cache (bool, 可选) — 如果设置为 True，past_key_values 键值状态将被返回，并可用于加速解码（参见 past_key_values）。
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量下的attentions。
output_hidden_states (bool, optional) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的hidden_states。
return_dict (bool, 可选) — 是否返回一个 ModelOutput 而不是一个普通的元组。
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — 用于计算掩码语言建模损失的标签。索引应在 [0, ..., config.vocab_size] 或 -100 之间（参见 input_ids 文档字符串）。索引设置为 -100 的标记将被忽略（掩码），损失仅针对标签在 [0, ..., config.vocab_size] 之间的标记计算。

返回

transformers.modeling_outputs.Seq2SeqLMOutput 或 tuple(torch.FloatTensor)

一个 transformers.modeling_outputs.Seq2SeqLMOutput 或一个由 torch.FloatTensor 组成的元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），包含各种元素，具体取决于配置（BlenderbotSmallConfig）和输入。

loss（形状为 (1,) 的 torch.FloatTensor，可选，当提供 labels 时返回）— 语言建模损失。
logits（形状为 (batch_size, sequence_length, config.vocab_size) 的 torch.FloatTensor）— 语言建模头的预测分数（SoftMax 之前的每个词汇标记的分数）。
past_key_values（tuple(tuple(torch.FloatTensor))，可选，当传递 use_cache=True 或当 config.use_cache=True 时返回）— 长度为 config.n_layers 的 tuple(torch.FloatTensor) 元组，每个元组包含 2 个形状为 (batch_size, num_heads, sequence_length, embed_size_per_head) 的张量和 2 个形状为 (batch_size, num_heads, encoder_sequence_length, embed_size_per_head) 的额外张量。

包含预先计算的隐藏状态（自注意力块和交叉注意力块中的键和值），可用于（参见 past_key_values 输入）加速顺序解码。
decoder_hidden_states（tuple(torch.FloatTensor)，可选，当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回）— 由 torch.FloatTensor 组成的元组（一个用于嵌入层的输出，如果模型有嵌入层，+ 一个用于每层的输出），形状为 (batch_size, sequence_length, hidden_size)。

解码器在每层输出处的隐藏状态加上初始嵌入输出。
decoder_attentions（tuple(torch.FloatTensor)，可选，当传递 output_attentions=True 或当 config.output_attentions=True 时返回）— 由 torch.FloatTensor 组成的元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的注意力权重，在注意力 softmax 之后，用于计算自注意力头中的加权平均值。
cross_attentions（tuple(torch.FloatTensor)，可选，当传递 output_attentions=True 或当 config.output_attentions=True 时返回）— 由 torch.FloatTensor 组成的元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的交叉注意力层的注意力权重，在注意力 softmax 之后，用于计算交叉注意力头中的加权平均值。
encoder_last_hidden_state（形状为 (batch_size, sequence_length, hidden_size) 的 torch.FloatTensor，可选）— 模型编码器最后一层输出的隐藏状态序列。
encoder_hidden_states（tuple(torch.FloatTensor)，可选，当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回）— 由 torch.FloatTensor 组成的元组（一个用于嵌入层的输出，如果模型有嵌入层，+ 一个用于每层的输出），形状为 (batch_size, sequence_length, hidden_size)。

编码器在每层输出处的隐藏状态加上初始嵌入输出。
encoder_attentions（tuple(torch.FloatTensor)，可选，当传递 output_attentions=True 或当 config.output_attentions=True 时返回）— 由 torch.FloatTensor 组成的元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

编码器的注意力权重，在注意力 softmax 之后，用于计算自注意力头中的加权平均值。

BlenderbotSmallForConditionalGeneration 的前向方法，重写了 __call__ 特殊方法。

尽管前向传递的配方需要在此函数内定义，但之后应该调用Module实例而不是这个，因为前者负责运行预处理和后处理步骤，而后者会默默地忽略它们。

对话示例：

>>> from transformers import AutoTokenizer, BlenderbotSmallForConditionalGeneration

>>> mname = "facebook/blenderbot_small-90M"
>>> model = BlenderbotSmallForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = AutoTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> print("Human: ", UTTERANCE)
Human:  My friends are cool but they eat too many carbs.

>>> inputs = tokenizer([UTTERANCE], return_tensors="pt")
>>> reply_ids = model.generate(**inputs)
>>> print("Bot: ", tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
Bot:  what kind of carbs do they eat? i don't know much about carbs.

>>> REPLY = "I'm not sure"
>>> print("Human: ", REPLY)
Human: I'm not sure

>>> NEXT_UTTERANCE = (
...     "My friends are cool but they eat too many carbs.__end__ __start__what kind of carbs do they eat? "
...     "i don't know much about carbs__end__ "
...     "__start__ I'm not sure."
... )
>>> inputs = tokenizer([NEXT_UTTERANCE], return_tensors="pt")
>>> next_reply_ids = model.generate(**inputs)
>>> print("Bot: ", tokenizer.batch_decode(next_reply_ids, skip_special_tokens=True)[0])
Bot:  they eat a lot of carbs. carbs are high in fat, protein, and fats.

BlenderbotSmallForCausalLM

类 transformers.BlenderbotSmallForCausalLM

( config )

前进

( input_ids: LongTensor = None attention_mask: typing.Optional[torch.Tensor] = None encoder_hidden_states: typing.Optional[torch.FloatTensor] = None encoder_attention_mask: typing.Optional[torch.FloatTensor] = None head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.CausalLMOutputWithCrossAttentions 或 tuple(torch.FloatTensor)

参数

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — 编码器最后一层输出的隐藏状态序列。如果模型配置为解码器，则在交叉注意力中使用。
encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — 用于避免在编码器输入的填充标记索引上执行注意力的掩码。如果模型配置为解码器，则在交叉注意力中使用此掩码。掩码值在 [0, 1] 中选择：
head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于屏蔽注意力模块中选定头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被屏蔽,
- 0 表示头部 被屏蔽.
cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于屏蔽交叉注意力模块中选定头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被屏蔽,
- 0 表示头部 被屏蔽.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) — Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). The two additional tensors are only required when the model is used as a decoder in a Sequence to Sequence model.
包含预先计算的隐藏状态（自注意力块和交叉注意力块中的键和值），这些状态可用于（参见past_key_values输入）以加速顺序解码。

如果使用了past_key_values，用户可以选择只输入最后一个decoder_input_ids（那些没有将其过去键值状态提供给此模型的）形状为(batch_size, 1)，而不是所有形状为(batch_size, sequence_length)的decoder_input_ids。
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — 用于计算掩码语言建模损失的标签。索引应在 [0, ..., config.vocab_size] 或 -100 之间（参见 input_ids 文档字符串）。索引设置为 -100 的标记将被忽略（掩码），损失仅计算标签在 [0, ..., config.vocab_size] 之间的标记。
use_cache (bool, 可选) — 如果设置为 True，past_key_values 键值状态将被返回，并可用于加速解码 (参见 past_key_values)。
- 1 表示 未屏蔽 的标记，
- 0 表示屏蔽的标记。
output_attentions (bool, optional) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量中的attentions。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参阅返回张量下的 hidden_states。
return_dict (bool, 可选) — 是否返回一个 ModelOutput 而不是一个普通的元组。

返回

transformers.modeling_outputs.CausalLMOutputWithCrossAttentions 或 tuple(torch.FloatTensor)

一个 transformers.modeling_outputs.CausalLMOutputWithCrossAttentions 或一个由 torch.FloatTensor 组成的元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），包含各种元素，具体取决于配置（BlenderbotSmallConfig）和输入。

loss (torch.FloatTensor 形状为 (1,)，可选，当提供 labels 时返回) — 语言建模损失（用于下一个词的预测）。
logits (torch.FloatTensor 形状为 (batch_size, sequence_length, config.vocab_size)) — 语言建模头的预测分数（SoftMax 之前的每个词汇标记的分数）。
hidden_states (tuple(torch.FloatTensor)，可选，当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 由 torch.FloatTensor 组成的元组（一个用于嵌入层的输出，如果模型有嵌入层，+ 一个用于每一层的输出）形状为 (batch_size, sequence_length, hidden_size)。

模型在每一层输出处的隐藏状态加上可选的初始嵌入输出。
attentions (tuple(torch.FloatTensor)，可选，当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 torch.FloatTensor 组成的元组（每一层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

注意力 softmax 后的注意力权重，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(torch.FloatTensor)，可选，当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 torch.FloatTensor 组成的元组（每一层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

交叉注意力 softmax 后的交叉注意力权重，用于计算交叉注意力头中的加权平均值。
past_key_values (tuple(tuple(torch.FloatTensor))，可选，当传递 use_cache=True 或当 config.use_cache=True 时返回) — 由长度为 config.n_layers 的 torch.FloatTensor 元组组成的元组，每个元组包含自注意力和交叉注意力层的缓存键，值状态，如果模型用于编码器-解码器设置。仅在 config.is_decoder = True 时相关。

包含预计算的隐藏状态（注意力块中的键和值），可用于（参见 past_key_values 输入）加速顺序解码。

示例：

>>> from transformers import AutoTokenizer, BlenderbotSmallForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")
>>> model = BlenderbotSmallForCausalLM.from_pretrained("facebook/blenderbot_small-90M", add_cross_attention=False)
>>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> logits = outputs.logits
>>> expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
>>> list(logits.shape) == expected_shape
True

TensorFlow

Hide TensorFlow content

TFBlenderbotSmallModel

类 transformers.TFBlenderbotSmallModel

( config: BlenderbotSmallConfig *inputs **kwargs )

参数

config (BlenderbotSmallConfig) — 包含模型所有参数的模型配置类。使用配置文件初始化不会加载与模型相关的权重，只会加载配置。查看 from_pretrained() 方法以加载模型权重。

裸的BLENDERBOT_SMALL模型输出原始隐藏状态，没有任何特定的头部。该模型继承自TFPreTrainedModel。请查看超类文档以了解库为其所有模型实现的通用方法（如下载或保存、调整输入嵌入大小、修剪头部等）。

该模型也是一个keras.Model子类。可以将其作为常规的TF 2.0 Keras模型使用，并参考TF 2.0文档以了解与一般使用和行为相关的所有事项。

TensorFlow 模型和层在 transformers 中接受两种格式作为输入：

将所有输入作为关键字参数（如PyTorch模型），或
将所有输入作为列表、元组或字典放在第一个位置参数中。

支持第二种格式的原因是，Keras 方法在将输入传递给模型和层时更喜欢这种格式。由于这种支持，当使用像 model.fit() 这样的方法时，事情应该“正常工作”——只需以 model.fit() 支持的任何格式传递你的输入和标签！然而，如果你想在 Keras 方法之外使用第二种格式，比如在使用 Keras Functional API 创建自己的层或模型时，有三种方法可以用来将所有输入张量收集到第一个位置参数中：

仅包含input_ids的单个张量，没有其他内容：model(input_ids)
一个长度不定的列表，包含一个或多个输入张量，按照文档字符串中给出的顺序： model([input_ids, attention_mask]) 或 model([input_ids, attention_mask, token_type_ids])
一个字典，包含一个或多个与文档字符串中给出的输入名称相关联的输入张量： model({"input_ids": input_ids, "token_type_ids": token_type_ids})

请注意，当使用子类化创建模型和层时，您不需要担心这些，因为您可以像传递任何其他Python函数一样传递输入！

调用

( input_ids: tf.Tensor | None = None attention_mask: tf.Tensor | None = None decoder_input_ids: tf.Tensor | None = None decoder_attention_mask: tf.Tensor | None = None decoder_position_ids: tf.Tensor | None = None head_mask: tf.Tensor | None = None decoder_head_mask: tf.Tensor | None = None cross_attn_head_mask: tf.Tensor | None = None encoder_outputs: Optional[Union[Tuple, TFBaseModelOutput]] = None past_key_values: List[tf.Tensor] | None = None inputs_embeds: tf.Tensor | None = None decoder_inputs_embeds: tf.Tensor | None = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None training: Optional[bool] = False **kwargs ) → transformers.modeling_tf_outputs.TFSeq2SeqModelOutput 或 tuple(tf.Tensor)

参数

input_ids (tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
decoder_input_ids (tf.Tensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是解码器输入ID？

BlenderbotSmall 使用 bos_token_id 作为 decoder_input_ids 生成的起始标记。如果使用了 past_key_values，则可以选择只输入最后一个 decoder_input_ids（参见 past_key_values）。
decoder_attention_mask (tf.Tensor of shape (batch_size, target_sequence_length), optional) — 默认情况下会生成并忽略填充标记。不建议在大多数用例中设置此选项。
decoder_position_ids (tf.Tensor of shape (batch_size, sequence_length), optional) — 每个解码器输入序列标记在位置嵌入中的位置索引。选择范围在 [0, config.max_position_embeddings - 1] 之间。
head_mask (tf.Tensor 形状为 (encoder_layers, encoder_attention_heads), 可选) — 用于在编码器中屏蔽注意力模块中选定的头。在 [0, 1] 中选择的掩码值：
- 1 表示头 未被屏蔽,
- 0 表示头 被屏蔽.
decoder_head_mask (tf.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于在解码器中取消选择注意力模块的特定头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被掩码,
- 0 表示头部 被掩码.
cross_attn_head_mask (tf.Tensor 形状为 (decoder_layers, decoder_attention_heads), 可选) — 用于屏蔽交叉注意力模块中选定的头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被屏蔽,
- 0 表示头部 被屏蔽.
encoder_outputs (tf.FloatTensor, optional) — 编码器最后一层输出的隐藏状态。用于解码器的交叉注意力。形状为 (batch_size, sequence_length, hidden_size) 的序列
past_key_values (Tuple[Tuple[tf.Tensor]] 长度为 config.n_layers) — 包含预计算的关键和值隐藏状态的注意力块。可用于加速解码。如果使用了 past_key_values，用户可以选择仅输入形状为 (batch_size, 1) 的最后一个 decoder_input_ids（那些没有将其过去的关键值状态提供给此模型的），而不是所有形状为 (batch_size, sequence_length) 的 decoder_input_ids。
use_cache (bool, 可选, 默认为 True) — 如果设置为 True，past_key_values 键值状态将被返回，并可用于加速解码（参见 past_key_values）。在训练期间设置为 False，在生成期间设置为 True
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量中的attentions。此参数只能在eager模式下使用，在graph模式下将使用配置中的值。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的hidden_states。此参数只能在eager模式下使用，在graph模式下将使用配置中的值。
return_dict (bool, 可选) — 是否返回一个ModelOutput而不是一个普通的元组。此参数可以在eager模式下使用，在graph模式下该值将始终设置为True.
训练 (bool, 可选, 默认为 False) — 是否在训练模式下使用模型（一些模块如dropout模块在训练和评估时具有不同的行为）。

返回

transformers.modeling_tf_outputs.TFSeq2SeqModelOutput 或 tuple(tf.Tensor)

一个 transformers.modeling_tf_outputs.TFSeq2SeqModelOutput 或一个 tf.Tensor 元组（如果 return_dict=False 被传递或当 config.return_dict=False 时）包含各种元素，具体取决于配置 (BlenderbotSmallConfig) 和输入。

last_hidden_state (tf.Tensor 形状为 (batch_size, sequence_length, hidden_size)) — 模型解码器最后一层输出的隐藏状态序列。

如果使用了 past_key_values，则只输出形状为 (batch_size, 1, hidden_size) 的序列的最后一个隐藏状态。
past_key_values (List[tf.Tensor], 可选, 当 use_cache=True 被传递或当 config.use_cache=True 时返回) — 长度为 config.n_layers 的 tf.Tensor 列表，每个张量的形状为 (2, batch_size, num_heads, sequence_length, embed_size_per_head))。

包含解码器的预计算隐藏状态（注意力块中的键和值），可以用于（参见 past_key_values 输入）加速顺序解码。
decoder_hidden_states (tuple(tf.Tensor), 可选, 当 output_hidden_states=True 被传递或当 config.output_hidden_states=True 时返回) — tf.Tensor 元组（一个用于嵌入的输出 + 一个用于每层的输出）形状为 (batch_size, sequence_length, hidden_size)。

解码器每层输出的隐藏状态加上初始嵌入输出。
decoder_attentions (tuple(tf.Tensor), 可选, 当 output_attentions=True 被传递或当 config.output_attentions=True 时返回) — tf.Tensor 元组（每层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的注意力权重，经过注意力 softmax 后，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(tf.Tensor), 可选, 当 output_attentions=True 被传递或当 config.output_attentions=True 时返回) — tf.Tensor 元组（每层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的交叉注意力层的注意力权重，经过注意力 softmax 后，用于计算交叉注意力头中的加权平均值。
encoder_last_hidden_state (tf.Tensor 形状为 (batch_size, sequence_length, hidden_size), 可选) — 模型编码器最后一层输出的隐藏状态序列。
encoder_hidden_states (tuple(tf.Tensor), 可选, 当 output_hidden_states=True 被传递或当 config.output_hidden_states=True 时返回) — tf.Tensor 元组（一个用于嵌入的输出 + 一个用于每层的输出）形状为 (batch_size, sequence_length, hidden_size)。

编码器每层输出的隐藏状态加上初始嵌入输出。
encoder_attentions (tuple(tf.Tensor), 可选, 当 output_attentions=True 被传递或当 config.output_attentions=True 时返回) — tf.Tensor 元组（每层一个）形状为 (batch_size, num_heads, sequence_length, sequence_length)。

编码器的注意力权重，经过注意力 softmax 后，用于计算自注意力头中的加权平均值。

TFBlenderbotSmallModel 的前向方法，重写了 __call__ 特殊方法。

尽管前向传递的配方需要在此函数内定义，但之后应该调用Module实例而不是这个，因为前者负责运行预处理和后处理步骤，而后者会默默地忽略它们。

示例：

>>> from transformers import AutoTokenizer, TFBlenderbotSmallModel
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")
>>> model = TFBlenderbotSmallModel.from_pretrained("facebook/blenderbot_small-90M")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)

>>> last_hidden_states = outputs.last_hidden_state

TFBlenderbotSmallForConditionalGeneration

类 transformers.TFBlenderbotSmallForConditionalGeneration

( config *inputs **kwargs )

参数

config (BlenderbotSmallConfig) — 包含模型所有参数的模型配置类。使用配置文件初始化不会加载与模型相关的权重，只会加载配置。查看 from_pretrained() 方法以加载模型权重。

带有语言建模头的BLENDERBOT_SMALL模型。可用于摘要生成。该模型继承自TFPreTrainedModel。请查看超类文档以了解库为其所有模型实现的通用方法（如下载或保存、调整输入嵌入的大小、修剪头部等）。

该模型也是一个keras.Model子类。可以将其作为常规的TF 2.0 Keras模型使用，并参考TF 2.0文档以了解与一般使用和行为相关的所有事项。

TensorFlow 模型和层在 transformers 中接受两种格式作为输入：

将所有输入作为关键字参数（如PyTorch模型），或
将所有输入作为列表、元组或字典放在第一个位置参数中。

支持第二种格式的原因是，Keras 方法在将输入传递给模型和层时更喜欢这种格式。由于这种支持，当使用像 model.fit() 这样的方法时，事情应该“正常工作”——只需以 model.fit() 支持的任何格式传递你的输入和标签！然而，如果你想在 Keras 方法之外使用第二种格式，比如在使用 Keras Functional API 创建自己的层或模型时，有三种方法可以用来将所有输入张量收集到第一个位置参数中：

仅包含input_ids的单个张量，没有其他内容：model(input_ids)
一个长度不定的列表，包含一个或多个输入张量，按照文档字符串中给出的顺序： model([input_ids, attention_mask]) 或 model([input_ids, attention_mask, token_type_ids])
一个字典，包含一个或多个与文档字符串中给出的输入名称相关联的输入张量： model({"input_ids": input_ids, "token_type_ids": token_type_ids})

请注意，当使用子类化创建模型和层时，您不需要担心这些，因为您可以像传递任何其他Python函数一样传递输入！

调用

( input_ids: tf.Tensor | None = None attention_mask: tf.Tensor | None = None decoder_input_ids: tf.Tensor | None = None decoder_attention_mask: tf.Tensor | None = None decoder_position_ids: tf.Tensor | None = None head_mask: tf.Tensor | None = None decoder_head_mask: tf.Tensor | None = None cross_attn_head_mask: tf.Tensor | None = None encoder_outputs: Optional[TFBaseModelOutput] = None past_key_values: List[tf.Tensor] | None = None inputs_embeds: tf.Tensor | None = None decoder_inputs_embeds: tf.Tensor | None = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFSeq2SeqLMOutput 或 tuple(tf.Tensor)

参数

input_ids (tf.Tensor of shape ({0})) — Indices of input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (tf.Tensor of shape ({0}), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
decoder_input_ids (tf.Tensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是解码器输入ID？

BlenderbotSmall 使用 bos_token_id 作为 decoder_input_ids 生成的起始标记。如果使用了 past_key_values，则可以选择只输入最后一个 decoder_input_ids（参见 past_key_values）。
decoder_attention_mask (tf.Tensor of shape (batch_size, target_sequence_length), optional) — 默认情况下会生成并忽略填充标记。不建议在大多数用例中设置此选项。
decoder_position_ids (tf.Tensor of shape (batch_size, sequence_length), optional) — 每个解码器输入序列标记在位置嵌入中的位置索引。选择范围在 [0, config.max_position_embeddings - 1] 之间。
head_mask (tf.Tensor 形状为 (encoder_layers, encoder_attention_heads), 可选) — 用于在编码器中屏蔽注意力模块中选定的头。在 [0, 1] 中选择的掩码值：
- 1 表示头 未被屏蔽,
- 0 表示头 被屏蔽.
decoder_head_mask (tf.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于在解码器中取消选择注意力模块的头部。在 [0, 1] 中选择的掩码值：
- 1 表示头部 未被掩码,
- 0 表示头部 被掩码.
cross_attn_head_mask (tf.Tensor of shape (decoder_layers, decoder_attention_heads), optional) — 用于屏蔽交叉注意力模块中选定的头部的掩码。掩码值在 [0, 1] 中选择：
- 1 表示头部 未被屏蔽,
- 0 表示头部 被屏蔽.
encoder_outputs (tf.FloatTensor, 可选) — 编码器最后一层的输出的隐藏状态。用于解码器的交叉注意力。形状为 (batch_size, sequence_length, hidden_size) 是一个序列
past_key_values (Tuple[Tuple[tf.Tensor]] 长度为 config.n_layers) — 包含预计算的注意力块中的键和值隐藏状态。可用于加速解码。如果使用了 past_key_values，用户可以选择仅输入形状为 (batch_size, 1) 的最后一个 decoder_input_ids（那些没有将其过去的键值状态提供给此模型的），而不是所有形状为 (batch_size, sequence_length) 的 decoder_input_ids。
use_cache (bool, 可选, 默认为 True) — 如果设置为 True，past_key_values 键值状态将被返回，并可用于加速解码（参见 past_key_values）。在训练期间设置为 False，在生成期间设置为 True
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量中的attentions。此参数只能在eager模式下使用，在graph模式下将使用配置中的值。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的 hidden_states。此参数只能在急切模式下使用，在图形模式下将使用配置中的值。
return_dict (bool, 可选) — 是否返回一个ModelOutput而不是一个普通的元组。此参数可以在eager模式下使用，在graph模式下该值将始终设置为True.
训练 (bool, 可选, 默认为 False) — 是否在训练模式下使用模型（一些模块如dropout模块在训练和评估之间有不同的行为）。
labels (tf.tensor of shape (batch_size, sequence_length), optional) — 用于计算掩码语言建模损失的标签。索引应在 [0, ..., config.vocab_size] 或 -100 之间（参见 input_ids 文档字符串）。索引设置为 -100 的标记将被忽略（掩码），损失仅针对标签在 [0, ..., config.vocab_size] 之间的标记进行计算。

返回

transformers.modeling_tf_outputs.TFSeq2SeqLMOutput 或 tuple(tf.Tensor)

一个 transformers.modeling_tf_outputs.TFSeq2SeqLMOutput 或一个 tf.Tensor 元组（如果 return_dict=False 被传递或当 config.return_dict=False 时），包含根据配置 (BlenderbotSmallConfig) 和输入的各种元素。

loss (tf.Tensor 形状为 (n,), 可选, 其中 n 是非掩码标签的数量，当提供 labels 时返回) — 语言建模损失。
logits (tf.Tensor 形状为 (batch_size, sequence_length, config.vocab_size)) — 语言建模头的预测分数（SoftMax 之前的每个词汇标记的分数）。
past_key_values (List[tf.Tensor], 可选, 当传递 use_cache=True 或当 config.use_cache=True 时返回) — 长度为 config.n_layers 的 tf.Tensor 列表，每个张量的形状为 (2, batch_size, num_heads, sequence_length, embed_size_per_head))。

包含解码器的预计算隐藏状态（注意力块中的键和值），可用于（参见 past_key_values 输入）加速顺序解码。
decoder_hidden_states (tuple(tf.Tensor), 可选, 当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — tf.Tensor 元组（一个用于嵌入的输出 + 一个用于每层的输出），形状为 (batch_size, sequence_length, hidden_size)。

解码器在每层输出处的隐藏状态加上初始嵌入输出。
decoder_attentions (tuple(tf.Tensor), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — tf.Tensor 元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的注意力权重，在注意力 softmax 之后，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(tf.Tensor), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — tf.Tensor 元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的交叉注意力层的注意力权重，在注意力 softmax 之后，用于计算交叉注意力头中的加权平均值。
encoder_last_hidden_state (tf.Tensor 形状为 (batch_size, sequence_length, hidden_size), 可选) — 模型编码器最后一层输出的隐藏状态序列。
encoder_hidden_states (tuple(tf.Tensor), 可选, 当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — tf.Tensor 元组（一个用于嵌入的输出 + 一个用于每层的输出），形状为 (batch_size, sequence_length, hidden_size)。

编码器在每层输出处的隐藏状态加上初始嵌入输出。
encoder_attentions (tuple(tf.Tensor), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — tf.Tensor 元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

编码器的注意力权重，在注意力 softmax 之后，用于计算自注意力头中的加权平均值。

TFBlenderbotSmallForConditionalGeneration 的前向方法，重写了 __call__ 特殊方法。

尽管前向传递的配方需要在此函数内定义，但之后应该调用Module实例而不是这个，因为前者负责运行预处理和后处理步骤，而后者会默默地忽略它们。

对话示例：

>>> from transformers import AutoTokenizer, TFBlenderbotSmallForConditionalGeneration

>>> mname = "facebook/blenderbot_small-90M"
>>> model = BlenderbotSmallForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = AutoTokenizer.from_pretrained(mname)

>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> print("Human: ", UTTERANCE)
>>> inputs = tokenizer([UTTERANCE], return_tensors="tf")

>>> reply_ids = model.generate(**inputs)
>>> print("Bot: ", tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
what kind of carbs do they eat? i don't know much about carbs.

>>> REPLY = "I'm not sure"
>>> print("Human: ", REPLY)
>>> NEXT_UTTERANCE = (
...     "My friends are cool but they eat too many carbs.</s> "
...     "<s>what kind of carbs do they eat? i don't know much about carbs.</s> "
...     "<s>I'm not sure."
... )

>>> inputs = tokenizer([NEXT_UTTERANCE], return_tensors="tf")
>>> inputs.pop("token_type_ids")
>>> next_reply_ids = model.generate(**inputs)
>>> print("Bot: ", tokenizer.batch_decode(next_reply_ids, skip_special_tokens=True)[0])

JAX

Hide JAX content

FlaxBlenderbotSmallModel

类 transformers.FlaxBlenderbotSmallModel

( config: BlenderbotSmallConfig input_shape: typing.Tuple[int] = (1, 1) seed: int = 0 dtype: dtype = _do_init: bool = True **kwargs )

参数

config (BlenderbotSmallConfig) — 包含模型所有参数的模型配置类。使用配置文件初始化不会加载与模型相关的权重，只会加载配置。查看 from_pretrained() 方法以加载模型权重。
dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs).
这可以用于在GPU或TPU上启用混合精度训练或半精度推理。如果指定，所有计算将使用给定的dtype执行。

请注意，这仅指定了计算的数据类型，并不影响模型参数的数据类型。

如果您希望更改模型参数的dtype，请参阅to_fp16()和 to_bf16().

裸的BlenderbotSmall模型转换器，输出原始隐藏状态，没有任何特定的头部。该模型继承自FlaxPreTrainedModel。请查看超类文档，了解库为其所有模型实现的通用方法（如下载或保存、调整输入嵌入大小、修剪头部等）。

该模型也是一个Flax Linen flax.nn.Module 子类。将其作为常规的Flax模块使用，并参考Flax文档以获取与一般用法和行为相关的所有信息。

最后，该模型支持JAX的固有特性，例如：

call

( input_ids: 数组 attention_mask: 可选的[jax.Array] = 无 decoder_input_ids: 可选的[jax.Array] = 无 decoder_attention_mask: 可选的[jax.Array] = 无 position_ids: 可选的[jax.Array] = 无 decoder_position_ids: 可选的[jax.Array] = 无 output_attentions: 可选的[bool] = 无 output_hidden_states: 可选的[bool] = 无 return_dict: 可选的[bool] = 无 train: bool = 假 params: 字典 = 无 dropout_rng: = 无 ) → transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput 或 tuple(torch.FloatTensor)

返回

transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput 或 tuple(torch.FloatTensor)

一个 transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput 或一个由 torch.FloatTensor 组成的元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），包含各种元素，具体取决于配置（BlenderbotSmallConfig）和输入。

last_hidden_state (jnp.ndarray 形状为 (batch_size, sequence_length, hidden_size)) — 模型解码器最后一层输出的隐藏状态序列。

如果使用了 past_key_values，则只输出形状为 (batch_size, 1, hidden_size) 的序列的最后一个隐藏状态。
past_key_values (tuple(tuple(jnp.ndarray)), 可选, 当传递了 use_cache=True 或当 config.use_cache=True 时返回) — 长度为 config.n_layers 的 tuple(jnp.ndarray) 元组，每个元组包含 2 个形状为 (batch_size, num_heads, sequence_length, embed_size_per_head) 的张量和 2 个形状为 (batch_size, num_heads, encoder_sequence_length, embed_size_per_head) 的额外张量。

包含预先计算的隐藏状态（自注意力块和交叉注意力块中的键和值），可用于（参见 past_key_values 输入）加速顺序解码。
decoder_hidden_states (tuple(jnp.ndarray), 可选, 当传递了 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 形状为 (batch_size, sequence_length, hidden_size) 的 jnp.ndarray 元组（一个用于嵌入输出，一个用于每层的输出）。

解码器每层输出的隐藏状态加上初始嵌入输出。
decoder_attentions (tuple(jnp.ndarray), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每层一个）。

解码器的注意力权重，经过注意力 softmax 后，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(jnp.ndarray), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每层一个）。

解码器交叉注意力层的注意力权重，经过注意力 softmax 后，用于计算交叉注意力头中的加权平均值。
encoder_last_hidden_state (jnp.ndarray 形状为 (batch_size, sequence_length, hidden_size), 可选) — 模型编码器最后一层输出的隐藏状态序列。
encoder_hidden_states (tuple(jnp.ndarray), 可选, 当传递了 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 形状为 (batch_size, sequence_length, hidden_size) 的 jnp.ndarray 元组（一个用于嵌入输出，一个用于每层的输出）。

编码器每层输出的隐藏状态加上初始嵌入输出。
encoder_attentions (tuple(jnp.ndarray), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每层一个）。

编码器的注意力权重，经过注意力 softmax 后，用于计算自注意力头中的加权平均值。

示例：

>>> from transformers import AutoTokenizer, FlaxBlenderbotSmallModel

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")
>>> model = FlaxBlenderbotSmallModel.from_pretrained("facebook/blenderbot_small-90M")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state

编码

( input_ids: 数组 attention_mask: 可选[jax.Array] = 无 position_ids: 可选[jax.Array] = 无 output_attentions: 可选[布尔] = 无 output_hidden_states: 可选[布尔] = 无 return_dict: 可选[布尔] = 无 train: 布尔 = 假 params: 字典 = 无 dropout_rng: <函数 PRNGKey 在 0x7f50727b7640> = 无 ) → transformers.modeling_flax_outputs.FlaxBaseModelOutput 或 tuple(torch.FloatTensor)

参数

input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — 每个输入序列标记在位置嵌入中的位置索引。选择范围在 [0, config.max_position_embeddings - 1] 之间。
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量下的attentions。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的 hidden_states。
return_dict (bool, 可选) — 是否返回一个ModelOutput而不是一个普通的元组。

返回

transformers.modeling_flax_outputs.FlaxBaseModelOutput 或 tuple(torch.FloatTensor)

一个 transformers.modeling_flax_outputs.FlaxBaseModelOutput 或一个包含各种元素的 torch.FloatTensor 元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），具体取决于配置 () 和输入。

last_hidden_state (jnp.ndarray 形状为 (batch_size, sequence_length, hidden_size)) — 模型最后一层输出的隐藏状态序列。
hidden_states (tuple(jnp.ndarray), 可选, 当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 形状为 (batch_size, sequence_length, hidden_size) 的 jnp.ndarray 元组（一个用于嵌入层的输出，一个用于每一层的输出）。

模型在每一层输出处的隐藏状态加上初始嵌入输出。
attentions (tuple(jnp.ndarray), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每一层一个）。

注意力 softmax 后的注意力权重，用于计算自注意力头中的加权平均值。

示例：

>>> from transformers import AutoTokenizer, FlaxBlenderbotSmallForConditionalGeneration

>>> model = FlaxBlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot_small-90M")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

解码

( decoder_input_ids encoder_outputs encoder_attention_mask: typing.Optional[jax.Array] = None decoder_attention_mask: typing.Optional[jax.Array] = None decoder_position_ids: typing.Optional[jax.Array] = None past_key_values: dict = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None train: bool = False params: dict = None dropout_rng: = None ) → transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions 或 tuple(torch.FloatTensor)

参数

decoder_input_ids (jnp.ndarray of shape (batch_size, target_sequence_length)) — Indices of decoder input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是解码器输入ID？

对于翻译和摘要训练，应提供decoder_input_ids。如果没有提供decoder_input_ids，模型将根据论文中的去噪预训练方法，通过将input_ids向右移动来创建此张量。
encoder_outputs (tuple(tuple(jnp.ndarray)) — 元组由 (last_hidden_state, 可选: hidden_states, 可选: attentions) 组成 last_hidden_state 的形状为 (batch_size, sequence_length, hidden_size), 可选) 是编码器最后一层输出的隐藏状态序列。用于解码器的交叉注意力中。
encoder_attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
decoder_attention_mask (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
如果你想改变填充行为，你应该根据你的需求进行修改。有关默认策略的更多信息，请参见论文中的图1。
decoder_position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — 每个解码器输入序列标记在位置嵌入中的位置索引。选择范围为 [0, config.max_position_embeddings - 1].
past_key_values (Dict[str, np.ndarray], 可选, 由 init_cache 返回或传递先前的 past_key_values) — 预计算的隐藏状态字典（注意力块中的键和值），可用于快速自回归解码。预计算的键和值隐藏状态的形状为 [batch_size, max_length].
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量下的attentions。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的hidden_states。
return_dict (bool, 可选) — 是否返回一个ModelOutput而不是一个普通的元组。

返回

transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions 或 tuple(torch.FloatTensor)

一个 transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions 或一个由 torch.FloatTensor 组成的元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），包含各种元素，具体取决于配置 () 和输入。

last_hidden_state (jnp.ndarray 形状为 (batch_size, sequence_length, hidden_size)) — 模型最后一层输出的隐藏状态序列。

如果使用了 past_key_values，则只输出形状为 (batch_size, 1, hidden_size) 的序列的最后一个隐藏状态。
past_key_values (tuple(tuple(jnp.ndarray)), 可选, 当传递了 use_cache=True 或当 config.use_cache=True 时返回) — 长度为 config.n_layers 的 tuple(jnp.ndarray) 元组，每个元组包含 2 个形状为 (batch_size, num_heads, sequence_length, embed_size_per_head) 的张量，并且如果 config.is_encoder_decoder=True，则还包含 2 个形状为 (batch_size, num_heads, encoder_sequence_length, embed_size_per_head) 的额外张量。

包含预先计算的隐藏状态（自注意力块中的键和值，并且如果 config.is_encoder_decoder=True，则还包含交叉注意力块中的键和值），这些隐藏状态可以用于（参见 past_key_values 输入）加速顺序解码。
hidden_states (tuple(jnp.ndarray), 可选, 当传递了 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 由 jnp.ndarray 组成的元组（一个用于嵌入的输出，一个用于每一层的输出），形状为 (batch_size, sequence_length, hidden_size)。

模型在每一层输出处的隐藏状态加上初始嵌入输出。
attentions (tuple(jnp.ndarray), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 jnp.ndarray 组成的元组（每一层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

注意力 softmax 后的注意力权重，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(jnp.ndarray), 可选, 当传递了 output_attentions=True 和 config.add_cross_attention=True 或当 config.output_attentions=True 时返回) — 由 jnp.ndarray 组成的元组（每一层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

解码器的交叉注意力层的注意力权重，在注意力 softmax 后，用于计算交叉注意力头中的加权平均值。

示例：

>>> import jax.numpy as jnp
>>> from transformers import AutoTokenizer, FlaxBlenderbotSmallForConditionalGeneration

>>> model = FlaxBlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot_small-90M")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id

>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> last_decoder_hidden_states = outputs.last_hidden_state

FlaxBlenderbotForConditionalGeneration

类 transformers.FlaxBlenderbotSmallForConditionalGeneration

( config: BlenderbotSmallConfig input_shape: typing.Tuple[int] = (1, 1) seed: int = 0 dtype: dtype = _do_init: bool = True **kwargs )

参数

config (BlenderbotSmallConfig) — 包含模型所有参数的模型配置类。使用配置文件初始化不会加载与模型相关的权重，只会加载配置。查看 from_pretrained() 方法以加载模型权重。
dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs).
这可以用于在GPU或TPU上启用混合精度训练或半精度推理。如果指定，所有计算将使用给定的dtype执行。

请注意，这仅指定了计算的数据类型，并不影响模型参数的数据类型。

如果您希望更改模型参数的dtype，请参阅to_fp16()和 to_bf16().

带有语言建模头的BLENDERBOT_SMALL模型。可用于摘要生成。该模型继承自FlaxPreTrainedModel。请查看超类文档以了解库为其所有模型实现的通用方法（如下载或保存、调整输入嵌入大小、修剪头等）。

该模型也是一个Flax Linen flax.nn.Module 子类。将其作为常规的Flax模块使用，并参考Flax文档以获取与一般用法和行为相关的所有信息。

最后，该模型支持JAX的固有特性，例如：

call

( input_ids: 数组 attention_mask: 可选的[jax.Array] = 无 decoder_input_ids: 可选的[jax.Array] = 无 decoder_attention_mask: 可选的[jax.Array] = 无 position_ids: 可选的[jax.Array] = 无 decoder_position_ids: 可选的[jax.Array] = 无 output_attentions: 可选的[bool] = 无 output_hidden_states: 可选的[bool] = 无 return_dict: 可选的[bool] = 无 train: bool = 假 params: 字典 = 无 dropout_rng: = 无 ) → transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput 或 tuple(torch.FloatTensor)

参数

input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
decoder_input_ids (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是解码器输入ID？

对于翻译和摘要训练，应提供decoder_input_ids。如果没有提供decoder_input_ids，模型将根据论文中的去噪预训练方法，通过将input_ids向右移动来创建此张量。
decoder_attention_mask (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
如果你想改变填充行为，你应该根据你的需求进行修改。有关默认策略的更多信息，请参见论文中的图1。
position_ids (numpy.ndarray 形状为 (batch_size, sequence_length), 可选) — 每个输入序列标记在位置嵌入中的位置索引。选择范围在 [0, config.max_position_embeddings - 1].
decoder_position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — 每个解码器输入序列标记在位置嵌入中的位置索引。选择范围在 [0, config.max_position_embeddings - 1] 之间。
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量下的attentions。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的hidden_states。
return_dict (bool, 可选) — 是否返回一个 ModelOutput 而不是一个普通的元组。

返回

transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput 或 tuple(torch.FloatTensor)

一个 transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput 或一个由 torch.FloatTensor 组成的元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），包含各种元素，具体取决于配置（BlenderbotSmallConfig）和输入。

logits (jnp.ndarray 形状为 (batch_size, sequence_length, config.vocab_size)) — 语言建模头的预测分数（SoftMax 之前的每个词汇标记的分数）。
past_key_values (tuple(tuple(jnp.ndarray)), 可选, 当传递 use_cache=True 或当 config.use_cache=True 时返回) — 长度为 config.n_layers 的 tuple(jnp.ndarray) 元组，每个元组包含 2 个形状为 (batch_size, num_heads, sequence_length, embed_size_per_head) 的张量和 2 个形状为 (batch_size, num_heads, encoder_sequence_length, embed_size_per_head) 的额外张量。

包含预计算的隐藏状态（自注意力块和交叉注意力块中的键和值），可用于（参见 past_key_values 输入）加速顺序解码。
decoder_hidden_states (tuple(jnp.ndarray), 可选, 当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 形状为 (batch_size, sequence_length, hidden_size) 的 jnp.ndarray 元组（一个用于嵌入的输出 + 一个用于每层的输出）。

解码器在每层输出处的隐藏状态加上初始嵌入输出。
decoder_attentions (tuple(jnp.ndarray), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每层一个）。

解码器的注意力权重，在注意力 softmax 之后，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(jnp.ndarray), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每层一个）。

解码器的交叉注意力层的注意力权重，在注意力 softmax 之后，用于计算交叉注意力头中的加权平均值。
encoder_last_hidden_state (jnp.ndarray 形状为 (batch_size, sequence_length, hidden_size), 可选) — 模型编码器最后一层输出的隐藏状态序列。
encoder_hidden_states (tuple(jnp.ndarray), 可选, 当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 形状为 (batch_size, sequence_length, hidden_size) 的 jnp.ndarray 元组（一个用于嵌入的输出 + 一个用于每层的输出）。

编码器在每层输出处的隐藏状态加上初始嵌入输出。
encoder_attentions (tuple(jnp.ndarray), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每层一个）。

编码器的注意力权重，在注意力 softmax 之后，用于计算自注意力头中的加权平均值。

The FlaxBlenderbotSmallPreTrainedModel 前向方法，重写了 __call__ 特殊方法。

尽管前向传递的配方需要在此函数内定义，但之后应该调用Module实例而不是这个，因为前者负责运行预处理和后处理步骤，而后者会默默地忽略它们。

摘要示例：

>>> from transformers import AutoTokenizer, FlaxBlenderbotSmallForConditionalGeneration

>>> model = FlaxBlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot_small-90M")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")

>>> ARTICLE_TO_SUMMARIZE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors="np")

>>> # Generate Summary
>>> summary_ids = model.generate(inputs["input_ids"]).sequences
>>> print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))

掩码填充示例：

>>> from transformers import AutoTokenizer, FlaxBlenderbotSmallForConditionalGeneration

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")
>>> TXT = "My friends are <mask> but they eat too many carbs."

>>> model = FlaxBlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot_small-90M")
>>> input_ids = tokenizer([TXT], return_tensors="np")["input_ids"]
>>> logits = model(input_ids).logits

>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
>>> probs = jax.nn.softmax(logits[0, masked_index], axis=0)
>>> values, predictions = jax.lax.top_k(probs)

>>> tokenizer.decode(predictions).split()

编码

( input_ids: 数组 attention_mask: 可选[jax.Array] = 无 position_ids: 可选[jax.Array] = 无 output_attentions: 可选[布尔] = 无 output_hidden_states: 可选[布尔] = 无 return_dict: 可选[布尔] = 无 train: 布尔 = 假 params: 字典 = 无 dropout_rng: <函数 PRNGKey 在 0x7f50727b7640> = 无 ) → transformers.modeling_flax_outputs.FlaxBaseModelOutput 或 tuple(torch.FloatTensor)

参数

input_ids (jnp.ndarray of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
position_ids (numpy.ndarray of shape (batch_size, sequence_length), 可选) — 每个输入序列标记在位置嵌入中的位置索引。选择范围在 [0, config.max_position_embeddings - 1] 之间。
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量下的attentions。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的hidden_states。
return_dict (bool, 可选) — 是否返回一个ModelOutput而不是一个普通的元组。

返回

transformers.modeling_flax_outputs.FlaxBaseModelOutput 或 tuple(torch.FloatTensor)

一个 transformers.modeling_flax_outputs.FlaxBaseModelOutput 或一个包含各种元素的 torch.FloatTensor 元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），具体取决于配置 () 和输入。

last_hidden_state (jnp.ndarray 形状为 (batch_size, sequence_length, hidden_size)) — 模型最后一层输出的隐藏状态序列。
hidden_states (tuple(jnp.ndarray), 可选, 当传递 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 形状为 (batch_size, sequence_length, hidden_size) 的 jnp.ndarray 元组（一个用于嵌入层的输出，一个用于每一层的输出）。

模型在每一层输出处的隐藏状态加上初始嵌入输出。
attentions (tuple(jnp.ndarray), 可选, 当传递 output_attentions=True 或当 config.output_attentions=True 时返回) — 形状为 (batch_size, num_heads, sequence_length, sequence_length) 的 jnp.ndarray 元组（每一层一个）。

注意力 softmax 后的注意力权重，用于计算自注意力头中的加权平均值。

示例：

>>> from transformers import AutoTokenizer, FlaxBlenderbotSmallForConditionalGeneration

>>> model = FlaxBlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot_small-90M")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

解码

( decoder_input_ids encoder_outputs encoder_attention_mask: typing.Optional[jax.Array] = None decoder_attention_mask: typing.Optional[jax.Array] = None decoder_position_ids: typing.Optional[jax.Array] = None past_key_values: dict = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None deterministic: bool = True params: dict = None dropout_rng: = None ) → transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions 或 tuple(torch.FloatTensor)

参数

decoder_input_ids (jnp.ndarray of shape (batch_size, target_sequence_length)) — Indices of decoder input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是解码器输入ID？

对于翻译和摘要训练，应提供decoder_input_ids。如果没有提供decoder_input_ids，模型将根据论文中的去噪预训练方法，通过将input_ids向右移动来创建此张量。
encoder_outputs (tuple(tuple(jnp.ndarray)) — 元组由 (last_hidden_state, 可选: hidden_states, 可选: attentions) 组成 last_hidden_state 的形状为 (batch_size, sequence_length, hidden_size), 可选) 是编码器最后一层输出的隐藏状态序列。用于解码器的交叉注意力中。
encoder_attention_mask (jnp.ndarray of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
decoder_attention_mask (jnp.ndarray of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
如果你想改变填充行为，你应该根据你的需求进行修改。有关默认策略的更多信息，请参见论文中的图1。
decoder_position_ids (numpy.ndarray of shape (batch_size, sequence_length), optional) — 每个解码器输入序列标记在位置嵌入中的位置索引。选择范围在 [0, config.max_position_embeddings - 1] 之间。
past_key_values (Dict[str, np.ndarray], 可选, 由 init_cache 返回或传递先前的 past_key_values) — 预计算的隐藏状态字典（注意力块中的键和值），可用于快速自回归解码。预计算的键和值隐藏状态的形状为 [batch_size, max_length].
output_attentions (bool, 可选) — 是否返回所有注意力层的注意力张量。有关更多详细信息，请参见返回张量下的attentions。
output_hidden_states (bool, 可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请参见返回张量下的hidden_states。
return_dict (bool, 可选) — 是否返回一个 ModelOutput 而不是一个普通的元组。

返回

transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions 或 tuple(torch.FloatTensor)

一个 transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions 或一个由 torch.FloatTensor 组成的元组（如果传递了 return_dict=False 或当 config.return_dict=False 时），包含各种元素，具体取决于配置（）和输入。

logits (jnp.ndarray 形状为 (batch_size, sequence_length, config.vocab_size)) — 语言建模头的预测分数（SoftMax 之前每个词汇标记的分数）。
hidden_states (tuple(jnp.ndarray), 可选, 当传递了 output_hidden_states=True 或当 config.output_hidden_states=True 时返回) — 由 jnp.ndarray 组成的元组（一个用于嵌入的输出 + 一个用于每层的输出），形状为 (batch_size, sequence_length, hidden_size)。

模型在每层输出处的隐藏状态加上初始嵌入输出。
attentions (tuple(jnp.ndarray), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 jnp.ndarray 组成的元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

注意力 softmax 后的注意力权重，用于计算自注意力头中的加权平均值。
cross_attentions (tuple(jnp.ndarray), 可选, 当传递了 output_attentions=True 或当 config.output_attentions=True 时返回) — 由 jnp.ndarray 组成的元组（每层一个），形状为 (batch_size, num_heads, sequence_length, sequence_length)。

注意力 softmax 后的交叉注意力权重，用于计算交叉注意力头中的加权平均值。
past_key_values (tuple(tuple(jnp.ndarray)), 可选, 当传递了 use_cache=True 或当 config.use_cache=True 时返回) — 由长度为 config.n_layers 的 jnp.ndarray 元组组成的元组，每个元组包含自注意力和交叉注意力层的缓存键、值状态，如果模型用于编码器-解码器设置。仅在 config.is_decoder = True 时相关。

包含预计算的隐藏状态（注意力块中的键和值），可用于（参见 past_key_values 输入）以加速顺序解码。

示例：

>>> import jax.numpy as jnp
>>> from transformers import AutoTokenizer, FlaxBlenderbotSmallForConditionalGeneration

>>> model = FlaxBlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot_small-90M")
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")

>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, max_length=1024, return_tensors="np")
>>> encoder_outputs = model.encode(**inputs)

>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id

>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits

< > Update on GitHub

←Blenderbot BLOOM→