Transformers documentation

XLM

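Overview

The XLM model was proposed in Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau. It is a transformer pretrained using one of the following objectives:

a causal language modeling (CLM) objective (next token prediction),
a masked language modeling (MLM) objective (BERT-like), or
a translation language modeling (TLM) objective (an extension of BERT's MLM to multiple language inputs).

The abstract from the paper is the following:

Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI, our approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the previous best approach by more than 4 BLEU. Our code and pretrained models will be made publicly available.

This model was contributed by thomwolf. The original code can be found here.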

Usage tips

XLM has many different checkpoints, which were trained using different objectives: CLM, MLM or TLM. Make sure to select the correct objective for your task (e.g. MLM checkpoints are not suitable for generation).
XLM has multilingual checkpoints which leverage a specific lang parameter. Check out the multilingual page for more information.
XLM is a transformer model trained on several languages. There are three different types of training for this model, and the library provides checkpoints for all of them:
- Causal language modeling (CLM), which is the traditional autoregressive training. One of the languages is selected for each training sample, and the model input is a sequence of 256 tokens that may span several documents in that language.
- Masked language modeling (MLM), which is like RoBERTa. One of the languages is selected for each training sample, and the model input is a sequence of 256 tokens that may span several documents in that language, with dynamic masking of the tokens.
- A combination of MLM and translation language modeling (TLM). This consists of concatenating a sentence in two different languages, with random masking. To predict one of the masked tokens, the model can use both the surrounding context in language 1 and the context given by language 2.
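The TLM setup described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the training code used for XLM: the helper name `build_tlm_example`, the token IDs, the language IDs and the 15% masking rate are all hypothetical, while the mask ID 5 and the -100 "ignore" label follow the defaults and conventions documented further down this page.

```python
import random

MASK_ID = 5          # mask_index default listed in XLMConfig
IGNORE_LABEL = -100  # label value that the loss ignores


def build_tlm_example(src_ids, tgt_ids, src_lang, tgt_lang, mask_prob=0.15, seed=0):
    """Concatenate a sentence pair and randomly mask tokens on both sides.

    Hypothetical helper: returns (input_ids, langs, labels) as plain lists.
    """
    rng = random.Random(seed)
    tokens = list(src_ids) + list(tgt_ids)
    # One language ID per token, parallel to the token sequence.
    langs = [src_lang] * len(src_ids) + [tgt_lang] * len(tgt_ids)
    input_ids, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            input_ids.append(MASK_ID)  # hide the token from the model
            labels.append(tok)         # and ask the model to predict it
        else:
            input_ids.append(tok)
            labels.append(IGNORE_LABEL)
    return input_ids, langs, labels


src, tgt = [11, 12, 13], [21, 22, 23, 24]  # made-up token IDs
input_ids, langs, labels = build_tlm_example(src, tgt, src_lang=0, tgt_lang=1)
```

Because both sentences sit in one sequence, a masked position in the first sentence can be predicted from unmasked tokens of either sentence, which is the point of the TLM objective.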

Resources

XLMConfig

class transformers.XLMConfig

( vocab_size = 30145 emb_dim = 2048 n_layers = 12 n_heads = 16 dropout = 0.1 attention_dropout = 0.1 gelu_activation = True sinusoidal_embeddings = False causal = False asm = False n_langs = 1 use_lang_emb = True max_position_embeddings = 512 embed_init_std = 0.02209708691207961 layer_norm_eps = 1e-12 init_std = 0.02 bos_index = 0 eos_index = 1 pad_index = 2 unk_index = 3 mask_index = 5 is_encoder = True summary_type = 'first' summary_use_proj = True summary_activation = None summary_proj_to_labels = True summary_first_dropout = 0.1 start_n_top = 5 end_n_top = 5 mask_token_id = 0 lang_id = 0 pad_token_id = 2 bos_token_id = 0 **kwargs )

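Parameters

vocab_size (int, optional, defaults to 30145) — Vocabulary size of the XLM model. Defines the number of different tokens that can be represented by the input_ids passed when calling XLMModel or TFXLMModel.
emb_dim (int, optional, defaults to 2048) — Dimensionality of the encoder layers and the pooler layer.
n_layers (int, optional, defaults to 12) — Number of hidden layers in the Transformer encoder.
n_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer encoder.
dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (float, optional, defaults to 0.1) — The dropout probability for the attention mechanism.
gelu_activation (bool, optional, defaults to True) — Whether or not to use gelu for the activations instead of relu.
sinusoidal_embeddings (bool, optional, defaults to False) — Whether or not to use sinusoidal positional embeddings instead of absolute positional embeddings.
causal (bool, optional, defaults to False) — Whether or not the model should behave in a causal manner. Causal models use a triangular attention mask in order to only attend to the left-side context instead of a bidirectional context.
asm (bool, optional, defaults to False) — Whether or not to use an adaptive log softmax projection layer instead of a linear layer for the prediction layer.
n_langs (int, optional, defaults to 1) — The number of languages the model handles. Set to 1 for monolingual models.
use_lang_emb (bool, optional, defaults to True) — Whether to use language embeddings. Some models use additional language embeddings; see the multilingual models page for information on how to use them.
max_position_embeddings (int, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
embed_init_std (float, optional, defaults to 2048^-0.5) — The standard deviation of the truncated_normal_initializer for initializing the embedding matrices.
init_std (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices except the embedding matrices.
layer_norm_eps (float, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
bos_index (int, optional, defaults to 0) — The index of the beginning of sentence token in the vocabulary.
eos_index (int, optional, defaults to 1) — The index of the end of sentence token in the vocabulary.
pad_index (int, optional, defaults to 2) — The index of the padding token in the vocabulary.
unk_index (int, optional, defaults to 3) — The index of the unknown token in the vocabulary.
mask_index (int, optional, defaults to 5) — The index of the masking token in the vocabulary.
is_encoder (bool, optional, defaults to True) — Whether or not the initialized model should be a transformer encoder or decoder as seen in Vaswani et al.
summary_type (string, optional, defaults to "first") — Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
Has to be one of the following options:
- "last": Take the last token hidden state (like XLNet).
- "first": Take the first token hidden state (like BERT).
- "mean": Take the mean of all tokens hidden states.
- "cls_index": Supply a Tensor of classification token position (like GPT/GPT-2).
- "attn": Not implemented now, use multi-head attention.
summary_use_proj (bool, optional, defaults to True) — Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
Whether or not to add a projection after the vector extraction.
summary_activation (str, optional) — Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
Pass "tanh" for a tanh activation to the output; any other value will result in no activation.
summary_proj_to_labels (bool, optional, defaults to True) — Used in the sequence classification and multiple choice models.
Whether the projection outputs should have config.num_labels or config.hidden_size classes.
summary_first_dropout (float, optional, defaults to 0.1) — Used in the sequence classification and multiple choice models.
The dropout ratio to be used after the projection and activation.
start_n_top (int, optional, defaults to 5) — Used in the SQuAD evaluation script.
end_n_top (int, optional, defaults to 5) — Used in the SQuAD evaluation script.
mask_token_id (int, optional, defaults to 0) — Model agnostic parameter to identify masked tokens when generating text in an MLM context.
lang_id (int, optional, defaults to 0) — The ID of the language used by the model. This parameter is used when generating text in a given language.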
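The long decimal shown for embed_init_std in the signature is simply the default emb_dim raised to the power -0.5. The snippet below only checks that arithmetic. It does not touch the library, and whether a custom emb_dim should be paired with a rescaled embed_init_std is left as a user decision here, not something this page states.

```python
import math

emb_dim = 2048                    # default embedding size from the signature
embed_init_std = emb_dim ** -0.5  # 1 / sqrt(2048)

# The signature lists this default as 0.02209708691207961.
matches_signature = math.isclose(embed_init_std, 0.02209708691207961)
print(embed_init_std, matches_signature)
```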

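This is the configuration class to store the configuration of an XLMModel or a TFXLMModel. It is used to instantiate an XLM model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the FacebookAI/xlm-mlm-en-2048 architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example: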

>>> from transformers import XLMConfig, XLMModel

>>> # Initializing a XLM configuration
>>> configuration = XLMConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = XLMModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

XLMTokenizer

class transformers.XLMTokenizer

( vocab_file merges_file unk_token = '<unk>' bos_token = '<s>' sep_token = '</s>' pad_token = '<pad>' cls_token = '</s>' mask_token = '<special1>' additional_special_tokens = ['<special0>', '<special1>', '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>', '<special9>'] lang2id = None id2lang = None do_lowercase_and_remove_accent = True **kwargs )

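Parameters

vocab_file (str) — Vocabulary file.
merges_file (str) — Merges file.
unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
bos_token (str, optional, defaults to "<s>") — The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the cls_token.
sep_token (str, optional, defaults to "</s>") — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.
pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
cls_token (str, optional, defaults to "</s>") — The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (str, optional, defaults to "<special1>") — The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.
additional_special_tokens (List[str], optional, defaults to ['<special0>', '<special1>', '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>', '<special9>']) — List of additional special tokens.
lang2id (Dict[str, int], optional) — Dictionary mapping language string identifiers to their IDs.
id2lang (Dict[int, str], optional) — Dictionary mapping language IDs to their string identifiers.
do_lowercase_and_remove_accent (bool, optional, defaults to True) — Whether to lowercase and remove accents when tokenizing.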
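To make the effect of do_lowercase_and_remove_accent concrete, here is a rough standard-library sketch of lowercasing plus accent stripping. It assumes the common approach of Unicode NFD decomposition followed by dropping combining marks; the tokenizer's real preprocessing may differ in detail, and the helper name `sketch_lowercase_and_remove_accent` is made up for this example.

```python
import unicodedata


def sketch_lowercase_and_remove_accent(text):
    """Lowercase the text and drop combining accent marks (hypothetical helper)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Category "Mn" (nonspacing mark) covers the combining accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


cleaned = sketch_lowercase_and_remove_accent("Café Été")
print(cleaned)
```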

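Construct an XLM tokenizer, based on Byte-Pair Encoding. The tokenization process is the following:

Moses preprocessing and tokenization for most supported languages.
Language-specific tokenization for Chinese (Jieba), Japanese (KyTea) and Thai (PyThaiNLP).
Optionally lowercases and normalizes all input text.
The argument special_tokens and the function set_special_tokens can be used to add additional symbols (like "classify") to a vocabulary.
The lang2id attribute maps the languages supported by the model to their IDs, if provided (automatically set for pretrained vocabularies).
The id2lang attribute does the reverse mapping, if provided (automatically set for pretrained vocabularies).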
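The relationship between lang2id and id2lang, and how a language ID can be repeated once per token, might look like the following. The two-language mapping is invented for illustration; real multilingual checkpoints ship their own mapping, and `make_langs` is a hypothetical helper, not a tokenizer method.

```python
# Hypothetical mapping for illustration only.
lang2id = {"en": 0, "fr": 1}

# id2lang is the reverse mapping: language ID -> language string.
id2lang = {idx: lang for lang, idx in lang2id.items()}


def make_langs(num_tokens, lang):
    """Return one language ID per token, parallel to the input IDs."""
    return [lang2id[lang]] * num_tokens


langs = make_langs(4, "fr")
print(langs, id2lang[langs[0]])
```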

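This tokenizer inherits from PreTrainedTokenizer, which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.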

build_inputs_with_special_tokens

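( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

token_ids_0 (List[int]) — List of IDs to which the special tokens will be added.
token_ids_1 (List[int], optional) — Optional second list of IDs for sequence pairs.

List[int]

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. An XLM sequence has the following format:

single sequence: <s> X </s>
pair of sequences: <s> A </s> B </s>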
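As a rough illustration of that layout, assuming the usual XLM convention of `<s> X </s>` for one sequence and `<s> A </s> B </s>` for a pair, the concatenation can be sketched on plain lists. The IDs 0 and 1 for `<s>` and `</s>` are assumptions borrowed from the bos_index and eos_index defaults in XLMConfig, and `sketch_build_inputs` is a stand-in, not the tokenizer method itself.

```python
BOS_ID = 0  # assumed ID of <s>
SEP_ID = 1  # assumed ID of </s>


def sketch_build_inputs(token_ids_0, token_ids_1=None):
    """Wrap one or two ID lists with the special tokens (illustrative only)."""
    output = [BOS_ID] + list(token_ids_0) + [SEP_ID]
    if token_ids_1 is not None:
        output += list(token_ids_1) + [SEP_ID]
    return output


single = sketch_build_inputs([10, 11])
pair = sketch_build_inputs([10, 11], [20])
print(single, pair)
```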

get_special_tokens_mask

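( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None already_has_special_tokens: bool = False ) → List[int]

Parameters

token_ids_0 (List[int]) — List of IDs.
token_ids_1 (List[int], optional) — Optional second list of IDs for sequence pairs.
already_has_special_tokens (bool, optional, defaults to False) — Whether or not the token list is already formatted with special tokens for the model.

List[int]

A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Retrieve sequence IDs from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.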
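A minimal sketch of such a mask, assuming one special token before the first sequence and one after each sequence (the `<s> A </s> B </s>` layout). The function name is hypothetical and the already_has_special_tokens branch is left out.

```python
def sketch_special_tokens_mask(token_ids_0, token_ids_1=None):
    """Return 1 at special-token positions and 0 at sequence-token positions."""
    mask = [1] + [0] * len(token_ids_0) + [1]
    if token_ids_1 is not None:
        mask += [0] * len(token_ids_1) + [1]
    return mask


single_mask = sketch_special_tokens_mask([10, 11])
pair_mask = sketch_special_tokens_mask([10, 11], [20])
print(single_mask, pair_mask)
```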

create_token_type_ids_from_sequences

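( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

token_ids_0 (List[int]) — List of IDs.
token_ids_1 (List[int], optional) — Optional second list of IDs for sequence pairs.

List[int]

List of token type IDs according to the given sequence(s).

Create a mask from the two sequences passed, to be used in a sequence-pair classification task. An XLM sequence pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).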
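The mask format above can be reproduced on plain lists. The sketch assumes that the first segment covers the opening special token, the first sequence and its separator, and that the second segment covers the second sequence and its closing separator; the function name is made up for the example.

```python
def sketch_token_type_ids(token_ids_0, token_ids_1=None):
    """Return 0 for the first segment and 1 for the second (illustrative only)."""
    # First segment: one leading special token + sequence + one separator.
    first = [0] * (len(token_ids_0) + 2)
    if token_ids_1 is None:
        return first
    # Second segment: sequence + one closing separator.
    return first + [1] * (len(token_ids_1) + 1)


single_types = sketch_token_type_ids([10, 11])
pair_types = sketch_token_type_ids([10, 11], [20])
print(single_types, pair_types)
```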

save_vocabulary

( save_directory: str filename_prefix: typing.Optional[str] = None )

XLM specific outputs

class transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringOutput

( loss: typing.Optional[torch.FloatTensor] = None start_top_log_probs: typing.Optional[torch.FloatTensor] = None start_top_index: typing.Optional[torch.LongTensor] = None end_top_log_probs: typing.Optional[torch.FloatTensor] = None end_top_index: typing.Optional[torch.LongTensor] = None cls_logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor, ...]] = None )

Parameters

loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) — Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top start token possibilities (beam-search).
start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top start token possibilities (beam-search).
end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the is_impossible label of the answers.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Base class for outputs of question answering models using a SquadHead.
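To give a feel for the start_top_log_probs and start_top_index fields, the sketch below picks the top start positions for a single example from made-up scores, with start_n_top set to 2. It follows the documented description (log probabilities and indices of the best candidates) and is a simplification: the library's head works on batched tensors and may differ in detail.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def top_k(values, k):
    """Return the k largest values and their indices, best first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


start_logits = [0.1, 2.0, -1.0, 0.5]  # made-up scores, one per token position
start_top_log_probs, start_top_index = top_k(log_softmax(start_logits), k=2)
print(start_top_index)
```

The end-token fields have start_n_top * end_n_top entries because, in this kind of beam search, a set of end candidates is kept for every retained start candidate.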

XLMModel

class transformers.XLMModel

( config )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The bare XLM Model transformer outputting raw hidden states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None langs: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None lengths: typing.Optional[torch.Tensor] = None cache: typing.Optional[typing.Dict[str, torch.Tensor]] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (torch.LongTensor of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are language IDs, which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language ID mapping is in model.config.lang2id (a dictionary from string to int) and the language ID to language name mapping is in model.config.id2lang (a dictionary from int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
What are position IDs?
lengths (torch.LongTensor of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].
cache (Dict[str, torch.FloatTensor], optional) — Dictionary string to torch.FloatTensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
The dictionary object will be modified in-place during the forward pass to add newly computed hidden states.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
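The lengths argument and attention_mask carry the same information when padding sits at the end of each sequence. A small list-based sketch of that equivalence follows; it assumes right-padding, and the helper name is hypothetical.

```python
def lengths_to_attention_mask(lengths, max_len):
    """Build a 0/1 mask per sentence: 1 for real tokens, 0 for padding."""
    return [[1 if pos < n else 0 for pos in range(max_len)] for n in lengths]


# Two sentences padded to 5 positions: the first has 3 real tokens, the second 5.
mask = lengths_to_attention_mask([3, 5], max_len=5)
print(mask)
```

Summing each row of the mask recovers the original lengths, so either argument can be derived from the other under this assumption.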

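transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.BaseModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden states at the output of the last layer of the model.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The XLMModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: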

>>> from transformers import AutoTokenizer, XLMModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMModel.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state

XLMWithLMHeadModel

class transformers.XLMWithLMHeadModel

( config )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The XLM Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None langs: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None lengths: typing.Optional[torch.Tensor] = None cache: typing.Optional[typing.Dict[str, torch.Tensor]] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (torch.LongTensor of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are language IDs, which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language ID mapping is in model.config.lang2id (a dictionary from string to int) and the language ID to language name mapping is in model.config.id2lang (a dictionary from int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
What are position IDs?
lengths (torch.LongTensor of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].
cache (Dict[str, torch.FloatTensor], optional) — Dictionary string to torch.FloatTensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
The dictionary object will be modified in-place during the forward pass to add newly computed hidden states.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., config.vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., config.vocab_size].
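The effect of setting labels to -100 can be shown with a small pure-Python cross-entropy that skips those positions. This is a sketch of the general ignore-index convention on made-up scores, not the model's loss code; `masked_cross_entropy` is a hypothetical name.

```python
import math

IGNORE_INDEX = -100


def masked_cross_entropy(logits, labels):
    """Average cross-entropy over positions whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # ignored positions contribute nothing to the loss
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[label]  # negative log-probability of the label
        count += 1
    return total / count if count else 0.0


# Three positions over a 3-token vocabulary; only the middle one is scored.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
labels = [-100, 2, -100]
loss = masked_cross_entropy(logits, labels)
print(loss)
```

Only the middle position counts here, and its scores are uniform, so the loss is the log of the vocabulary size.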

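transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.MaskedLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Masked language modeling (MLM) loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The XLMWithLMHeadModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: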

>>> from transformers import AutoTokenizer, XLMWithLMHeadModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer("The capital of France is <special1>.", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> # retrieve index of <special1>
>>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

>>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)

>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> # mask labels of non-<special1> tokens
>>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

>>> outputs = model(**inputs, labels=labels)

XLMForSequenceClassification

class transformers.XLMForSequenceClassification

( config )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None langs: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None lengths: typing.Optional[torch.Tensor] = None cache: typing.Optional[typing.Dict[str, torch.Tensor]] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (torch.LongTensor of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are language IDs, which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language ID mapping is in model.config.lang2id (a dictionary from string to int) and the language ID to language name mapping is in model.config.id2lang (a dictionary from int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
What are position IDs?
lengths (torch.LongTensor of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].
cache (Dict[str, torch.FloatTensor], optional) — Dictionary string to torch.FloatTensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
The dictionary object will be modified in-place during the forward pass to add newly computed hidden states.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
output_attentions (bool, optional) — Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size,), optional) — Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (mean-square loss); if config.num_labels > 1 a classification loss is computed (cross-entropy).
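The switch between regression and classification described for labels can be sketched on plain lists. This mirrors the stated rule only (mean-square loss when num_labels is 1, cross-entropy otherwise); `sequence_loss` and the example numbers are made up, and the model's own implementation works on tensors.

```python
import math


def sequence_loss(logits, labels, num_labels):
    """Mean-square loss if num_labels == 1, otherwise mean cross-entropy."""
    if num_labels == 1:
        # Regression: each row holds a single predicted value.
        return sum((row[0] - y) ** 2 for row, y in zip(logits, labels)) / len(labels)
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[y]  # negative log-probability of the true class
    return total / len(labels)


mse = sequence_loss([[0.5], [2.0]], [1.0, 2.0], num_labels=1)
ce = sequence_loss([[0.0, 0.0]], [1], num_labels=2)
print(mse, ce)
```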

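transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.SequenceClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The XLMForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of single-label classification: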

>>> import torch
>>> from transformers import AutoTokenizer, XLMForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMForSequenceClassification.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = XLMForSequenceClassification.from_pretrained("FacebookAI/xlm-mlm-en-2048", num_labels=num_labels)

>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss

Example of multi-label classification:

>>> import torch
>>> from transformers import AutoTokenizer, XLMForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMForSequenceClassification.from_pretrained("FacebookAI/xlm-mlm-en-2048", problem_type="multi_label_classification")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = XLMForSequenceClassification.from_pretrained(
...     "FacebookAI/xlm-mlm-en-2048", num_labels=num_labels, problem_type="multi_label_classification"
... )

>>> labels = torch.sum(
...     torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss

XLMForMultipleChoice

class transformers.XLMForMultipleChoice

( config *inputs **kwargs )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RocStories/SWAG tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None langs: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None lengths: typing.Optional[torch.Tensor] = None cache: typing.Optional[typing.Dict[str, torch.Tensor]] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MultipleChoiceModelOutput or tuple(torch.FloatTensor)

参数

input_ids (torch.LongTensor of shape (batch_size, num_choices, sequence_length)) — Indices of input sequence tokens in the vocabulary.
可以使用AutoTokenizer获取索引。详情请参见PreTrainedTokenizer.encode()和 PreTrainedTokenizer.call()。

什么是输入ID？
attention_mask (torch.FloatTensor of shape (batch_size, num_choices, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
什么是注意力掩码？
langs (torch.LongTensor of shape (batch_size, num_choices, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
请参阅多语言文档中详细的使用示例。
token_type_ids (torch.LongTensor of shape (batch_size, num_choices, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
什么是token type IDs?
position_ids (torch.LongTensor of shape (batch_size, num_choices, sequence_length), optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
什么是位置ID？
lengths (torch.LongTensor of shape (batch_size,), optional) — 每个句子的长度，可用于避免在填充标记索引上执行注意力。你也可以使用attention_mask来达到相同的结果（见上文），这里保留是为了兼容性。索引选择在[0, ..., input_ids.size(-1)]范围内。
cache (Dict[str, torch.FloatTensor], optional) — Dictionary string to torch.FloatTensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
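The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, num_choices, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix provides.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size,), optional) — Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1], where num_choices is the size of the second dimension of the input tensors. (See input_ids above)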
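
The relationship between lengths and attention_mask described above can be illustrated with a small sketch. It uses plain Python lists instead of tensors and assumes padding on the right; the helper names are hypothetical and are not part of the library.

```python
def lengths_to_attention_mask(lengths, max_len):
    """Build a padding mask: 1 for real tokens, 0 for padding positions."""
    return [[1 if pos < n else 0 for pos in range(max_len)] for n in lengths]


def attention_mask_to_lengths(mask):
    """Recover per-sequence lengths, assuming all padding sits on the right."""
    return [sum(row) for row in mask]


# Three sequences padded to a common length of 5 (illustrative values).
lengths = [5, 3, 1]
mask = lengths_to_attention_mask(lengths, max_len=5)
for row in mask:
    print(row)
```

Under these assumptions either input carries the same information, which is why passing one of them is enough to exclude padding from attention.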

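transformers.modeling_outputs.MultipleChoiceModelOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.MultipleChoiceModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification loss.
logits (torch.FloatTensor of shape (batch_size, num_choices)) — num_choices is the second dimension of the input tensors. (See input_ids above).

Classification scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The XLMForMultipleChoice forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: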

>>> from transformers import AutoTokenizer, XLMForMultipleChoice
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMForMultipleChoice.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;)), batch size 1

>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True)
>>> outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels)  # batch size is 1

>>> # the linear classifier still needs to be trained
>>> loss = outputs.loss
>>> logits = outputs.logits
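
Because the logits have shape (batch_size, num_choices) and are scores before SoftMax, turning them into a prediction is a per-row softmax and argmax. The sketch below shows this with plain Python lists; the logit values are invented for illustration and do not come from the checkpoint above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of choice scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: a batch of 2 prompts with 3 choices each.
logits = [[1.2, -0.3, 0.1], [0.0, 2.0, 1.0]]

probs = [softmax(row) for row in logits]
predicted = [row.index(max(row)) for row in logits]
print(predicted)
```

The predicted index for each row is directly comparable to the labels tensor, whose values lie in [0, ..., num_choices-1].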

XLMForTokenClassification

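class transformers.XLMForTokenClassification

( config )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward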

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None langs: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None lengths: typing.Optional[torch.Tensor] = None cache: typing.Optional[typing.Dict[str, torch.Tensor]] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (torch.LongTensor of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
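What are position IDs?
lengths (torch.LongTensor of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].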
cache (Dict[str, torch.FloatTensor], optional) — Dictionary string to torch.FloatTensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
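The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix provides.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].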
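
As a rough illustration of the langs input described above, the sketch below builds a language-id sequence with the same shape as input_ids. The lang2id values and the token ids are made up; with a real multilingual checkpoint the mapping would come from model.config.lang2id, and build_langs is a hypothetical helper rather than a library function.

```python
# Hypothetical mapping; a multilingual checkpoint exposes the real one
# as model.config.lang2id (and its inverse as model.config.id2lang).
lang2id = {"en": 0, "fr": 1}
id2lang = {idx: name for name, idx in lang2id.items()}


def build_langs(input_ids, language):
    """Return one language id per token, matching the shape of input_ids."""
    lang_id = lang2id[language]
    return [[lang_id] * len(sequence) for sequence in input_ids]


# A batch with a single sequence of 8 invented token ids.
input_ids = [[0, 4018, 5545, 51104, 32, 308, 18, 1]]
langs = build_langs(input_ids, "fr")
print(langs)
```

With tensors, the same idea amounts to filling a tensor shaped like input_ids with the chosen language id.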

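transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.TokenClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) — Classification scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The XLMForTokenClassification forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: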

>>> from transformers import AutoTokenizer, XLMForTokenClassification
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMForTokenClassification.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer(
...     "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt"
... )

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_token_class_ids = logits.argmax(-1)

>>> # Note that tokens are classified rather than input words which means that
>>> # there might be more predicted token classes than words.
>>> # Multiple token classes might account for the same word
>>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]

>>> labels = predicted_token_class_ids
>>> loss = model(**inputs, labels=labels).loss
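
Since classes are predicted per token rather than per word, a common follow-up step is to collapse them to one label per word. The sketch below keeps the label of the first sub-token of each word. It assumes a word_ids list (token position to word index, None for special tokens) is available from the tokenization step; that list, the labels, and the helper name are all hypothetical here.

```python
def first_subword_labels(token_labels, word_ids):
    """Keep the label of the first sub-token of every word."""
    word_labels = []
    previous_word = None
    for label, word_id in zip(token_labels, word_ids):
        if word_id is None:
            # Special tokens do not belong to any word.
            continue
        if word_id != previous_word:
            word_labels.append(label)
        previous_word = word_id
    return word_labels


# Invented example: word 0 is split into two sub-tokens.
token_labels = ["O", "B-ORG", "I-ORG", "O", "O", "B-LOC", "O"]
word_ids = [None, 0, 0, 1, 2, 3, None]

word_labels = first_subword_labels(token_labels, word_ids)
print(word_labels)
```

Other aggregation strategies (majority vote, averaging logits) are equally plausible; first-sub-token is simply the shortest to write down.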

XLMForQuestionAnsweringSimple

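class transformers.XLMForQuestionAnsweringSimple

( config )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward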

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None langs: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None lengths: typing.Optional[torch.Tensor] = None cache: typing.Optional[typing.Dict[str, torch.Tensor]] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None start_positions: typing.Optional[torch.Tensor] = None end_positions: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (torch.LongTensor of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
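What are position IDs?
lengths (torch.LongTensor of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].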
cache (Dict[str, torch.FloatTensor], optional) — Dictionary string to torch.FloatTensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
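The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix provides.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
start_positions (torch.LongTensor of shape (batch_size,), optional) — Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
end_positions (torch.LongTensor of shape (batch_size,), optional) — Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.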
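
The two accepted head_mask shapes, (num_heads,) and (num_layers, num_heads), can be related with a short sketch. It uses nested Python lists and a hypothetical helper, expand_head_mask, and only illustrates the shapes; it does not claim to reproduce how the library applies the mask internally.

```python
def expand_head_mask(head_mask, num_layers):
    """Return a (num_layers, num_heads) nested list from either accepted shape."""
    is_one_dimensional = all(isinstance(v, (int, float)) for v in head_mask)
    if is_one_dimensional:
        # Shape (num_heads,): reuse the same flags for every layer.
        return [list(head_mask) for _ in range(num_layers)]
    if len(head_mask) != num_layers:
        raise ValueError("expected one row of head flags per layer")
    return [list(row) for row in head_mask]


# 1.0 keeps a head, 0.0 masks it (illustrative: 2 layers, 4 heads).
shared = expand_head_mask([1.0, 0.0, 1.0, 1.0], num_layers=2)
per_layer = expand_head_mask(
    [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]], num_layers=2
)
print(shared)
print(per_layer)
```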

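transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.QuestionAnsweringModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) — Span-start scores (before SoftMax).
end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) — Span-end scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The XLMForQuestionAnsweringSimple forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: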

>>> from transformers import AutoTokenizer, XLMForQuestionAnsweringSimple
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMForQuestionAnsweringSimple.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)

>>> answer_start_index = outputs.start_logits.argmax()
>>> answer_end_index = outputs.end_logits.argmax()

>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]

>>> # target is "nice puppet"
>>> target_start_index = torch.tensor([14])
>>> target_end_index = torch.tensor([15])

>>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
>>> loss = outputs.loss
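
Taking the argmax of start_logits and end_logits independently, as in the example above, can yield an end index that precedes the start index. One possible remedy, sketched here with plain Python lists and invented scores, is to search for the best-scoring pair with start <= end. The helper name and the max_answer_len parameter are assumptions for illustration, not library API.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with the highest summed score, start <= end."""
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best, best_score


# Invented scores where the independent argmax would give start=2, end=0.
start_logits = [0.1, 0.2, 3.0, 0.5]
end_logits = [2.5, 0.1, 0.3, 2.0]

span, score = best_span(start_logits, end_logits)
print(span, score)
```

The returned indices can then be used to slice input_ids exactly as predict_answer_tokens does in the example.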

XLMForQuestionAnswering

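class transformers.XLMForQuestionAnswering

( config )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a beam-search span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward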

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None langs: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None lengths: typing.Optional[torch.Tensor] = None cache: typing.Optional[typing.Dict[str, torch.Tensor]] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None start_positions: typing.Optional[torch.Tensor] = None end_positions: typing.Optional[torch.Tensor] = None is_impossible: typing.Optional[torch.Tensor] = None cls_index: typing.Optional[torch.Tensor] = None p_mask: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringOutput or tuple(torch.FloatTensor)

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.

What are input IDs?
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (torch.LongTensor of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
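What are position IDs?
lengths (torch.LongTensor of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].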
cache (Dict[str, torch.FloatTensor], optional) — Dictionary string to torch.FloatTensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
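The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix provides.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
start_positions (torch.LongTensor of shape (batch_size,), optional) — Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
end_positions (torch.LongTensor of shape (batch_size,), optional) — Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
is_impossible (torch.LongTensor of shape (batch_size,), optional) — Labels indicating whether a question has an answer or no answer (SQuAD 2.0).
cls_index (torch.LongTensor of shape (batch_size,), optional) — Labels for position (index) of the classification token to use as input for computing the plausibility of the answer.
p_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Optional mask of tokens which can't be in answers (e.g. [CLS], [PAD], ...). 1.0 means the token should be masked. 0.0 means the token is not masked.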

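transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringOutput or tuple(torch.FloatTensor)

A transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) — Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top start token possibilities (beam-search).
start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top start token possibilities (beam-search).
end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the is_impossible label of the answers.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The XLMForQuestionAnswering forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: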

>>> from transformers import AutoTokenizer, XLMForQuestionAnswering
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = XLMForQuestionAnswering.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(
...     0
... )  # Batch size 1
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])

>>> outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
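
When no start or end positions are passed, the outputs described above contain start_n_top start candidates and start_n_top * end_n_top end candidates. The sketch below shows one way such a candidate set could be enumerated and ranked by joint log probability. It is a simplified illustration with plain Python lists: the scores are invented, end scores are assumed to be given per start position, and none of the helper names belong to the library.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def top_k(values, k):
    """Indices of the k largest values, best first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


def beam_candidates(start_logits, end_logits_given_start, start_n_top, end_n_top):
    """Enumerate start_n_top * end_n_top (score, start, end) candidates."""
    start_lp = log_softmax(start_logits)
    candidates = []
    for s in top_k(start_lp, start_n_top):
        end_lp = log_softmax(end_logits_given_start[s])
        for e in top_k(end_lp, end_n_top):
            candidates.append((start_lp[s] + end_lp[e], s, e))
    return sorted(candidates, reverse=True)


# Invented scores for a 4-token sequence; row i holds end scores given start i.
start_logits = [0.0, 2.0, 1.0, -1.0]
end_logits_given_start = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 3.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
]

candidates = beam_candidates(
    start_logits, end_logits_given_start, start_n_top=2, end_n_top=2
)
best_score, best_start, best_end = candidates[0]
print(len(candidates), best_start, best_end)
```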

TensorFlow

TFXLMModel

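class transformers.TFXLMModel

( config *inputs **kwargs )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The bare XLM Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

The second format is supported because Keras methods prefer it when passing inputs to models and layers. Because of this support, things should "just work" when using methods like model.fit(): just pass your inputs and labels in any format that model.fit() supports. If, however, you want to use the second format outside of Keras methods, such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors, in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function.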
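
To make the three positional-argument formats concrete, here is a minimal sketch of how a first positional argument could be normalized into named inputs. It runs without TensorFlow by using a stand-in FakeTensor class; normalize_inputs, FakeTensor, and the two-name argument order are assumptions made for illustration and do not describe the library's actual input-processing code.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a tf.Tensor so the sketch runs without TensorFlow."""
    values: list


def normalize_inputs(first_arg, arg_names=("input_ids", "attention_mask")):
    """Map the first positional argument to a dict of named inputs."""
    if isinstance(first_arg, dict):
        # Format 3: names are given explicitly.
        return dict(first_arg)
    if isinstance(first_arg, (list, tuple)):
        # Format 2: names are implied by position.
        if len(first_arg) > len(arg_names):
            raise ValueError("more tensors than known input names")
        return dict(zip(arg_names, first_arg))
    # Format 1: a single tensor is taken to be input_ids.
    return {"input_ids": first_arg}


input_ids = FakeTensor([[0, 11, 12, 1]])
attention_mask = FakeTensor([[1, 1, 1, 1]])

as_tensor = normalize_inputs(input_ids)
as_list = normalize_inputs([input_ids, attention_mask])
as_dict = normalize_inputs({"input_ids": input_ids, "attention_mask": attention_mask})
print(sorted(as_list))
```

The list format depends entirely on argument order, so the dictionary format is the less error-prone choice when more than one input is passed.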

call

Parameters

input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.__call__() and PreTrainedTokenizer.encode() for details.

What are input IDs?
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (tf.Tensor or Numpy array of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
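What are position IDs?
lengths (tf.Tensor or Numpy array of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].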
cache (Dict[str, tf.Tensor], optional) — Dictionary string to tf.Tensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
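The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states.
head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix provides.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).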

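transformers.modeling_tf_outputs.TFBaseModelOutput or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFBaseModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

last_hidden_state (tf.Tensor of shape (batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(tf.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The TFXLMModel forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: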

>>> from transformers import AutoTokenizer, TFXLMModel
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = TFXLMModel.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)

>>> last_hidden_states = outputs.last_hidden_state

TFXLMWithLMHeadModel

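class transformers.TFXLMWithLMHeadModel

( config *inputs **kwargs )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The XLM Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

If you want to use the second format outside of Keras methods, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors, in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function.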

call

Parameters

input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.__call__() and PreTrainedTokenizer.encode() for details.

What are input IDs?
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (tf.Tensor or Numpy array of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See usage examples detailed in the multilingual documentation.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
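What are position IDs?
lengths (tf.Tensor or Numpy array of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].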
cache (Dict[str, tf.Tensor], optional) — Dictionary string to tf.Tensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
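The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states.
head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix provides.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).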

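transformers.models.xlm.modeling_tf_xlm.TFXLMWithLMHeadModelOutput or tuple(tf.Tensor)

A transformers.models.xlm.modeling_tf_xlm.TFXLMWithLMHeadModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

logits (tf.Tensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The TFXLMWithLMHeadModel forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: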

>>> from transformers import AutoTokenizer, TFXLMWithLMHeadModel
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = TFXLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)
>>> logits = outputs.logits

TFXLMForSequenceClassification

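class transformers.TFXLMForSequenceClassification

( config *inputs **kwargs )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

If you want to use the second format outside of Keras methods, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors, in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function.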

调用

< source >

参数

input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.__call__() and PreTrainedTokenizer.encode() for details.

What are input IDs?
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (tf.Tensor or Numpy array of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are language ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See the usage examples detailed in the multilingual documentation.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
What are position IDs?
lengths (tf.Tensor or Numpy array of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].
cache (Dict[str, tf.Tensor], optional) — Dictionary string to tf.Tensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
The dictionary object will be modified in-place during the forward pass to add newly computed hidden states.
head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
labels (tf.Tensor of shape (batch_size,), optional) — Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-square loss); if config.num_labels > 1, a classification loss is computed (cross-entropy).
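The labels description above switches between a regression loss and a classification loss depending on config.num_labels. The pure-Python sketch below mirrors that rule for a single example. The function sequence_loss and the numbers are hypothetical; the library computes its loss with TensorFlow and may reduce it differently.

```python
import math


def sequence_loss(logits, label, num_labels):
    """Toy per-example loss following the rule in the `labels` description."""
    if num_labels == 1:
        # Regression: squared error between the single score and the target.
        return (logits[0] - label) ** 2
    # Classification: cross-entropy of the softmax over the class scores.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


regression_loss = sequence_loss([0.5], label=2.0, num_labels=1)
classification_loss = sequence_loss([2.0, 0.0], label=0, num_labels=2)
print(regression_loss, classification_loss)
```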

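Returns

transformers.modeling_tf_outputs.TFSequenceClassifierOutput or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFSequenceClassifierOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (tf.Tensor of shape (batch_size, ), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
logits (tf.Tensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The TFXLMForSequenceClassification forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: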

>>> from transformers import AutoTokenizer, TFXLMForSequenceClassification
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = TFXLMForSequenceClassification.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")

>>> logits = model(**inputs).logits

>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = TFXLMForSequenceClassification.from_pretrained("FacebookAI/xlm-mlm-en-2048", num_labels=num_labels)

>>> labels = tf.constant([1])  # shape (batch_size,), matching the documented `labels` shape
>>> loss = model(**inputs, labels=labels).loss

TFXLMForMultipleChoice

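class transformers.TFXLMForMultipleChoice

( config *inputs **kwargs )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RocStories/SWAG tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

With the second format, there are three ways to gather all the input tensors in the first positional argument:

- a single tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input tensors, in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input tensors associated with the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing, you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

Parameters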

input_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.__call__() and PreTrainedTokenizer.encode() for details.

What are input IDs?
attention_mask (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (tf.Tensor or Numpy array of shape (batch_size, num_choices, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are language ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See the usage examples detailed in the multilingual documentation.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (Numpy array or tf.Tensor of shape (batch_size, num_choices, sequence_length), optional) — Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
What are position IDs?
lengths (tf.Tensor or Numpy array of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].
cache (Dict[str, tf.Tensor], optional) — Dictionary string to tf.Tensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
The dictionary object will be modified in-place during the forward pass to add newly computed hidden states.
head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (tf.Tensor of shape (batch_size, num_choices, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).

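Returns

transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (tf.Tensor of shape (batch_size, ), optional, returned when labels is provided) — Classification loss.
logits (tf.Tensor of shape (batch_size, num_choices)) — Classification scores (before SoftMax).

num_choices is the second dimension of the input tensors (see input_ids above).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The TFXLMForMultipleChoice forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: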

>>> from transformers import AutoTokenizer, TFXLMForMultipleChoice
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = TFXLMForMultipleChoice.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."

>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="tf", padding=True)
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in encoding.items()}
>>> outputs = model(inputs)  # batch size is 1

>>> # the linear classifier still needs to be trained
>>> logits = outputs.logits
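The example above gets the (batch_size, num_choices, sequence_length) layout by padding the tokenized (prompt, choice) pairs to a common length and then adding a leading batch dimension. The sketch below reproduces that layout in pure Python on made-up token ids. The helper pad_choices and the ids are hypothetical. pad_id=2 follows the pad_index default in the XLMConfig signature, and right padding is an assumption.

```python
def pad_choices(encoded_choices, pad_id):
    """Right-pad each choice to the longest one and build its attention mask."""
    width = max(len(ids) for ids in encoded_choices)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in encoded_choices]
    attention_mask = [
        [1] * len(ids) + [0] * (width - len(ids)) for ids in encoded_choices
    ]
    return input_ids, attention_mask


# Hypothetical token ids for the two (prompt, choice) pairs of one example.
ids, mask = pad_choices([[0, 11, 12, 1], [0, 11, 1]], pad_id=2)

# Wrapping in an outer list gives (batch_size=1, num_choices=2, sequence_length=4).
batch = {"input_ids": [ids], "attention_mask": [mask]}
print(batch)
```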

TFXLMForTokenClassification

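class transformers.TFXLMForTokenClassification

( config *inputs **kwargs )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

With the second format, there are three ways to gather all the input tensors in the first positional argument:

- a single tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input tensors, in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input tensors associated with the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing, you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

Parameters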

input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.__call__() and PreTrainedTokenizer.encode() for details.

What are input IDs?
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (tf.Tensor or Numpy array of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are language ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See the usage examples detailed in the multilingual documentation.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
What are position IDs?
lengths (tf.Tensor or Numpy array of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].
cache (Dict[str, tf.Tensor], optional) — Dictionary string to tf.Tensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
The dictionary object will be modified in-place during the forward pass to add newly computed hidden states.
head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
labels (tf.Tensor of shape (batch_size, sequence_length), optional) — Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
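The lengths and attention_mask arguments above are described as two ways to reach the same result. The pure-Python sketch below shows how per-sentence lengths map to a 0/1 attention mask. The helper name lengths_to_attention_mask is hypothetical, and the sketch assumes right padding, so the real tokens come first in each row.

```python
def lengths_to_attention_mask(lengths, sequence_length):
    """Build a (batch_size, sequence_length) 0/1 mask from per-sentence lengths."""
    mask = []
    for length in lengths:
        if not 0 <= length <= sequence_length:
            raise ValueError("each length must lie in [0, sequence_length]")
        # 1 marks a real token, 0 marks a padding position.
        mask.append([1] * length + [0] * (sequence_length - length))
    return mask


attention_mask = lengths_to_attention_mask([3, 5], sequence_length=5)
print(attention_mask)
```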

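Returns

transformers.modeling_tf_outputs.TFTokenClassifierOutput or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFTokenClassifierOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (tf.Tensor of shape (n,), optional, where n is the number of unmasked labels, returned when labels is provided) — Classification loss.
logits (tf.Tensor of shape (batch_size, sequence_length, config.num_labels)) — Classification scores (before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The TFXLMForTokenClassification forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: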

>>> from transformers import AutoTokenizer, TFXLMForTokenClassification
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = TFXLMForTokenClassification.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> inputs = tokenizer(
...     "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="tf"
... )

>>> logits = model(**inputs).logits
>>> predicted_token_class_ids = tf.math.argmax(logits, axis=-1)

>>> # Note that tokens are classified rather than input words which means that
>>> # there might be more predicted token classes than words.
>>> # Multiple token classes might account for the same word
>>> predicted_tokens_classes = [model.config.id2label[t] for t in predicted_token_class_ids[0].numpy().tolist()]

>>> labels = predicted_token_class_ids
>>> loss = tf.math.reduce_mean(model(**inputs, labels=labels).loss)

TFXLMForQuestionAnsweringSimple

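class transformers.TFXLMForQuestionAnsweringSimple

( config *inputs **kwargs )

Parameters

config (XLMConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

XLM Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

With the second format, there are three ways to gather all the input tensors in the first positional argument:

- a single tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input tensors, in the order given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input tensors associated with the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing, you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

Parameters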

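input_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary.
Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.__call__() and PreTrainedTokenizer.encode() for details.

What are input IDs?
attention_mask (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
- 1 for tokens that are not masked,
- 0 for tokens that are masked.
What are attention masks?
langs (tf.Tensor or Numpy array of shape (batch_size, sequence_length), optional) — A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are language ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name to language id mapping is in model.config.lang2id (which is a dictionary string to int) and the language id to language name mapping is in model.config.id2lang (dictionary int to string).
See the usage examples detailed in the multilingual documentation.
token_type_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
What are token type IDs?
position_ids (Numpy array or tf.Tensor of shape (batch_size, sequence_length), optional) — Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
What are position IDs?
lengths (tf.Tensor or Numpy array of shape (batch_size,), optional) — Length of each sentence, which can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above); it is kept here for compatibility. Indices selected in [0, ..., input_ids.size(-1)].
cache (Dict[str, tf.Tensor], optional) — Dictionary string to tf.Tensor that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding.
The dictionary object will be modified in-place during the forward pass to add newly computed hidden states.
head_mask (Numpy array or tf.Tensor of shape (num_heads,) or (num_layers, num_heads), optional) — Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
- 1 indicates the head is not masked,
- 0 indicates the head is masked.
inputs_embeds (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional) — Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail. This argument can be used only in eager mode; in graph mode the value in the config will be used instead.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode; in graph mode the value will always be set to True.
training (bool, optional, defaults to False) — Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation).
start_positions (tf.Tensor of shape (batch_size,), optional) — Labels for the position (index) of the start of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account when computing the loss.
end_positions (tf.Tensor of shape (batch_size,), optional) — Labels for the position (index) of the end of the labelled span, used to compute the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account when computing the loss.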

Returns

transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (XLMConfig) and inputs.

loss (tf.Tensor of shape (batch_size, ), optional, returned when start_positions and end_positions are provided) — Total span extraction loss, the sum of a cross-entropy for the start and end positions.
start_logits (tf.Tensor of shape (batch_size, sequence_length)) — Span-start scores (before SoftMax).
end_logits (tf.Tensor of shape (batch_size, sequence_length)) — Span-end scores (before SoftMax).
hidden_states (tuple(tf.Tensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of tf.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(tf.Tensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of tf.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
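The loss entry above describes the span extraction loss as the sum of a cross-entropy over the start positions and one over the end positions. The pure-Python sketch below works through that sum for a single made-up sequence. The cross_entropy helper and all numbers are hypothetical, and the library may apply additional scaling or reduction that is not shown here.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one target index under softmax(logits)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


# Made-up scores for a single 4-token sequence.
start_logits = [0.1, 3.0, 0.2, -1.0]
end_logits = [-0.5, 0.0, 2.5, 0.3]

# The gold span covers tokens 1..2, so start target = 1 and end target = 2.
start_loss = cross_entropy(start_logits, 1)
end_loss = cross_entropy(end_logits, 2)
span_loss = start_loss + end_loss
print(span_loss)
```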

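The TFXLMForQuestionAnsweringSimple forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: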

>>> from transformers import AutoTokenizer, TFXLMForQuestionAnsweringSimple
>>> import tensorflow as tf

>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-en-2048")
>>> model = TFXLMForQuestionAnsweringSimple.from_pretrained("FacebookAI/xlm-mlm-en-2048")

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

>>> inputs = tokenizer(question, text, return_tensors="tf")
>>> outputs = model(**inputs)

>>> answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
>>> answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]

>>> # target is "nice puppet"
>>> target_start_index = tf.constant([14])
>>> target_end_index = tf.constant([15])

>>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
>>> loss = tf.math.reduce_mean(outputs.loss)
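The example above takes the argmax of start_logits and end_logits independently, which can produce an end index that precedes the start index. One common alternative, sketched below in pure Python, scores every valid (start, end) pair and keeps the best one. The best_span function and its max_answer_len parameter are hypothetical and not part of transformers, and the logits are made up.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return the (start, end) pair with the highest summed score, start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Made-up scores where independent argmax would give start=3 and end=1.
span = best_span([0.1, 0.2, 0.0, 2.0], [0.0, 3.0, 0.1, 1.0])
print(span)
```

The returned indices can then slice inputs.input_ids in the same way as answer_start_index and answer_end_index in the example.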
