Models

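The base classes PreTrainedModel, TFPreTrainedModel, and FlaxPreTrainedModel implement the common methods for loading and saving a model, either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).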

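PreTrainedModel and TFPreTrainedModel also implement a few methods that are common to all models:

resizing the input token embeddings when new tokens are added to the vocabulary,
pruning the attention heads of the model.

The other methods common to every model are defined in ModuleUtilsMixin (for PyTorch models) and modeling_tf_utils.TFModelUtilsMixin (for TensorFlow models) or, for text generation, in GenerationMixin (for PyTorch models), TFGenerationMixin (for TensorFlow models) and FlaxGenerationMixin (for Flax/JAX models).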

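PreTrainedModel

class transformers.PreTrainedModel

( config: PretrainedConfig *inputs **kwargs )

Base class for all models.

PreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models, as well as a few methods common to all models:

resizing the input embeddings,
pruning heads in the self-attention layers.

Class attributes (overridden by derived classes):

config_class (PretrainedConfig) — A subclass of PretrainedConfig to use as the configuration class for this model architecture.
load_tf_weights (Callable) — A Python method for loading a TensorFlow checkpoint into a PyTorch model, taking as arguments: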
- model (PreTrainedModel) — An instance of the model on which to load the TensorFlow checkpoint.
- config (PretrainedConfig) — An instance of the configuration associated with the model.
- path (str) — A path to the TensorFlow checkpoint.
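base_model_prefix (str) — A string indicating the attribute associated with the base model in derived classes of the same architecture that add modules on top of the base model.
is_parallelizable (bool) — A flag indicating whether this model supports model parallelization.
main_input_name (str) — The name of the principal input to the model (often input_ids for NLP models, pixel_values for vision models and input_values for speech models).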

push_to_hub

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )

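Parameters

repo_id (str) — The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Defaults to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Defaults to "Upload model".
private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used. Defaults to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size of a checkpoint before it is sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB"). The default is "5GB" so that users can easily load models on free-tier Google Colab instances without running out of CPU memory.
create_pr (bool, optional, defaults to False) — Whether to create a PR with the uploaded files or commit directly.
safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights to the safetensors format for safer serialization.
revision (str, optional) — The branch to push the uploaded files to.
commit_description (str, optional) — The description of the commit that will be created.
tags (List[str], optional) — List of tags to push to the Hub.

Upload the model files to the 🤗 Model Hub.

Example: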

from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")

# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")

add_model_tags

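( tags: typing.Union[typing.List[str], str] )

Parameters

tags (Union[List[str], str]) — The desired tags to inject into the model.

Add custom tags to the model that gets pushed to the Hugging Face Hub. Existing tags in the model are not overwritten.

Example: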

from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

model.add_model_tags(["custom", "custom-bert"])

# Push the model to your namespace with the name "my-custom-bert".
model.push_to_hub("my-custom-bert")

can_generate

( ) → bool

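Returns

bool

Whether this model can generate sequences with .generate().

Returns whether this model can generate sequences with .generate().

dequantize

( )

Potentially dequantize the model if it has been quantized by a quantization method that supports dequantization.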

disable_input_require_grads

( )

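Removes the _require_grads_hook.

enable_input_require_grads

( )

Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping the model weights fixed.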

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike, NoneType] *model_args config: typing.Union[transformers.configuration_utils.PretrainedConfig, str, os.PathLike, NoneType] = None cache_dir: typing.Union[str, os.PathLike, NoneType] = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: typing.Union[bool, str, NoneType] = None revision: str = 'main' use_safetensors: typing.Optional[bool] = None weights_only: bool = True **kwargs )

Parameters

pretrained_model_name_or_path (str or os.PathLike, optional) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a tensorflow index checkpoint file (e.g, ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- A path or url to a model folder containing a flax checkpoint file in .msgpack format (e.g, ./flax_model/ containing flax_model.msgpack). In this case, from_flax should be set to True.
- None if you are both providing the configuration and state dictionary (resp. with keyword arguments config and state_dict).
model_args (sequence of positional arguments, optional) — All remaining positional arguments are passed to the underlying model's __init__ method.
config (Union[PretrainedConfig, str, os.PathLike], optional) — Can be either:
- an instance of a class derived from PretrainedConfig,
- a string or path valid as input to from_pretrained().
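Configuration for the model to use instead of an automatically loaded configuration. The configuration can be automatically loaded when: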
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file.
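This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case, though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
cache_dir (Union[str, os.PathLike], optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see the docstring of the pretrained_model_name_or_path argument).
from_flax (bool, optional, defaults to False) — Load the model weights from a Flax checkpoint save file (see the docstring of the pretrained_model_name_or_path argument).
ignore_mismatched_sizes (bool, optional, defaults to False) — Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (for instance, if you are instantiating a model with 10 labels from a checkpoint with 3 labels).
force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.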
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.

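To test a pull request you made on the Hub, you can pass revision="refs/pr/".
mirror (str, optional) — Mirror source to accelerate downloads in China. If you are in China and have an accessibility problem, you can set this option to resolve it. Note that we do not guarantee timeliness or safety. Please refer to the mirror site for more information.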
_fast_init(bool, optional, defaults to True) — Whether or not to disable fast initialization.

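One should only disable _fast_init to ensure backwards compatibility with transformers.__version__ < 4.6.0 for seeded model initialization. This argument will be removed in the next major version. See pull request 11471 for more information.
attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA is used for torch>=2.1.1. Otherwise the default is the manual "eager" implementation.

Parameters for big model inference

low_cpu_mem_usage (bool, optional) — Tries not to use more than 1x the model size in CPU memory (including peak memory) while loading the model. Generally should be combined with a device_map (such as "auto") for best results. This is an experimental feature and subject to change at any moment. If the model weights are in the same precision as the model being loaded, low_cpu_mem_usage (without device_map) is redundant and will not provide any benefit in regards to CPU memory usage. However, it should still be enabled if you are passing in a device_map.
torch_dtype (str or torch.dtype, optional) — Override the default torch.dtype and load the model under a specific dtype. The different options are:
1. torch.float16 or torch.bfloat16 or torch.float: load in the specified dtype, ignoring the model's config.torch_dtype if one exists. If not specified, the model will get loaded in torch.float (fp32).
2. "auto" - The torch_dtype entry in the model's config.json file is tried first. If this entry is not found, the dtype of the first floating-point weight in the checkpoint is checked next and used as the dtype. This loads the model using the dtype it was saved in at the end of training. It cannot be used as an indicator of how the model was trained, since the model could have been trained in one of the half-precision dtypes but saved in fp32.
3. A string that is a valid torch.dtype. For example, "float32" loads the model in torch.float32, "float16" loads it in torch.float16, and so on.
For some models the dtype they were trained in is unknown. You may try to check the model's paper or reach out to the authors and ask them to add this information to the model's card and to insert the torch_dtype entry in config.json.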
device_map (str or Dict[str, Union[int, str, torch.device]] or int or torch.device, optional) — A map that specifies where each submodule should go. It doesn’t need to be refined to each parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the same device. If we only pass the device (e.g., "cpu", "cuda:1", "mps", or a GPU ordinal rank like 1) on which the model will be allocated, the device map will map the entire model to this device. Passing device_map = 0 means put the whole model on GPU 0.
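To have Accelerate compute the most optimized device_map automatically, set device_map="auto". For more information about each option see designing a device map.
max_memory (Dict, optional) — A dictionary mapping device identifiers to maximum memory. If unset, defaults to the maximum memory available on each GPU and the available CPU RAM.
offload_folder (str or os.PathLike, optional) — If the device_map contains any value "disk", the folder where the weights will be offloaded.
offload_state_dict (bool, optional) — If True, the CPU state dict is temporarily offloaded to the hard drive to avoid running out of CPU RAM when the weight of the CPU state dict plus the biggest shard of the checkpoint does not fit. Defaults to True when there is some disk offload.
offload_buffers (bool, optional) — Whether or not to offload the buffers with the model parameters.
quantization_config (Union[QuantizationConfigMixin, Dict], optional) — A dictionary of configuration parameters or a QuantizationConfigMixin object for quantization (e.g., bitsandbytes, gptq). There may be other quantization-related kwargs, including load_in_4bit and load_in_8bit, which are parsed by QuantizationConfigParser. Passing them this way is only supported for bitsandbytes quantization and is not preferred; consider putting all such arguments into quantization_config instead.
subfolder (str, optional, defaults to "") — If the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
variant (str, optional) — If specified, load weights from the variant filename, e.g., pytorch_model..bin. variant is ignored when using from_tf or from_flax.
use_safetensors (bool, optional, defaults to None) — Whether or not to use safetensors checkpoints. Defaults to None. If not specified and safetensors is not installed, it is set to False.
weights_only (bool, optional, defaults to True) — Indicates whether the unpickler should be restricted to loading only tensors, primitive types, dictionaries and any types added via torch.serialization.add_safe_globals(). When set to False, wrapper tensor subclass weights can be loaded.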
kwargs (remaining dictionary of keyword arguments, optional) — Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done)
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
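
The two kwargs behaviors can be pictured with a small standalone sketch. ToyConfig and split_kwargs are hypothetical names made up for illustration and are not part of Transformers; the code only mimics the routing rule described above for the case where no config is passed (keys that match a configuration attribute override it, the remaining keys would go to the model's __init__).

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration class (not a Transformers API)."""

    def __init__(self):
        self.output_attentions = False
        self.hidden_size = 768


def split_kwargs(config, kwargs):
    """Override matching config attributes and return the keys that were not used."""
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # key names a configuration attribute
        else:
            unused[key] = value  # would be forwarded to the model's __init__
    return unused


config = ToyConfig()
model_kwargs = split_kwargs(config, {"output_attentions": True, "my_flag": 1})
```

Here output_attentions updates the configuration, while the made-up my_flag key is left over for the model constructor.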

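Instantiate a pretrained PyTorch model from a pretrained model configuration.

The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.

The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.

If the model weights are in the same precision as the base model (and the model is supported), the weights are lazily loaded using the meta device and brought into memory once an input is passed through that layer, regardless of low_cpu_mem_usage.

Activate the special "offline mode" to use this method in a firewalled environment.

Examples: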

>>> from transformers import BertConfig, BertModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = BertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./tf_model/my_tf_model_config.json")
>>> model = BertModel.from_pretrained("./tf_model/my_tf_checkpoint.ckpt.index", from_tf=True, config=config)
>>> # Loading from a Flax checkpoint file instead of a PyTorch model (slower)
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)

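low_cpu_mem_usage algorithm:

This is an experimental function that loads the model using about 1x the model size in CPU memory.

Here is how it works:

1. save which state_dict keys we have
2. drop the state_dict before the model is created, since the latter takes 1x the model size in CPU memory
3. after the model has been instantiated, switch to the meta device all params/buffers that are going to be replaced from the loaded state_dict
4. load the state_dict a second time
5. replace the params/buffers from the state_dict

Currently, it cannot handle deepspeed ZeRO stage 3 and ignores loading errors.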
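
The five steps can be traced with a toy simulation. This is only a sketch under loud assumptions: plain dictionaries stand in for the model and the checkpoint, the string "meta" stands in for the meta device, and load_checkpoint and build_model are hypothetical helpers, not library functions.

```python
META = "meta"  # placeholder for "tensor without data" (assumption for this sketch)


def load_checkpoint():
    """Hypothetical loader standing in for reading the weights from disk."""
    return {"embed.weight": [0.1, 0.2], "head.bias": [0.3]}


def build_model():
    """Hypothetical model: freshly initialised parameters plus one extra buffer."""
    return {"embed.weight": [0.0, 0.0], "head.bias": [0.0], "position_ids": [0, 1]}


# 1. Save which state_dict keys we have.
state_dict = load_checkpoint()
loaded_keys = list(state_dict.keys())
# 2. Drop the state_dict before the model is created.
del state_dict
# 3. Instantiate the model, then move every entry that will be replaced to "meta".
model = build_model()
for key in loaded_keys:
    model[key] = META
# 4. Load the state_dict a second time.
state_dict = load_checkpoint()
# 5. Replace the placeholders with the values from the state_dict.
for key in loaded_keys:
    model[key] = state_dict[key]
```

At no point do a fully initialised model and a full copy of the checkpoint coexist, which is the idea behind the roughly 1x memory bound.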

get_compiled_call

( compile_config: CompileConfig )

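Return a torch.compile'd version of self.__call__. This is useful to dynamically choose between the non-compiled and compiled forward during inference, especially to switch between prefill (where we do not want to use the compiled version, to avoid recomputing the graph with new shapes) and iterative decoding (where we want the speed-ups of the compiled version with static shapes).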

get_input_embeddings

( ) → nn.Module

Returns

nn.Module

A torch module mapping vocabulary to hidden states.

Returns the model's input embeddings.

get_memory_footprint

( return_buffers = True )

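Parameters

return_buffers (bool, optional, defaults to True) — Whether to include the size of buffer tensors when computing the memory footprint. Buffers are tensors that do not require gradients and are not registered as parameters, for example the mean and std in batch norm layers. See: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2

Get the memory footprint of a model. This returns the memory footprint of the current model in bytes. It is useful for benchmarking the memory footprint of the current model and for designing tests. The solution is inspired by the PyTorch discussion: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2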

get_output_embeddings

( ) → nn.Module

Returns

nn.Module

A torch module mapping hidden states to vocabulary.

Returns the model's output embeddings.

gradient_checkpointing_disable

( )

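Deactivates gradient checkpointing for the current model.

Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint activations".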

gradient_checkpointing_enable

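( gradient_checkpointing_kwargs = None )

Parameters

gradient_checkpointing_kwargs (dict, optional) — Additional keyword arguments passed along to the torch.utils.checkpoint.checkpoint function.

Activates gradient checkpointing for the current model.

Note that in other frameworks this feature can be referred to as "activation checkpointing" or "checkpoint activations".

We pass the __call__ method of the modules instead of forward because __call__ attaches all the hooks of the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2

init_weights

( )

If needed, prunes and maybe initializes weights. If using a custom PreTrainedModel, you need to implement any initialization logic in _init_weights.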

post_init

( )

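A method executed at the end of each Transformer model initialization, to run code that needs the model's modules to be properly initialized (such as weight initialization).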

prune_heads

( heads_to_prune: typing.Dict[int, typing.List[int]] )

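Parameters

heads_to_prune (Dict[int, List[int]]) — Dictionary whose keys are the selected layer indices (int) and whose values are the lists of heads to prune in that layer (list of int). For instance {1: [0, 2], 2: [2, 3]} prunes heads 0 and 2 on layer 1 and heads 2 and 3 on layer 2.

Prunes heads of the base model.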
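
To make the heads_to_prune format concrete, the following sketch computes which head indices would remain in each layer. remaining_heads is a hypothetical helper written for illustration; it only interprets the dictionary and does not touch any model weights.

```python
def remaining_heads(num_heads, num_layers, heads_to_prune):
    """Return, for each layer index, the head indices left after pruning."""
    kept = {}
    for layer in range(num_layers):
        pruned = set(heads_to_prune.get(layer, []))
        kept[layer] = [head for head in range(num_heads) if head not in pruned]
    return kept


# Same dictionary as in the parameter description, on a toy 3-layer, 4-head model.
kept = remaining_heads(num_heads=4, num_layers=3, heads_to_prune={1: [0, 2], 2: [2, 3]})
```

Layers that do not appear as keys (layer 0 here) keep all of their heads.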

register_for_auto_class

( auto_class = 'AutoModel' )

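Parameters

auto_class (str or type, optional, defaults to "AutoModel") — The auto class to register this new model with.

Register this class with a given auto class. This should only be used for custom models, as the ones in the library are already mapped to an auto class.

This API is experimental and may have some slight breaking changes in the next releases.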

resize_token_embeddings

( new_num_tokens: typing.Optional[int] = None pad_to_multiple_of: typing.Optional[int] = None mean_resizing: bool = True ) → torch.nn.Embedding

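Parameters

new_num_tokens (int, optional) — The new number of tokens in the embedding matrix. Increasing the size adds newly initialized vectors at the end. Reducing the size removes vectors from the end. If not provided or None, just returns a pointer to the input tokens torch.nn.Embedding module of the model without doing anything.
pad_to_multiple_of (int, optional) — If set, pads the embedding matrix to a multiple of the provided value. If new_num_tokens is set to None, the embedding is just padded to a multiple of pad_to_multiple_of.
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta), or on TPUs, which benefit from having sequence lengths be a multiple of 128. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc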
mean_resizing (bool) — Whether to initialize the added embeddings from a multivariate normal distribution that has old embeddings’ mean and covariance or to initialize them with a normal distribution that has a mean of zero and std equals config.initializer_range.
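Setting mean_resizing to True is useful when increasing the size of the embeddings of causal language models: initializing the new embeddings with the mean of the old embeddings reduces the KL divergence between the next-token probabilities before and after adding the new embeddings, so the probabilities of generated tokens are not affected. Refer to this article for more information: https://nlp.stanford.edu/~johnhew/vocab-expansion.html

Returns

torch.nn.Embedding

Pointer to the input tokens embedding module of the model.

Resizes the input token embeddings matrix of the model if new_num_tokens != config.vocab_size.

Takes care of tying the weights embeddings afterwards if the model class has a tie_weights() method.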
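
The effect of pad_to_multiple_of on the final embedding size is simple rounding-up arithmetic. The helper below is a hypothetical illustration of that arithmetic only, not the library's resizing code.

```python
def padded_vocab_size(new_num_tokens, pad_to_multiple_of=None):
    """Round new_num_tokens up to the next multiple of pad_to_multiple_of."""
    if pad_to_multiple_of is None:
        return new_num_tokens
    blocks = (new_num_tokens + pad_to_multiple_of - 1) // pad_to_multiple_of
    return blocks * pad_to_multiple_of


# A 30522-token vocabulary padded to a multiple of 64.
size = padded_vocab_size(30522, pad_to_multiple_of=64)
```

A size that is already a multiple of the requested value is left unchanged.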

reverse_bettertransformer

( ) → PreTrainedModel

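Returns

PreTrainedModel

The model converted back to the original modeling.

Reverts the transformation from to_bettertransformer() so that the original modeling is used, for example in order to save the model.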

save_pretrained

( save_directory: typing.Union[str, os.PathLike] is_main_process: bool = True state_dict: typing.Optional[dict] = None save_function: typing.Callable = push_to_hub: bool = False max_shard_size: typing.Union[int, str] = '5GB' safe_serialization: bool = True variant: typing.Optional[str] = None token: typing.Union[bool, str, NoneType] = None save_peft_format: bool = True **kwargs )

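Parameters

save_directory (str or os.PathLike) — Directory to save to. It will be created if it does not exist.
is_main_process (bool, optional, defaults to True) — Whether the process calling this is the main process or not. Useful during distributed training (such as on TPUs) when this function needs to be called on all processes. In this case, set is_main_process=True only on the main process to avoid race conditions.
state_dict (nested dictionary of torch.Tensor) — The state dictionary of the model to save. Defaults to self.state_dict(), but can be used to save only parts of the model, or when special precautions need to be taken when recovering the state dictionary of a model (for example when using model parallelism).
save_function (Callable) — The function to use to save the state dictionary. Useful during distributed training (such as on TPUs) when torch.save needs to be replaced by another method.
push_to_hub (bool, optional, defaults to False) — Whether or not to push the model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).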
max_shard_size (int or str, optional, defaults to "5GB") — The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size lower than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). We default it to 5GB in order for models to be able to run easily on free-tier google colab instances without CPU OOM issues.

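If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard, which will be bigger than max_shard_size.
safe_serialization (bool, optional, defaults to True) — Whether to save the model using safetensors or the traditional PyTorch way (which uses pickle).
variant (str, optional) — If specified, weights are saved in the format pytorch_model..bin.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
save_peft_format (bool, optional, defaults to True) — For backward compatibility with the PEFT library, if adapter weights are attached to the model, all keys of the adapter state dict need to be prefixed with base_model.model. Advanced users can disable this behavior by setting save_peft_format to False.
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Save a model and its configuration file to a directory so that it can be reloaded using the from_pretrained() class method.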
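
The max_shard_size rule, including the case of a single weight that is larger than the limit, can be sketched with a greedy grouping. parse_size and shard_weights are hypothetical helpers for illustration; the use of decimal units (1 MB = 10**6 bytes) and the greedy, in-order strategy are assumptions of this sketch rather than a statement of how the library shards checkpoints.

```python
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # decimal units (assumption)


def parse_size(size):
    """Convert an int or a string such as "5MB" into a number of bytes."""
    if isinstance(size, int):
        return size
    for unit, factor in UNITS.items():
        if size.upper().endswith(unit):
            return int(size[: -len(unit)]) * factor
    raise ValueError(f"size must be digits followed by a unit, got {size!r}")


def shard_weights(weight_sizes, max_shard_size):
    """Greedily group weights (name -> bytes) into shards bounded by max_shard_size."""
    limit = parse_size(max_shard_size)
    shards, current, current_size = [], [], 0
    for name, nbytes in weight_sizes.items():
        # Close the current shard when adding this weight would exceed the limit.
        if current and current_size + nbytes > limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += nbytes
    if current:
        shards.append(current)
    return shards


sizes = {"a": 2_000_000, "b": 2_000_000, "c": 7_000_000, "d": 1_000_000}
shards = shard_weights(sizes, "5MB")
```

In this toy run the 7 MB weight "c" exceeds the 5 MB limit, so it ends up alone in a shard that is larger than max_shard_size.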

set_input_embeddings

( value: Module )

Parameters

value (nn.Module) — A module mapping vocabulary to hidden states.

Set the model's input embeddings.

tensor_parallel

( device_mesh )

Parameters

device_mesh (torch.distributed.DeviceMesh) — The device mesh to use for tensor parallelism.

Tensor parallelize the model across the given device mesh.

tie_weights

( )

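Tie the weights between the input embeddings and the output embeddings.

If the torchscript flag is set in the configuration, parameter sharing cannot be handled, so the weights are cloned instead.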

to_bettertransformer

( ) → PreTrainedModel

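Returns

PreTrainedModel

The model converted to BetterTransformer.

Converts the model to use PyTorch's native attention implementation, integrated into Transformers through the Optimum library. Only a subset of all Transformers models is supported.

PyTorch's attention fastpath speeds up inference through kernel fusions and the use of nested tensors. Detailed benchmarks can be found in this blog post.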

warn_if_padding_and_no_attention_mask

( input_ids attention_mask )

Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
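
One way to picture such a check is a cheap heuristic that looks for the padding token at the start or end of each sequence. The functions below are hypothetical, and the choice to inspect only the first and last positions is an assumption of this sketch, not a description of the library's exact test.

```python
def looks_padded(input_ids, pad_token_id):
    """Heuristic: does any row start or end with the padding token?"""
    if pad_token_id is None:
        return False
    return any(
        row[0] == pad_token_id or row[-1] == pad_token_id
        for row in input_ids
        if row
    )


def should_warn(input_ids, attention_mask, pad_token_id):
    """Warn only when padding seems present and no attention mask was passed."""
    return attention_mask is None and looks_padded(input_ids, pad_token_id)


batch = [[5, 6, 0], [7, 8, 9]]  # token id 0 plays the role of padding here
warn = should_warn(batch, attention_mask=None, pad_token_id=0)
```

Passing an attention mask, or a batch without padding at the edges, would leave warn set to False.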

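Custom models should also include a _supports_assign_param_buffer attribute, which determines whether superfast init can be applied to the particular model. A failure in test_save_and_load_from_pretrained is a sign that your model needs this. If so, set it to False.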

ModuleUtilsMixin

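class transformers.modeling_utils.ModuleUtilsMixin

( )

A few utilities for torch.nn.Modules, to be used as a mixin.

add_memory_hooks

( )

Add a memory hook before and after each sub-module forward pass to record the increase in memory consumption.

The increase in memory consumption is stored in a mem_rss_diff attribute for each module and can be reset to zero with model.reset_memory_hooks_state().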

estimate_tokens

( input_dict: typing.Dict[str, typing.Union[torch.Tensor, typing.Any]] ) → int

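Parameters

inputs (dict) — The model inputs.

Returns

int

The total number of tokens.

Helper function to estimate the total number of tokens from the model inputs.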

floating_point_ops

( input_dict: typing.Dict[str, typing.Union[torch.Tensor, typing.Any]] exclude_embeddings: bool = True ) → int

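Parameters

batch_size (int) — The batch size for the forward pass.
sequence_length (int) — The number of tokens in each line of the batch.
exclude_embeddings (bool, optional, defaults to True) — Whether or not to count embedding and softmax operations.

Returns

int

The number of floating-point operations.

Get the number of (optionally, non-embedding) floating-point operations for the forward and backward passes of a batch with this transformer model. The default approximation neglects the quadratic dependency on the number of tokens (valid if 12 * d_model << sequence_length), as laid out in this paper, section 2.1. It should be overridden for transformers with parameter re-use (e.g., Albert or Universal Transformers), or when doing long-range modeling with very high sequence lengths.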
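
A common form of this approximation is 6 x (number of tokens) x (number of parameters), covering one forward and one backward pass. The sketch below assumes that form; estimate_flops is a hypothetical helper and the parameter count is an illustrative figure, not a measured one.

```python
def estimate_flops(num_tokens, num_parameters):
    """Approximate forward + backward FLOPs as 6 * tokens * parameters."""
    return 6 * num_tokens * num_parameters


batch_size, sequence_length = 8, 128
# 110 million parameters is an illustrative figure for a BERT-base-sized model.
flops = estimate_flops(batch_size * sequence_length, 110_000_000)
```

Because the estimate is linear in the number of tokens, it leaves out the attention cost that grows quadratically with sequence length, which is why very long sequences call for an override.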

get_extended_attention_mask

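( attention_mask: Tensor input_shape: typing.Tuple[int] device: device = None dtype: torch.float32 = None )

Parameters

attention_mask (torch.Tensor) — Mask with ones indicating tokens to attend to and zeros for tokens to ignore.
input_shape (Tuple[int]) — The shape of the input to the model.

Makes broadcastable attention and causal masks so that future and masked tokens are ignored.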
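
The combination of a padding mask with a causal mask can be illustrated on nested lists. This is a simplified sketch: extended_attention_mask is a hypothetical function that returns a 0/1 mask of shape [batch][query][key], whereas a real implementation would work on tensors and add a head dimension.

```python
def extended_attention_mask(attention_mask, causal=False):
    """Build a [batch][query][key] 0/1 mask from a [batch][key] padding mask."""
    extended = []
    for row in attention_mask:
        seq_len = len(row)
        extended.append(
            [
                [
                    # A key is visible if it is not padding and, for causal
                    # models, does not lie in the future of the query.
                    1 if row[k] == 1 and (not causal or k <= q) else 0
                    for k in range(seq_len)
                ]
                for q in range(seq_len)
            ]
        )
    return extended


# One sequence of length 3 whose last position is padding.
mask = extended_attention_mask([[1, 1, 0]], causal=True)
```

With causal=False every query row simply repeats the padding mask.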

get_head_mask

( head_mask: typing.Optional[torch.Tensor] num_hidden_layers: int is_attention_chunked: bool = False )

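Parameters

head_mask (torch.Tensor with shape [num_heads] or [num_hidden_layers x num_heads], optional) — The mask indicating whether a head should be kept (1.0 for keep, 0.0 for discard).
num_hidden_layers (int) — The number of hidden layers in the model.
is_attention_chunked (bool, optional, defaults to False) — Whether or not the attention scores are computed by chunks.

Prepare the head mask if needed.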

invert_attention_mask

( encoder_attention_mask: Tensor ) → torch.Tensor

Parameters

encoder_attention_mask (torch.Tensor) — An attention mask.

Returns

torch.Tensor

The inverted attention mask.

Invert an attention mask (e.g., switches 0. and 1.).
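
The inversion itself is a one-line operation, shown below on nested lists. The second helper goes one step beyond what the text states: turning the inverted mask into a large negative additive bias for attention scores is an assumption about how such a mask is typically consumed, and both function names are hypothetical.

```python
def invert_mask(mask):
    """Switch 0. and 1. in a nested-list attention mask."""
    return [[1.0 - m for m in row] for row in mask]


def to_additive_bias(inverted, min_value=-1e9):
    """Assumed follow-up: scale the inverted mask into a large negative bias."""
    return [[m * min_value for m in row] for row in inverted]


inverted = invert_mask([[1, 1, 0]])
bias = to_additive_bias(inverted)
```

Positions to attend to end up with a bias of zero, while ignored positions receive min_value.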

num_parameters

( only_trainable: bool = False exclude_embeddings: bool = False ) → int

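Parameters

only_trainable (bool, optional, defaults to False) — Whether or not to return only the number of trainable parameters.
exclude_embeddings (bool, optional, defaults to False) — Whether or not to return only the number of non-embedding parameters.

Returns

int

The number of parameters.

Get the number of (optionally, trainable or non-embedding) parameters in the module.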

reset_memory_hooks_state

( )

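Reset the mem_rss_diff attribute of each module (see add_memory_hooks()).

TFPreTrainedModel

class transformers.TFPreTrainedModel

( config *inputs **kwargs )

Base class for all TF models.

TFPreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models, as well as a few methods common to all models:

resizing the input embeddings,
pruning heads in the self-attention layers.

Class attributes (overridden by derived classes):

config_class (PretrainedConfig) — A subclass of PretrainedConfig to use as the configuration class for this model architecture.
base_model_prefix (str) — A string indicating the attribute associated with the base model in derived classes of the same architecture that add modules on top of the base model.
main_input_name (str) — The name of the principal input to the model (often input_ids for NLP models, pixel_values for vision models and input_values for speech models).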

push_to_hub

( repo_id: str use_temp_dir: Optional[bool] = None commit_message: Optional[str] = None private: Optional[bool] = None max_shard_size: Optional[Union[int, str]] = '10GB' token: Optional[Union[bool, str]] = None use_auth_token: Optional[Union[bool, str]] = None create_pr: bool = False **base_model_card_args )

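Parameters

repo_id (str) — The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Defaults to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Defaults to "Upload model".
private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used. Defaults to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "10GB") — Only applicable for models. The maximum size of a checkpoint before it is sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB").
create_pr (bool, optional, defaults to False) — Whether to create a PR with the uploaded files or commit directly.

Upload the model files to the 🤗 Model Hub while synchronizing a local clone of the repo in repo_path_or_name.

Example: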

from transformers import TFAutoModel

model = TFAutoModel.from_pretrained("google-bert/bert-base-cased")

# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")

# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")

can_generate

( ) → bool

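Returns

bool

Whether this model can generate sequences with .generate().

Returns whether this model can generate sequences with .generate().

compile

( optimizer = 'rmsprop' loss = 'auto_with_warning' metrics = None loss_weights = None weighted_metrics = None run_eagerly = None steps_per_execution = None **kwargs )

This is a thin wrapper that sets the model's loss output head as the loss if the user does not specify a loss function themselves.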

create_model_card

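( output_dir model_name: str language: Optional[str] = None license: Optional[str] = None tags: Optional[str] = None finetuned_from: Optional[str] = None tasks: Optional[str] = None dataset_tags: Optional[Union[str, List[str]]] = None dataset: Optional[Union[str, List[str]]] = None dataset_args: Optional[Union[str, List[str]]] = None )

Parameters

output_dir (str or os.PathLike) — The folder in which to create the model card.
model_name (str, optional) — The name of the model.
language (str, optional) — The language of the model (if applicable).
license (str, optional) — The license of the model. Defaults to the license of the pretrained model used, if the original model given to the Trainer comes from a repo on the Hub.
tags (str or List[str], optional) — Some tags to be included in the metadata of the model card.
finetuned_from (str, optional) — The name of the model used to fine-tune this one (if applicable). Defaults to the name of the repo of the original model given to the Trainer (if it comes from the Hub).
tasks (str or List[str], optional) — One or several task identifiers, to be included in the metadata of the model card.
dataset_tags (str or List[str], optional) — One or several dataset tags, to be included in the metadata of the model card.
dataset (str or List[str], optional) — One or several dataset identifiers, to be included in the metadata of the model card.
dataset_args (str or List[str], optional) — One or several dataset arguments, to be included in the metadata of the model card.

Creates a draft of a model card using the information available to the Trainer.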

from_pretrained

( pretrained_model_name_or_path: Optional[Union[str, os.PathLike]] *model_args config: Optional[Union[PretrainedConfig, str, os.PathLike]] = None cache_dir: Optional[Union[str, os.PathLike]] = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: Optional[Union[str, bool]] = None revision: str = 'main' use_safetensors: bool = None **kwargs )

Parameters

pretrained_model_name_or_path (str, optional) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a PyTorch state_dict save file (e.g, ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model in a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- None if you are both providing the configuration and state dictionary (resp. with keyword arguments config and state_dict).
model_args (sequence of positional arguments, optional) — All remaining positional arguments are passed to the underlying model's __init__ method.
config (Union[PretrainedConfig, str], optional) — Can be either:
- an instance of a class derived from PretrainedConfig,
- a string valid as input to from_pretrained().
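Configuration for the model to use instead of an automatically loaded configuration. The configuration can be automatically loaded when: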
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
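from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch state_dict save file (see the docstring of the pretrained_model_name_or_path argument).
ignore_mismatched_sizes (bool, optional, defaults to False) — Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (for instance, if you are instantiating a model with 10 labels from a checkpoint with 3 labels).
cache_dir (str, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since a git-based system is used for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.

Instantiate a pretrained TF 2.0 model from a pretrained model configuration.

The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.

The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.

Examples: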

>>> from transformers import BertConfig, TFBertModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFBertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = TFBertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
>>> model = TFBertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
>>> # Loading from a Pytorch model file instead of a TensorFlow checkpoint (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./pt_model/my_pt_model_config.json")
>>> model = TFBertModel.from_pretrained("./pt_model/my_pytorch_model.bin", from_pt=True, config=config)

get_bias

( ) → tf.Variable

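Returns

tf.Variable

The weights representing the bias, None if not an LM model.

Dict of bias attached to an LM head. The key represents the name of the bias attribute.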

get_head_mask

( head_mask: tf.Tensor | None num_hidden_layers: int )

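Parameters

head_mask (tf.Tensor with shape [num_heads] or [num_hidden_layers x num_heads], optional) — The mask indicating whether a head should be kept (1.0 for keep, 0.0 for discard).
num_hidden_layers (int) — The number of hidden layers in the model.

Prepare the head mask if needed.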

get_input_embeddings

( ) → tf.Variable

Returns

tf.Variable

The embeddings layer mapping vocabulary to hidden states.

Returns the model's input embeddings layer.

get_lm_head

( ) → keras.layers.Layer

Returns

keras.layers.Layer

The LM head layer if the model has one, None if not.

The LM head layer. This method must be overridden by all models that have an LM head.

get_output_embeddings

( ) → tf.Variable

Returns

tf.Variable

The new weights mapping vocabulary to hidden states.

Returns the model's output embeddings.

get_output_layer_with_bias

( ) → keras.layers.Layer

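Returns

keras.layers.Layer

The layer that handles the bias, None if not an LM model.

Get the layer that handles a bias attribute in case the model has an LM head with weights tied to the embeddings.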

get_prefix_bias_name

( ) → str

Returns

str

The prefix name of the bias.

Get the concatenated prefix name of the bias, from the model name to the parent layer.

prepare_tf_dataset

( dataset: 'datasets.Dataset' batch_size: int = 8 shuffle: bool = True tokenizer: Optional['PreTrainedTokenizerBase'] = None collate_fn: Optional[Callable] = None collate_fn_args: Optional[Dict[str, Any]] = None drop_remainder: Optional[bool] = None prefetch: bool = True ) → Dataset

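Parameters

dataset (Any) — A datasets.Dataset to be wrapped as a tf.data.Dataset.
batch_size (int, optional, defaults to 8) — The size of the batches to return.
shuffle (bool, defaults to True) — Whether to return samples from the dataset in random order. Usually True for training datasets and False for validation/test datasets.
tokenizer (PreTrainedTokenizerBase, optional) — A PreTrainedTokenizer that will be used to pad samples to create batches. Has no effect if a specific collate_fn is passed instead.
collate_fn (Callable, optional) — A function that collates samples from the dataset into a single batch. Defaults to DefaultDataCollator if no tokenizer is supplied, or DataCollatorWithPadding if a tokenizer is passed.
collate_fn_args (Dict[str, Any], optional) — A dictionary of arguments to pass to the collate_fn alongside the list of samples.
drop_remainder (bool, optional) — Whether to drop the final batch if the batch_size does not evenly divide the dataset length. Defaults to the same setting as shuffle.
prefetch (bool, defaults to True) — Whether to add prefetching to the end of the tf.data pipeline. This is almost always beneficial for performance, but can be disabled in edge cases.

Returns

Dataset

A tf.data.Dataset which is ready to pass to the Keras API.

Wraps a HuggingFace Dataset as a tf.data.Dataset with collation and batching. This method is designed to create a "ready-to-use" dataset that can be passed directly to Keras methods like fit() without further modification. The method drops columns from the dataset if they do not match input names for the model. If you want to specify the column names to return rather than using the names that match this model, we recommend using Dataset.to_tf_dataset() instead.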
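
The effect of drop_remainder on batching can be shown with plain index arithmetic. batch_indices is a hypothetical helper that uses no TensorFlow at all; it only illustrates which samples land in which batch.

```python
def batch_indices(num_samples, batch_size, drop_remainder):
    """Split range(num_samples) into batches, optionally dropping a short last batch."""
    batches = [
        list(range(start, min(start + batch_size, num_samples)))
        for start in range(0, num_samples, batch_size)
    ]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# 10 samples with a batch size of 4: the last batch holds only 2 samples.
kept = batch_indices(num_samples=10, batch_size=4, drop_remainder=False)
dropped = batch_indices(num_samples=10, batch_size=4, drop_remainder=True)
```

Dropping the short batch keeps every batch the same shape, at the cost of skipping a few samples.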

prune_heads

( heads_to_prune )

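Parameters

heads_to_prune (Dict[int, List[int]]) — Dictionary whose keys are the selected layer indices (int) and whose values are the lists of heads to prune in that layer (list of int). For instance {1: [0, 2], 2: [2, 3]} prunes heads 0 and 2 on layer 1 and heads 2 and 3 on layer 2.

Prunes heads of the base model.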

register_for_auto_class

( auto_class = 'TFAutoModel' )

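Parameters

auto_class (str or type, optional, defaults to "TFAutoModel") — The auto class to register this new model with.

Register this class with a given auto class. This should only be used for custom models, as the ones in the library are already mapped to an auto class.

This API is experimental and may have some slight breaking changes in the next releases.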

resize_token_embeddings

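( new_num_tokens: Optional[int] = None ) → tf.Variable or keras.layers.Embedding

Parameters

new_num_tokens (int, optional) — The number of new tokens in the embedding matrix. Increasing the size adds newly initialized vectors at the end. Reducing the size removes vectors from the end. If not provided or None, just returns a pointer to the input tokens without doing anything.

Returns

tf.Variable or keras.layers.Embedding

Pointer to the input tokens of the model.

Resizes the input token embeddings matrix of the model if new_num_tokens != config.vocab_size.

Takes care of tying the weights embeddings afterwards if the model class has a tie_weights() method.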

save_pretrained

( save_directory saved_model = False version = 1 push_to_hub = False signatures = None max_shard_size: Union[int, str] = '5GB' create_pr: bool = False safe_serialization: bool = False token: Optional[Union[str, bool]] = None **kwargs )

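Parameters

save_directory (str) — Directory to save to. It will be created if it does not exist.
saved_model (bool, optional, defaults to False) — Whether the model also has to be saved in the saved model format.
version (int, optional, defaults to 1) — The version of the saved model. A saved model needs to be versioned in order to be properly loaded by TensorFlow Serving, as detailed in the official documentation https://www.tensorflow.org/tfx/serving/serving_basic
push_to_hub (bool, optional, defaults to False) — Whether or not to push the model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
signatures (dict or tf.function, optional) — Model's signature used for serving. This is passed to the signatures argument of the model.save() method.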
max_shard_size (int or str, optional, defaults to "10GB") — The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size lower than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB").

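If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard, which will be bigger than max_shard_size.
create_pr (bool, optional, defaults to False) — Whether to create a PR with the uploaded files or commit directly.
safe_serialization (bool, optional, defaults to False) — Whether to save the model using safetensors or the traditional TensorFlow way (which uses h5).
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Save a model and its configuration file to a directory so that it can be reloaded using the from_pretrained() class method.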

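serving

( inputs )

Parameters

inputs (Dict[str, tf.Tensor]) — The input of the saved model as a dictionary of tensors.

Method used for serving the model. It does not have a specific signature, but will be specialized as concrete functions when saving with save_pretrained.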

serving_output

( output )

Prepare the output of the saved model. Can be overridden if specific serving modifications are required.

set_bias

( value )

Parameters

value (Dict[tf.Variable]) — All the new biases attached to an LM head.

Set all the biases in the LM head.

set_input_embeddings

( value )

Parameters

value (tf.Variable) — The new weights mapping hidden states to vocabulary.

Set the model's input embeddings.

set_output_embeddings

( value )

Parameters

value (tf.Variable) — The new weights mapping hidden states to vocabulary.

Set the model's output embeddings.

test_step

( data )

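A modification of Keras's default test_step that correctly handles matching outputs to labels for our models and supports evaluating directly on the loss output head. In addition, it ensures input keys are copied to the labels where appropriate. It will also copy label keys into the input dict when using the dummy loss, to ensure that they are available to the model during the forward pass.

train_step

( data )

A modification of Keras's default train_step that correctly handles matching outputs to labels for our models and supports directly training on the loss output head. In addition, it ensures input keys are copied to the labels where appropriate. It will also copy label keys into the input dict when using the dummy loss, to ensure that they are available to the model during the forward pass.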

TFModelUtilsMixin

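class transformers.modeling_tf_utils.TFModelUtilsMixin

( )

A few utilities for keras.Model, to be used as a mixin.

num_parameters

( only_trainable: bool = False ) → int

Parameters

only_trainable (bool, optional, defaults to False) — Whether or not to return only the number of trainable parameters.

Returns

int

The number of parameters.

Get the number of (optionally, trainable) parameters in the model.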
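As a rough illustration of what the count means, the sketch below sums the number of elements of every variable, optionally keeping only the trainable ones. It is a framework-free approximation rather than the mixin's code: the variables dictionary, its names, and the (shape, trainable) layout are invented for this example.

```python
from math import prod


def num_parameters(variables, only_trainable=False):
    """Sum the element counts of all (or only the trainable) variables."""
    return sum(
        prod(shape)
        for shape, trainable in variables.values()
        if trainable or not only_trainable
    )


# Hypothetical variables: name -> (shape, trainable flag)
variables = {
    "embeddings/weight": ((100, 8), True),
    "dense/kernel": ((8, 4), True),
    "dense/bias": ((4,), True),
    "position_ids": ((1, 16), False),  # a non-trainable buffer
}

total = num_parameters(variables)
trainable = num_parameters(variables, only_trainable=True)
```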

FlaxPreTrainedModel

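class transformers.FlaxPreTrainedModel

( config: PretrainedConfig module: Module input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True )

Base class for all models.

FlaxPreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models.

Class attributes (overridden by derived classes):

config_class (PretrainedConfig) — A subclass of PretrainedConfig to use as configuration class for this model architecture.
base_model_prefix (str) — A string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
main_input_name (str) — The name of the principal input to the model (often input_ids for NLP models, pixel_values for vision models and input_values for speech models).

push_to_hub

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )

Parameters

repo_id (str) — The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Will default to "Upload model".
private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). It defaults to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights to the safetensors format for safer serialization.
revision (str, optional) — Branch to push the uploaded files to.
commit_description (str, optional) — The description of the commit that will be created.
tags (List[str], optional) — List of tags to push on the Hub.

Upload the model checkpoint to the 🤗 Model Hub.

Examples: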

from transformers import FlaxAutoModel

model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased")

# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")

# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")

can_generate

( )

Returns whether this model can generate sequences with .generate().

Returns

bool

Whether this model can generate sequences with .generate().

from_pretrained

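( pretrained_model_name_or_path: typing.Union[str, os.PathLike] dtype: dtype = <class 'jax.numpy.float32'> *model_args config: typing.Union[transformers.configuration_utils.PretrainedConfig, str, os.PathLike, NoneType] = None cache_dir: typing.Union[str, os.PathLike, NoneType] = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: typing.Union[bool, str, NoneType] = None revision: str = 'main' **kwargs )

Parameters

pretrained_model_name_or_path (str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or URL to a pt index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_pt should be set to True.
dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs).
This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified, all the computation will be performed with the given dtype.

Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.

If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
model_args (sequence of positional arguments, optional) — All remaining positional arguments will be passed to the underlying model's __init__ method.
config (Union[PretrainedConfig, str, os.PathLike], optional) — Can be either:
- an instance of a class derived from PretrainedConfig,
- a string or path valid as input to from_pretrained().
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
cache_dir (Union[str, os.PathLike], optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
ignore_mismatched_sizes (bool, optional, defaults to False) — Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if, for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels).
force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since models and other artifacts are stored on huggingface.co with a git-based system, revision can be any identifier allowed by git.

Instantiate a pretrained Flax model from a pretrained model configuration.

The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.

The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.

Examples: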

>>> from transformers import BertConfig, FlaxBertModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = FlaxBertModel.from_pretrained("./test/saved_model/")
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./pt_model/config.json")
>>> model = FlaxBertModel.from_pretrained("./pt_model/pytorch_model.bin", from_pt=True, config=config)
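The two warnings described before the examples both come down to a comparison between the parameter names the model expects and the names found in the checkpoint. The following sketch shows that comparison on plain lists; compare_keys and the key names are hypothetical, and the library's actual loading logic handles more cases (for example, mismatched shapes).

```python
def compare_keys(model_keys, checkpoint_keys):
    """Split keys into those the checkpoint lacks and those the model ignores."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    # Expected by the model but absent from the checkpoint: newly initialized.
    newly_initialized = sorted(model_keys - checkpoint_keys)
    # Present in the checkpoint but unknown to the model: discarded.
    unused = sorted(checkpoint_keys - model_keys)
    return newly_initialized, unused


# Hypothetical names: a classification model loaded from a pretraining checkpoint.
model_keys = ["bert/embeddings", "bert/encoder", "classifier/kernel", "classifier/bias"]
checkpoint_keys = ["bert/embeddings", "bert/encoder", "cls/predictions"]

newly_initialized, unused = compare_keys(model_keys, checkpoint_keys)
```

Here the classifier weights would trigger the "not initialized from pretrained model" warning, and the pretraining head would trigger the "not used" warning.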

load_flax_sharded_weights

( shard_files ) → Dict

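Parameters

shard_files (List[str]) — The list of shard files to load.

Returns

Dict

A nested dictionary of the model parameters, in the format expected by Flax models: {'model': {'params': {'...'}}}.

This is the same as flax.serialization.from_bytes (https://flax.readthedocs.io/en/latest/_modules/flax/serialization.html#from_bytes) but for a sharded checkpoint.

This load is performed efficiently: each checkpoint shard is loaded one by one in RAM and deleted after being loaded in the model.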
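The one-shard-at-a-time idea can be sketched as follows. Each shard is produced by a loader callable, merged into a flat state, and released before the next one is read; the flat state is then turned into a nested dictionary. The names load_sharded and unflatten, the "/" separator, and the lambda loaders standing in for file reads are all assumptions for illustration, not the library's code.

```python
def unflatten(flat, sep="/"):
    """Turn {"a/b/c": value} into {"a": {"b": {"c": value}}}."""
    nested = {}
    for key, value in flat.items():
        node = nested
        *parents, leaf = key.split(sep)
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def load_sharded(shard_loaders):
    """Merge shards one by one so only a single shard is held at a time."""
    flat_state = {}
    for load_shard in shard_loaders:
        shard = load_shard()      # read one shard
        flat_state.update(shard)  # copy its entries into the merged state
        del shard                 # drop the reference to the shard dict
    return unflatten(flat_state)


# Stand-ins for reading shard files from disk.
shard_loaders = [
    lambda: {"encoder/layer_0/kernel": [1.0, 2.0]},
    lambda: {"encoder/layer_0/bias": [0.0], "pooler/kernel": [3.0]},
]
params = load_sharded(shard_loaders)
```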

register_for_auto_class

( auto_class = 'FlaxAutoModel' )

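Parameters

auto_class (str or type, optional, defaults to "FlaxAutoModel") — The auto class to register this new model with.

Register this class with a given auto class. This should only be used for custom models, as the ones in the library are already mapped with an auto class.

This API is experimental and may have some slight breaking changes in the next releases.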

save_pretrained

( save_directory: typing.Union[str, os.PathLike] params = None push_to_hub = False max_shard_size = '10GB' token: typing.Union[bool, str, NoneType] = None safe_serialization: bool = False **kwargs )

Parameters

save_directory (str or os.PathLike) — Directory to which to save. Will be created if it doesn't exist.
push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
max_shard_size (int or str, optional, defaults to "10GB") — The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB").

If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard, which will be bigger than max_shard_size.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
safe_serialization (bool, optional, defaults to False) — Whether to save the model using safetensors or through msgpack.

Save a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.

to_bf16

( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )

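Parameters

params (Union[Dict, FrozenDict]) — A PyTree of model parameters.
mask (Union[Dict, FrozenDict]) — A PyTree with the same structure as the params tree. The leaves should be booleans: True for params you want to cast, and False for those you want to skip.

Cast the floating-point params to jax.numpy.bfloat16. This returns a new params tree and does not cast the params in place.

This method can be used on TPU to explicitly convert the model parameters to bfloat16 precision to do full half-precision training, or to save weights in bfloat16 for inference in order to save memory and improve speed.

Examples: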

>>> from transformers import FlaxBertModel

>>> # load model
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model parameters will be in fp32 precision, to cast these to bfloat16 precision
>>> model.params = model.to_bf16(model.params)
>>> # If you don't want to cast certain parameters (for example layer norm bias and scale)
>>> # then pass the mask as follows
>>> from flax import traverse_util

>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> flat_params = traverse_util.flatten_dict(model.params)
>>> mask = {
...     path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
...     for path in flat_params
... }
>>> mask = traverse_util.unflatten_dict(mask)
>>> model.params = model.to_bf16(model.params, mask)
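To see what the mask in the example above looks like, the sketch below rebuilds it on a tiny hand-written parameter tree. The flatten_dict and unflatten_dict functions here are simplified pure-Python stand-ins for the flax.traverse_util helpers of the same name, and the parameter tree is invented for illustration. The test on each path compares its last two components with the layer norm entries that should stay in full precision.

```python
def flatten_dict(tree, prefix=()):
    """Flatten a nested dict into {tuple_path: leaf} (simplified stand-in)."""
    flat = {}
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, path))
        else:
            flat[path] = value
    return flat


def unflatten_dict(flat):
    """Inverse of flatten_dict for tuple-keyed dictionaries."""
    tree = {}
    for path, value in flat.items():
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return tree


# Hypothetical parameter tree with scalar leaves instead of arrays.
params = {
    "encoder": {
        "dense": {"kernel": 1.0, "bias": 0.0},
        "LayerNorm": {"scale": 1.0, "bias": 0.0},
    }
}

skip = {("LayerNorm", "bias"), ("LayerNorm", "scale")}
# True means "cast this leaf", False means "leave it untouched".
mask = unflatten_dict({path: path[-2:] not in skip for path in flatten_dict(params)})
```

The resulting mask has the same tree structure as params, with False only under LayerNorm.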

to_fp16

( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )

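Parameters

params (Union[Dict, FrozenDict]) — A PyTree of model parameters.
mask (Union[Dict, FrozenDict]) — A PyTree with the same structure as the params tree. The leaves should be booleans: True for params you want to cast, and False for those you want to skip.

Cast the floating-point params to jax.numpy.float16. This returns a new params tree and does not cast the params in place.

This method can be used on GPU to explicitly convert the model parameters to float16 precision to do full half-precision training, or to save weights in float16 for inference in order to save memory and improve speed.

Examples: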

>>> from transformers import FlaxBertModel

>>> # load model
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model params will be in fp32, to cast these to float16
>>> model.params = model.to_fp16(model.params)
>>> # If you don't want to cast certain parameters (for example layer norm bias and scale)
>>> # then pass the mask as follows
>>> from flax import traverse_util

>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> flat_params = traverse_util.flatten_dict(model.params)
>>> mask = {
...     path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
...     for path in flat_params
... }
>>> mask = traverse_util.unflatten_dict(mask)
>>> model.params = model.to_fp16(model.params, mask)

to_fp32

( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )

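Parameters

params (Union[Dict, FrozenDict]) — A PyTree of model parameters.
mask (Union[Dict, FrozenDict]) — A PyTree with the same structure as the params tree. The leaves should be booleans: True for params you want to cast, and False for those you want to skip.

Cast the floating-point params to jax.numpy.float32. This method can be used to explicitly convert the model parameters to fp32 precision. This returns a new params tree and does not cast the params in place.

Examples: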

>>> from transformers import FlaxBertModel

>>> # Download model and configuration from huggingface.co
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model params will be in fp32, to illustrate the use of this method,
>>> # we'll first cast to fp16 and back to fp32
>>> model.params = model.to_fp16(model.params)
>>> # now cast back to fp32
>>> model.params = model.to_fp32(model.params)
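One point worth keeping in mind with the round trip above: casting back to fp32 restores the data type, not the precision that was dropped by the fp16 cast. The sketch below shows this on a single number using the standard library's half-precision struct format. It uses Python floats (which are 64-bit) as a stand-in for fp32 arrays, and to_half_and_back is a hypothetical helper, so treat it as an illustration of the rounding effect only.

```python
import struct


def to_half_and_back(value):
    """Round a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


original = 0.1
round_tripped = to_half_and_back(original)
error = abs(round_tripped - original)

# Values that are exactly representable in half precision survive unchanged.
exact = to_half_and_back(0.5)
```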

Pushing to the Hub

class transformers.utils.PushToHubMixin

( )

A Mixin containing the functionality to push a model or tokenizer to the Hub.

push_to_hub

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )

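Parameters

repo_id (str) — The name of the repository you want to push your object (for example, a model or tokenizer) to. It should contain your organization name when pushing to a given organization.
use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
commit_message (str, optional) — Message to commit while pushing. Will default to "Upload {object}", where {object} stands for the kind of object being pushed (for example, model or tokenizer).
private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). It defaults to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights to the safetensors format for safer serialization.
revision (str, optional) — Branch to push the uploaded files to.
commit_description (str, optional) — The description of the commit that will be created.
tags (List[str], optional) — List of tags to push on the Hub.

Upload the object's files (for example, model or tokenizer files) to the 🤗 Model Hub.

Examples: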

# Any class that inherits from PushToHubMixin works the same way; AutoModel is used here as one example.
from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")

# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")

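Sharded checkpoints

transformers.modeling_utils.load_sharded_checkpoint

( model folder strict = True prefer_safe = True ) → NamedTuple

Parameters

model (torch.nn.Module) — The model in which to load the checkpoint.
folder (str or os.PathLike) — A path to a folder containing the sharded checkpoint.
strict (bool, optional, defaults to True) — Whether to strictly enforce that the keys in the model state dict match the keys in the sharded checkpoint.
prefer_safe (bool, optional, defaults to True) — If both safetensors and PyTorch save files are present in the checkpoint and prefer_safe is True, the safetensors files will be loaded. Otherwise, PyTorch files are always loaded when possible.

Returns

NamedTuple

A named tuple with missing_keys and unexpected_keys fields.

missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys

This is the same as torch.nn.Module.load_state_dict but for a sharded checkpoint.

This load is performed efficiently: each checkpoint shard is loaded one by one in RAM and deleted after being loaded in the model.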
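The shape of the return value and the effect of strict can be sketched without PyTorch. In the example below, check_keys gathers the keys found across all shards, compares them with the keys the model expects, and either raises (strict) or reports the differences in a named tuple. The helper, the LoadResult type, the choice of RuntimeError, and the key names are assumptions made for illustration; they mirror the documented fields but are not the library's code.

```python
from collections import namedtuple

LoadResult = namedtuple("LoadResult", ["missing_keys", "unexpected_keys"])


def check_keys(model_keys, shard_key_lists, strict=True):
    """Compare model keys with the union of keys found in all shards."""
    loaded = {key for keys in shard_key_lists for key in keys}
    missing = [key for key in model_keys if key not in loaded]
    unexpected = sorted(loaded - set(model_keys))
    if strict and (missing or unexpected):
        raise RuntimeError(f"missing={missing}, unexpected={unexpected}")
    return LoadResult(missing, unexpected)


# Hypothetical state dict keys and shard contents.
model_keys = ["embed.weight", "layer.weight", "head.weight"]
shards = [["embed.weight"], ["layer.weight", "extra.bias"]]

result = check_keys(model_keys, shards, strict=False)
```

With strict=False the mismatches are returned for inspection; with strict=True the same input raises instead.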
