Transformers

Troubleshooting

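Sometimes errors occur, but we are here to help! This guide covers some of the most common issues we've seen and how to resolve them. It isn't meant to be a comprehensive collection of every 🤗 Transformers issue, though. For more help with troubleshooting, try:

Asking for help on the forums. There are specific categories you can post your question to, like Beginners or 🤗 Transformers. Write a descriptive forum post and include some reproducible code to maximize the likelihood that your problem gets solved!

Creating an Issue on the 🤗 Transformers repository if it is a bug related to the library. Include as much information describing the bug as possible to help us figure out what's wrong and how to fix it.
Checking the Migration guide if you use an older version of 🤗 Transformers, since some important changes were introduced between versions.

For more details about troubleshooting and getting help, take a look at Chapter 8 of the Hugging Face course.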

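Firewalled environments

Some GPU instances in cloud and intranet setups are firewalled from external connections, which results in a connection error. When your script tries to download model weights or datasets, the download hangs and then times out with the following message: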

ValueError: Connection error, and we cannot find the requested files in the cached path.
Please try again or make sure your Internet connection is on.

In this case, you should try to run 🤗 Transformers in offline mode to avoid the connection error.
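As a minimal sketch of what offline mode can look like, assuming the files you need are already in your local cache: the HF_HUB_OFFLINE environment variable and the local_files_only argument of from_pretrained() are not mentioned above, and load_cached is a hypothetical helper name used only for illustration.

```python
import os

# Set this before importing transformers or huggingface_hub,
# because the value is read when those libraries are imported.
os.environ["HF_HUB_OFFLINE"] = "1"


def load_cached(model_id):
    """Hypothetical helper: load a model from the local cache only."""
    # Imported lazily so the environment variable above is already in place.
    from transformers import AutoModel

    # local_files_only=True makes from_pretrained() skip any network lookup.
    return AutoModel.from_pretrained(model_id, local_files_only=True)
```

If the files were never downloaded on this machine, the call still fails, but it fails right away instead of hanging until the connection times out.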

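CUDA out of memory

Training large models with millions of parameters can be challenging without the appropriate hardware. A common error you may encounter when the GPU runs out of memory is: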

CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.17 GiB total capacity; 9.70 GiB already allocated; 179.81 MiB free; 9.85 GiB reserved in total by PyTorch)

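Here are some potential solutions you can try to reduce memory use:

Reduce the per_device_train_batch_size value in TrainingArguments.
Try using gradient_accumulation_steps in TrainingArguments to effectively increase the overall batch size.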
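The two settings above are usually changed together. The sketch below shows the arithmetic, assuming a single device; effective_batch_size and build_training_args are hypothetical helper names, and the batch sizes are made-up illustration values.

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device * accumulation_steps * num_devices


# Illustrative values only: a smaller per-device batch lowers peak memory,
# and more accumulation steps keep the effective batch size unchanged.
before = dict(per_device_train_batch_size=32, gradient_accumulation_steps=1)
after = dict(per_device_train_batch_size=8, gradient_accumulation_steps=4)


def build_training_args(output_dir, **kwargs):
    """Hypothetical helper: pass the settings on to TrainingArguments."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **kwargs)


print(effective_batch_size(32, 1), effective_batch_size(8, 4))  # 32 32
```

You would then call something like build_training_args("out", **after) and hand the result to your Trainer.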

Refer to the Performance guide for more details about memory-saving techniques.

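Unable to load a saved TensorFlow model

TensorFlow's model.save method saves the entire model (architecture, weights, and training configuration) in a single file. However, when you load the model file again, you may run into an error because 🤗 Transformers may not load all the TensorFlow-related objects in the model file. To avoid issues with saving and loading TensorFlow models, we recommend you:

Save the model weights with an h5 file extension using model.save_weights, and then reload the model with from_pretrained():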

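>>> from transformers import TFPreTrainedModel

>>> # `model` is an existing 🤗 Transformers TensorFlow model instance
>>> model.save_weights("some_folder/tf_model.h5")
>>> # from_pretrained() also expects a config.json in the same folder
>>> model.config.save_pretrained("some_folder")
>>> # TFPreTrainedModel stands in for your concrete model class here
>>> model = TFPreTrainedModel.from_pretrained("some_folder")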

Save the model with TFPreTrainedModel.save_pretrained and load it again with from_pretrained():

>>> from transformers import TFPreTrainedModel

>>> model.save_pretrained("path_to/model")
>>> model = TFPreTrainedModel.from_pretrained("path_to/model")

ImportError

Another common error you may encounter, especially with a newly released model, is ImportError:

ImportError: cannot import name 'ImageGPTImageProcessor' from 'transformers' (unknown location)

For these error types, check that you have the latest version of 🤗 Transformers installed so you can access the most recent models:

pip install transformers --upgrade
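Before and after upgrading, it can help to confirm which version is actually installed in the environment your script runs in. The following is a small sketch using only the Python standard library; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed here.
print("transformers:", installed_version("transformers"))
```

If this prints None or an older version than you expect, the upgrade probably went into a different Python environment than the one running your code.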

CUDA error: device-side assert triggered

Sometimes you may run into a generic CUDA error about a failure in the device code:

RuntimeError: CUDA error: device-side assert triggered

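You should first try running the code on a CPU to get a more descriptive error message. Add the following environment variable at the beginning of your code to switch to a CPU: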

>>> import os

>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""

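Another option is to get a better traceback from the GPU. Add the following environment variable at the beginning of your code so the traceback points to the source of the error: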

>>> import os

>>> os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

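Incorrect output when padding tokens aren't masked

In some cases, the output hidden_state may be incorrect if the input_ids include padding tokens. To demonstrate, load a model. You can access the model's pad_token_id to see its value. The pad_token_id may be None for some models, but you can always set it manually.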

>>> from transformers import AutoModelForSequenceClassification
>>> import torch

>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
>>> model.config.pad_token_id
0

The following example shows the output without masking the padding tokens:

>>> input_ids = torch.tensor([[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]])
>>> output = model(input_ids)
>>> print(output.logits)
tensor([[ 0.0082, -0.2307],
        [ 0.1317, -0.1683]], grad_fn=<AddmmBackward0>)

Here is the actual output of the second sequence when it is passed on its own, without padding:

>>> input_ids = torch.tensor([[7592]])
>>> output = model(input_ids)
>>> print(output.logits)
tensor([[-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)

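Most of the time, you should provide an attention_mask to your model so it ignores the padding tokens and you avoid this silent error. By default, the tokenizer creates an attention_mask for you based on your specific tokenizer's defaults. With a mask supplied for the padded batch, the output of the second sequence matches its actual output: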

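>>> # Rebuild the padded batch, since input_ids was overwritten above
>>> input_ids = torch.tensor([[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]])
>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]])
>>> output = model(input_ids, attention_mask=attention_mask)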
>>> print(output.logits)
tensor([[ 0.0082, -0.2307],
        [-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)

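🤗 Transformers doesn't automatically create an attention_mask to mask a padding token when one is provided because:

Some models don't have a padding token.
For some use cases, users want a model to attend to a padding token.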
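If you do want the padding masked and you are building the inputs by hand, the mask can be derived from the token ids. The sketch below uses plain Python lists so it runs without any dependencies; build_attention_mask is a hypothetical helper name, and it assumes the pad token id never appears as a real token in your sequences.

```python
def build_attention_mask(batch_input_ids, pad_token_id):
    """Return 1 for real tokens and 0 for padding tokens, per sequence."""
    return [
        [0 if token_id == pad_token_id else 1 for token_id in sequence]
        for sequence in batch_input_ids
    ]


input_ids = [[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]]
attention_mask = build_attention_mask(input_ids, pad_token_id=0)
print(attention_mask)  # [[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]]
```

That assumption is the reason this is left to you. If a model reuses another token as its padding token, a comparison like this would also mask real occurrences of that token.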

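ValueError: Unrecognized configuration class XYZ for this kind of AutoModel

Generally, we recommend using the AutoModel class to load pretrained instances of models. This class can automatically infer and load the correct architecture from a given checkpoint based on its configuration. If you see this ValueError when loading a model from a checkpoint, the Auto class couldn't find a mapping from the configuration in the given checkpoint to the kind of model you are trying to load. Most commonly, this happens when a checkpoint doesn't support a given task. For instance, the following example raises this error in releases that have no GPT2 model for question answering: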

>>> from transformers import AutoProcessor, AutoModelForQuestionAnswering

>>> processor = AutoProcessor.from_pretrained("openai-community/gpt2-medium")
>>> model = AutoModelForQuestionAnswering.from_pretrained("openai-community/gpt2-medium")
ValueError: Unrecognized configuration class <class 'transformers.models.gpt2.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForQuestionAnswering.
Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, ...
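To make the failure mode concrete, here is a toy illustration of a task-specific lookup from configuration class to model class. It is not the library's actual implementation: TASK_MAPPINGS and resolve_model_class are hypothetical names, and the table contains only a handful of entries chosen for the example.

```python
# Toy lookup tables, keyed by task and then by configuration class name.
TASK_MAPPINGS = {
    "causal-lm": {
        "GPT2Config": "GPT2LMHeadModel",
        "BloomConfig": "BloomForCausalLM",
    },
    "question-answering": {
        "AlbertConfig": "AlbertForQuestionAnswering",
        "BertConfig": "BertForQuestionAnswering",
    },
}


def resolve_model_class(task, config_name):
    """Return the model class name registered for this task and config."""
    mapping = TASK_MAPPINGS[task]
    if config_name not in mapping:
        raise ValueError(
            f"Unrecognized configuration class {config_name} for this kind of "
            f"AutoModel. Model type should be one of {', '.join(mapping)}."
        )
    return mapping[config_name]


print(resolve_model_class("causal-lm", "GPT2Config"))  # GPT2LMHeadModel
```

In this toy table, asking for "question-answering" with "GPT2Config" raises the ValueError, while the "causal-lm" lookup succeeds. The practical fix is the same in the real library: pick the Auto class for a task the checkpoint supports, or pick a checkpoint whose architecture supports your task.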
