Testing

Let's take a look at how 🤗 Transformers models are tested and how you can write new tests and improve the existing ones.

There are 2 test suites in the repository:

tests - tests for the general API
examples - tests primarily for various applications that aren't part of the API

How transformers are tested

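Once a PR is submitted, it gets tested with 9 CircleCI jobs. Every new commit to that PR gets retested. These jobs are defined in the config file, so that if needed you can reproduce the same environment on your machine.

These CI jobs don't run @slow tests.
There are 3 jobs run by GitHub Actions:
- torch hub integration: checks whether torch hub integration works.
- self-hosted (push): runs fast tests on GPU only on commits to main. It only runs if a commit on main has updated the code in one of the following folders: src, tests, .github (to prevent running on added model cards, notebooks, etc.).
- self-hosted runner: runs normal and slow tests on GPU in tests and examples: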

RUN_SLOW=1 pytest tests/
RUN_SLOW=1 pytest examples/

The results can be observed here.

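Running tests

Choosing which tests to run

This document goes into detail about how tests can be run. If, after reading everything, you need even more details, you will find them here.

Here are some of the most useful ways of running tests.

Run all: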

pytest

or:

make test

Note that the latter is defined as:

python -m pytest -n auto --dist=loadfile -s -v ./tests/

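which tells pytest to:

run as many test processes as there are CPU cores (which could be too many if you don't have a ton of RAM!)
ensure that all tests from the same file will be run by the same test process
not capture output
run in verbose mode

Getting the list of all tests

All tests of the test suite: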

pytest --collect-only -q

All tests of a given test file:

pytest tests/test_optimization.py --collect-only -q

Run a specific test module

To run an individual test module:

pytest tests/utils/test_logging.py

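Run specific tests

Since unittest is used inside most of the tests, to run specific subtests you need to know the name of the unittest class containing those tests. For example, it could be: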

pytest tests/test_optimization.py::OptimizationTest::test_adam_w

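Here:

tests/test_optimization.py - the file with tests
OptimizationTest - the name of the class
test_adam_w - the name of the specific test function

If the file contains multiple classes, you can choose to run only the tests of a given class. For example: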

pytest tests/test_optimization.py::OptimizationTest

will run all the tests inside that class.

As mentioned earlier, you can see which tests are contained inside the OptimizationTest class by running:

pytest tests/test_optimization.py::OptimizationTest --collect-only -q

You can run tests by keyword expressions.

To run only tests whose name contains adam:

pytest -k adam tests/test_optimization.py

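Logical and and or can be used to indicate whether all keywords should match or either. not can be used to negate.

To run all tests except those whose name contains adam: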

pytest -k "not adam" tests/test_optimization.py

And you can combine the two patterns in one:

pytest -k "ada and not adam" tests/test_optimization.py

For example, to run both test_adafactor and test_adam_w you can use:

pytest -k "test_adafactor or test_adam_w" tests/test_optimization.py

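Note that we use or here, since we want either of the keywords to match to include both.

If you want to include only tests that include both patterns, and is to be used: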

pytest -k "test and ada" tests/test_optimization.py
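As a rough mental model of the keyword filters above, each keyword can be thought of as a case-insensitive substring match against the test name. The sketch below illustrates that idea only; `select` and the test names are hypothetical and this is not how pytest implements -k internally.

```python
def select(names, all_of=(), any_of=(), none_of=()):
    """Keep names containing every `all_of` keyword, at least one `any_of`
    keyword (when given) and no `none_of` keyword, ignoring case."""
    selected = []
    for name in names:
        lowered = name.lower()
        if not all(k.lower() in lowered for k in all_of):
            continue
        if any_of and not any(k.lower() in lowered for k in any_of):
            continue
        if any(k.lower() in lowered for k in none_of):
            continue
        selected.append(name)
    return selected


# Hypothetical test names, for illustration only
tests = ["test_adafactor", "test_adam_w", "test_warmup"]

# Roughly what -k "ada and not adam" selects
ada_not_adam = select(tests, all_of=["ada"], none_of=["adam"])
# Roughly what -k "test_adafactor or test_adam_w" selects
either = select(tests, any_of=["test_adafactor", "test_adam_w"])
print(ada_not_adam, either)
```

Combining and, or and not in one expression narrows or widens the selection in the same way.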

Run accelerate tests

Sometimes you need to run accelerate tests on your models. For that, you can just add -m accelerate_tests to your command. For example, to run these tests on OPT:

RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py

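Run documentation tests

In order to test whether the documentation examples are correct, you should check that the doctests are passing. As an example, let's use the docstring of WhisperModel.forward: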

r"""
Returns:

Example:
    ```python
    >>> import torch
    >>> from transformers import WhisperModel, WhisperFeatureExtractor
    >>> from datasets import load_dataset

    >>> model = WhisperModel.from_pretrained("openai/whisper-base")
    >>> feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
    >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
    >>> inputs = feature_extractor(ds[0]["audio"]["array"], return_tensors="pt")
    >>> input_features = inputs.input_features
    >>> decoder_input_ids = torch.tensor([[1, 1]]) * model.config.decoder_start_token_id
    >>> last_hidden_state = model(input_features, decoder_input_ids=decoder_input_ids).last_hidden_state
    >>> list(last_hidden_state.shape)
    [1, 2, 512]
    ```"""

Just run the following line to automatically test every docstring example in the desired file:

pytest --doctest-modules <path_to_file_or_dir>

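If the file has a markdown extension, you should add the --doctest-glob="*.md" argument.

Run only modified tests

You can run the tests related to the unstaged files or the current branch (according to Git) by using pytest-picked. This is a great way of quickly testing that your changes didn't break anything, since it won't run the tests related to files you didn't touch.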

pip install pytest-picked

pytest --picked

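All tests will be run from files and folders which are modified, but not yet committed.

Automatically rerun failed tests on source modification

pytest-xdist provides a very useful feature of detecting all failed tests, and then waiting for you to modify files and continuously re-running those failing tests until they pass while you fix them, so that you don't need to restart pytest after you make the fix. This is repeated until all tests pass, after which a full run is performed again.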

pip install pytest-xdist

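To enter the mode: pytest -f or pytest --looponfail

File changes are detected by looking at the looponfailroots root directories and all of their contents (recursively). If the default for this value does not work for you, you can change it in your project by setting a configuration option in setup.cfg: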

[tool:pytest]
looponfailroots = transformers tests

or in pytest.ini/tox.ini files:

[pytest]
looponfailroots = transformers tests

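This would lead to only looking for file changes in the respective directories, specified relative to the ini-file's directory.

pytest-watch is an alternative implementation of this functionality.

Skip a test module

If you want to run all test modules except a few, you can exclude them by giving an explicit list of tests to run. For example, to run all except the test_modeling_*.py tests: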

pytest $(ls -1 tests/*py | grep -v test_modeling)

Clearing state

On CI builds, and whenever isolation is more important than speed, the cache should be cleared:

pytest --cache-clear tests

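Running tests in parallel

As mentioned earlier, make test runs tests in parallel via the pytest-xdist plugin (the -n X argument, e.g. -n 2 to run 2 parallel jobs).

pytest-xdist's --dist= option allows one to control how the tests are grouped. --dist=loadfile puts the tests located in one file onto the same process.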
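The grouping idea behind --dist=loadfile can be sketched as follows: tests are first bucketed by file, and whole buckets are then handed to workers. This is a simplified illustration with hypothetical helper names and a plain round-robin assignment; the real pytest-xdist scheduler balances load dynamically.

```python
from collections import defaultdict


def group_by_file(test_ids):
    """Bucket test ids such as 'a.py::TestX::test_y' by their file part."""
    groups = defaultdict(list)
    for test_id in test_ids:
        file_path = test_id.split("::", 1)[0]
        groups[file_path].append(test_id)
    return dict(groups)


def assign_round_robin(groups, num_workers):
    """Give each file's whole bucket to one worker (simplified scheduling)."""
    workers = [[] for _ in range(num_workers)]
    for index, file_path in enumerate(sorted(groups)):
        workers[index % num_workers].extend(groups[file_path])
    return workers


# Hypothetical test ids, for illustration only
ids = ["a.py::T::test_1", "b.py::test_1", "a.py::T::test_2"]
workers = assign_round_robin(group_by_file(ids), num_workers=2)
print(workers)
```

Because a file's tests never get split across workers, module-level setup that those tests share is paid for once per file.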

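Since the order of executed tests is different and unpredictable, if running the test suite with pytest-xdist produces failures (meaning we have some undetected coupled tests), use pytest-replay to replay the tests in the same order, which should help with reducing that failing sequence to a minimum.

Test order and repetition

It's good to repeat the tests several times, in sequence, randomly, or in sets, to detect any potential inter-dependency and state-related bugs (tear down). Plain multiple repetition is also good for detecting problems that get uncovered by the randomness of DL.

Repeat tests

pytest-flakefinder: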

pip install pytest-flakefinder

And then run every test multiple times (50 by default):

pytest --flake-finder --flake-runs=5 tests/test_failing_test.py

This plugin doesn't work with the -n flag from pytest-xdist.

There is another plugin, pytest-repeat, but it doesn't work with unittest.

Run tests in a random order

pip install pytest-random-order

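Important: the presence of pytest-random-order will automatically randomize tests; no configuration change or command line option is required.

As explained earlier, this allows detection of coupled tests, where one test's state affects the state of another. When pytest-random-order is installed, it will print the random seed it used for that session, e.g.: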

pytest tests
[...]
Using --random-order-bucket=module
Using --random-order-seed=573663

So if a given particular sequence fails, you can reproduce it by adding that exact seed, e.g.:

pytest --random-order-seed=573663
[...]
Using --random-order-bucket=module
Using --random-order-seed=573663

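It will only reproduce the exact order if you use the exact same list of tests (or no list at all). Once you start to manually narrow down the list, you can no longer rely on the seed; you have to list the tests manually in the exact order they failed and tell pytest not to randomize them by using --random-order-bucket=none, e.g.: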

pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py
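Why a seed only helps with an unchanged test list can be seen with a plain seeded shuffle. The sketch below uses Python's random module as a stand-in for the plugin's shuffling; `shuffled_order` and the test names are hypothetical.

```python
import random


def shuffled_order(test_ids, seed):
    """Return a seeded shuffle of test_ids (a stand-in for the plugin's shuffling)."""
    order = list(test_ids)
    random.Random(seed).shuffle(order)
    return order


# Hypothetical test names, for illustration only
full_list = ["test_a", "test_b", "test_c", "test_d"]

# The same seed applied to the same list always yields the same order
first_run = shuffled_order(full_list, seed=573663)
second_run = shuffled_order(full_list, seed=573663)
print(first_run == second_run)

# A narrowed-down list is shuffled as a new permutation of its own, so it is
# not guaranteed to preserve the relative order seen in the full run
narrowed = shuffled_order(["test_a", "test_c", "test_b"], seed=573663)
print(narrowed)
```

That is why a narrowed-down failing sequence has to be listed explicitly, with randomization turned off.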

To disable the shuffling for all tests:

pytest --random-order-bucket=none

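By default --random-order-bucket=module is implied, which will shuffle the files on the module level. It can also shuffle on class, package, global and none levels. For the complete details please see its documentation.

Another randomization alternative is pytest-randomly. This module has a very similar functionality/interface, but it doesn't have the bucket modes available in pytest-random-order. It has the same problem of imposing itself once installed.

Look and feel variations

pytest-sugar

pytest-sugar is a plugin that improves the look and feel, adds a progress bar, and shows failing tests and asserts instantly. It gets activated automatically upon installation.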

pip install pytest-sugar

To run tests without it, run:

pytest -p no:sugar

or uninstall it.

Report each sub-test name and its progress

For a single test or a group of tests via pytest (after pip install pytest-pspec):

pytest --pspec tests/test_optimization.py

Instantly show failed tests

pytest-instafail shows failures and errors instantly instead of waiting until the end of the test session.

pip install pytest-instafail

pytest --instafail

To GPU or not to GPU

On a GPU-enabled setup, to test in CPU-only mode add CUDA_VISIBLE_DEVICES="" for CUDA GPUs:

CUDA_VISIBLE_DEVICES="" pytest tests/utils/test_logging.py

or if you have multiple GPUs, you can specify which one is to be used by pytest. For example, to use only the second GPU if you have GPUs 0 and 1, you can run:

CUDA_VISIBLE_DEVICES="1" pytest tests/utils/test_logging.py

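For Intel GPUs, use ZE_AFFINITY_MASK instead of CUDA_VISIBLE_DEVICES in the above example.

This is handy when you want to run different tasks on different GPUs.

Some tests must be run on CPU only, others on either CPU, GPU or TPU, yet others on multiple GPUs. The following skip decorators are used to set the requirements of tests CPU/GPU/XPU/TPU-wise:

require_torch - this test will run only under torch
require_torch_gpu - as require_torch, plus requires at least 1 GPU
require_torch_multi_gpu - as require_torch, plus requires at least 2 GPUs
require_torch_non_multi_gpu - as require_torch, plus requires 0 or 1 GPUs
require_torch_up_to_2_gpus - as require_torch, plus requires 0, 1 or 2 GPUs
require_torch_xla - as require_torch, plus requires at least 1 TPU

Let's depict the GPU requirements in the following table:

n gpus	decorator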
`>= 0`	`@require_torch`
`>= 1`	`@require_torch_gpu`
`>= 2`	`@require_torch_multi_gpu`
`< 2`	`@require_torch_non_multi_gpu`
`< 3`	`@require_torch_up_to_2_gpus`
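The table can also be read as a small lookup: given a GPU count, which decorators would let a test run? The sketch below encodes only the rows of the table and assumes torch is installed; `REQUIREMENTS` and `runnable_decorators` are hypothetical names, not part of transformers.testing_utils.

```python
import operator

# Each decorator name mapped to the GPU-count condition from the table
REQUIREMENTS = {
    "require_torch": (operator.ge, 0),
    "require_torch_gpu": (operator.ge, 1),
    "require_torch_multi_gpu": (operator.ge, 2),
    "require_torch_non_multi_gpu": (operator.lt, 2),
    "require_torch_up_to_2_gpus": (operator.lt, 3),
}


def runnable_decorators(n_gpu):
    """Return the decorators whose GPU condition holds for n_gpu GPUs."""
    return [
        name
        for name, (compare, bound) in REQUIREMENTS.items()
        if compare(n_gpu, bound)
    ]


print(runnable_decorators(0))
print(runnable_decorators(2))
```

For example, with no GPU only the tests under require_torch, require_torch_non_multi_gpu and require_torch_up_to_2_gpus are not skipped.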

For example, here is a test that must be run only when there are 2 or more GPUs available and pytorch is installed:

@require_torch_multi_gpu
def test_example_with_multi_gpu():

If a test requires tensorflow, use the require_tf decorator. For example:

@require_tf
def test_tf_thing_with_tensorflow():

These decorators can be stacked. For example, if a test is slow and requires at least one GPU under pytorch, here is how to set it up:

@require_torch_gpu
@slow
def test_example_slow_on_gpu():

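Some decorators like @parameterized rewrite test names, therefore @require_* skip decorators have to be listed last for them to work correctly. Here is an example of the correct usage: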

@parameterized.expand(...)
@require_torch_multi_gpu
def test_integration_foo():

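This order problem doesn't exist with @pytest.mark.parametrize; you can put it first or last and it will still work. But it only works with non-unittests.

Inside tests:

How many GPUs are available: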

from transformers.testing_utils import get_gpu_count

n_gpu = get_gpu_count()  # works with torch and tf

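Testing with a specific PyTorch backend or device

To run the test suite on a specific torch device add TRANSFORMERS_TEST_DEVICE="$device" where $device is the target backend. For example, to test on CPU only: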

TRANSFORMERS_TEST_DEVICE="cpu" pytest tests/utils/test_logging.py

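This variable is useful for testing custom or less common PyTorch backends such as mps, xpu or npu. It can also be used to achieve the same effect as CUDA_VISIBLE_DEVICES by targeting specific GPUs or testing in CPU-only mode.

Certain devices will require an additional import after importing torch for the first time. This can be specified using the environment variable TRANSFORMERS_TEST_BACKEND: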

TRANSFORMERS_TEST_BACKEND="torch_npu" pytest tests/utils/test_logging.py

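Alternative backends may also require the replacement of device-specific functions. For example, torch.cuda.manual_seed may need to be replaced with a device-specific seed setter like torch.npu.manual_seed or torch.xpu.manual_seed to correctly set a random seed on the device. To specify a new backend with backend-specific device functions when running the test suite, create a Python device specification file spec.py in the format: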

import torch
import torch_npu # for xpu, replace it with `import intel_extension_for_pytorch`
# !! Further additional imports can be added here !!

# Specify the device name (eg. 'cuda', 'cpu', 'npu', 'xpu', 'mps')
DEVICE_NAME = 'npu'

# Specify device-specific backends to dispatch to.
# If not specified, will fallback to 'default' in 'testing_utils.py'
MANUAL_SEED_FN = torch.npu.manual_seed
EMPTY_CACHE_FN = torch.npu.empty_cache
DEVICE_COUNT_FN = torch.npu.device_count

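This format also allows for specification of any additional imports required. To use this file to replace equivalent methods in the test suite, set the environment variable TRANSFORMERS_TEST_DEVICE_SPEC to the path of the spec file, e.g. TRANSFORMERS_TEST_DEVICE_SPEC=spec.py.

Currently, only MANUAL_SEED_FN, EMPTY_CACHE_FN and DEVICE_COUNT_FN are supported for device-specific dispatch.

Distributed training

pytest can't deal with distributed training directly. If this is attempted, the sub-processes don't do the right thing and end up thinking they are pytest and start running the test suite in loops. It works, however, if one spawns a normal process that then spawns off multiple workers and manages the IO pipes.

Here are some tests that use it:

To jump right into the execution point, search for the execute_subprocess_async call in those tests.

You will need at least 2 GPUs to see these tests in action: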

CUDA_VISIBLE_DEVICES=0,1 RUN_SLOW=1 pytest -sv tests/test_trainer_distributed.py

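Output capture

During test execution any output sent to stdout and stderr is captured. If a test or a setup method fails, its corresponding captured output will usually be shown along with the failure traceback.

To disable output capturing and to get stdout and stderr normally, use -s or --capture=no: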

pytest -s tests/utils/test_logging.py

To send test results to JUnit format output:

pytest tests --junitxml=result.xml

Color control

To have no color (e.g., yellow on white background is not readable):

pytest --color=no tests/utils/test_logging.py

Sending test report to an online pastebin service

Creating a URL for each test failure:

pytest --pastebin=failed tests/utils/test_logging.py

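This will submit test run information to a remote Paste service and provide a URL for each failure. You may select tests as usual or add, for example, -x if you only want to send one particular failure.

Creating a URL for a whole test session log: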

pytest --pastebin=all tests/utils/test_logging.py

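Writing tests

🤗 transformers tests are based on unittest, but run by pytest, so most of the time features of both systems can be used.

You can read here which features are supported, but the important thing to remember is that most pytest fixtures don't work. Neither does parametrization, but we use the module parameterized that works in a similar way.

Parametrization

Often, there is a need to run the same test multiple times, but with different arguments. It could be done from within the test, but then there is no way of running that test for just one set of arguments.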

# test_this1.py
import math
import unittest

from parameterized import parameterized


class TestMathUnitTest(unittest.TestCase):
    @parameterized.expand(
        [
            ("negative", -1.5, -2.0),
            ("integer", 1, 1.0),
            ("large fraction", 1.6, 1),
        ]
    )
    def test_floor(self, name, input, expected):
        self.assertEqual(math.floor(input), expected)

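Now, by default this test will be run 3 times, each time with the last 3 arguments of test_floor being assigned the corresponding arguments in the parameter list.

And you could run just the negative and integer sets of params with: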

pytest -k "negative or integer" tests/test_mytest.py

or all but the negative sub-tests, with:

pytest -k "not negative" tests/test_mytest.py

Besides using the -k filter that was just mentioned, you can find out the exact name of each sub-test and run any or all of them using their exact names.

pytest test_this1.py --collect-only -q

and it will list:

test_this1.py::TestMathUnitTest::test_floor_0_negative
test_this1.py::TestMathUnitTest::test_floor_1_integer
test_this1.py::TestMathUnitTest::test_floor_2_large_fraction
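The sub-test names in the listing follow a visible pattern: the function name, the index of the parameter set, and the first parameter with non-identifier characters turned into underscores. The sketch below reproduces that pattern for illustration; `subtest_name` is a hypothetical helper that approximates the convention and is not the parameterized module's own code.

```python
import re


def subtest_name(func_name, index, first_param):
    """Build a name like test_floor_2_large_fraction from its parts."""
    safe = re.sub(r"[^a-zA-Z0-9_]+", "_", str(first_param))
    return f"{func_name}_{index}_{safe}"


# The same parameter sets as in the example test above
params = [
    ("negative", -1.5, -2.0),
    ("integer", 1, 1.0),
    ("large fraction", 1.6, 1),
]
names = [subtest_name("test_floor", i, p[0]) for i, p in enumerate(params)]
print(names)
```

Knowing the pattern makes it easier to guess a sub-test's name before confirming it with --collect-only.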

So now you can run just 2 specific sub-tests:

pytest test_this1.py::TestMathUnitTest::test_floor_0_negative  test_this1.py::TestMathUnitTest::test_floor_1_integer

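The module parameterized, which is already in the developer dependencies of transformers, works for both unittests and pytest tests.

If, however, the test is not a unittest, you may use pytest.mark.parametrize (or you may see it being used in some existing tests, mostly under examples).

Here is the same example, this time using pytest's parametrize marker: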

# test_this2.py
import math

import pytest


@pytest.mark.parametrize(
    "name, input, expected",
    [
        ("negative", -1.5, -2.0),
        ("integer", 1, 1.0),
        ("large fraction", 1.6, 1),
    ],
)
def test_floor(name, input, expected):
    assert math.floor(input) == expected

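Same as with parameterized, with pytest.mark.parametrize you can have fine control over which sub-tests are run, if the -k filter doesn't do the job. However, this parametrization function creates a slightly different set of names for the sub-tests. Here is what they look like: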

pytest test_this2.py --collect-only -q

and it will list:

test_this2.py::test_floor[integer-1-1.0]
test_this2.py::test_floor[negative--1.5--2.0]
test_this2.py::test_floor[large fraction-1.6-1]

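So now you can run just the specific tests (the test ids are quoted so that the shell doesn't interpret the square brackets):

pytest "test_this2.py::test_floor[negative--1.5--2.0]" "test_this2.py::test_floor[integer-1-1.0]"

as in the previous example.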

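Files and directories

In tests we often need to know where things are relative to the current test file, and it's not trivial since a test could be invoked from more than one directory or could reside in sub-directories of different depths. A helper class transformers.testing_utils.TestCasePlus solves this problem by sorting out all the basic paths and providing easy accessors to them:

pathlib objects (all fully resolved):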
- test_file_path - the current test file path, i.e. __file__
- test_file_dir - the directory containing the current test file
- tests_dir - the directory of the tests test suite
- examples_dir - the directory of the examples test suite
- repo_root_dir - the directory of the repository
- src_dir - the directory of src (i.e. where the transformers sub-dir resides)
stringified paths - same as above, but these return paths as strings rather than pathlib objects:
- test_file_path_str
- test_file_dir_str
- tests_dir_str
- examples_dir_str
- repo_root_dir_str
- src_dir_str

To start using those, all you need is to make sure that the test resides in a subclass of transformers.testing_utils.TestCasePlus. For example:

from transformers.testing_utils import TestCasePlus


class PathExampleTest(TestCasePlus):
    def test_something_involving_local_locations(self):
        data_dir = self.tests_dir / "fixtures/tests_samples/wmt_en_ro"

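If you don't need to manipulate paths via pathlib, or you just need a path as a string, you can always invoke str() on the pathlib object or use the accessors ending with _str. For example: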

from transformers.testing_utils import TestCasePlus


class PathExampleTest(TestCasePlus):
    def test_something_involving_stringified_locations(self):
        examples_dir = self.examples_dir_str

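Temporary files and directories

Using unique temporary files and directories is essential for parallel test running, so that the tests won't overwrite each other's data. Also we want to get the temporary files and directories removed at the end of each test that created them. Therefore, using packages like tempfile, which address these needs, is essential.

However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want to know its exact path and not have it randomized on every test re-run.

A helper class transformers.testing_utils.TestCasePlus is best used for such purposes. It's a sub-class of unittest.TestCase, so we can easily inherit from it in the test modules.

Here is an example of its usage: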

from transformers.testing_utils import TestCasePlus


class ExamplesTests(TestCasePlus):
    def test_whatever(self):
        tmp_dir = self.get_auto_remove_tmp_dir()

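This code creates a unique temporary directory and sets tmp_dir to its location.

Create a unique temporary dir:

def test_whatever(self):
    tmp_dir = self.get_auto_remove_tmp_dir()

tmp_dir will contain the path to the created temporary dir. It will be automatically removed at the end of the test.

Create a temporary dir of your choice, ensure it's empty before the test starts, and don't empty it after the test: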

def test_whatever(self):
    tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

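This is useful for debugging when you want to monitor a specific directory and make sure the previous tests didn't leave any data in there.

You can override the default behavior by directly overriding the before and after args, leading to one of the following behaviors: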
- before=True: the temporary dir will always be cleared at the beginning of the test.
- before=False: if the temporary dir already existed, any existing files will remain there.
- after=True: the temporary dir will always be deleted at the end of the test.
- after=False: the temporary dir will always be left intact at the end of the test.

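In order to run the equivalent of rm -r safely, only subdirs of the project repository checkout are allowed if an explicit tmp_dir is used, so that no /tmp or similar important part of the filesystem gets nuked by mistake. That is, please always pass paths that start with ./.

Each test can register multiple temporary directories, and they will all get auto-removed unless requested otherwise.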
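The auto-removal idea can be sketched with the standard library alone: create a unique directory with tempfile and register its removal as a unittest cleanup. This is a minimal illustration under those assumptions, not the implementation of get_auto_remove_tmp_dir; `make_tmp_dir` and `run_example` are hypothetical names.

```python
import os
import shutil
import tempfile
import unittest


class TmpDirExampleTest(unittest.TestCase):
    def make_tmp_dir(self):
        """Create a unique temporary dir that is removed when the test ends."""
        tmp_dir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, tmp_dir, ignore_errors=True)
        return tmp_dir

    def test_writes_file(self):
        tmp_dir = self.make_tmp_dir()
        path = os.path.join(tmp_dir, "out.txt")
        with open(path, "w") as f:
            f.write("hello")
        self.assertTrue(os.path.exists(path))
        # Remember the location so it can be inspected after the run
        type(self).last_tmp_dir = tmp_dir


def run_example():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TmpDirExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Registering the removal with addCleanup means several directories can be created in one test, each cleaned up on its own.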

Temporary sys.path override

If you need to temporarily override sys.path to import from another test, for example, you can use the ExtendSysPath context manager. Example:

import os
from transformers.testing_utils import ExtendSysPath

bindir = os.path.abspath(os.path.dirname(__file__))
with ExtendSysPath(f"{bindir}/.."):
    from test_trainer import TrainerIntegrationCommon  # noqa
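For readers curious how such a context manager can work, here is a minimal sketch built on contextlib. It is an illustration of the general technique, not the code of ExtendSysPath; `extend_sys_path` and the directory path are hypothetical.

```python
import sys
from contextlib import contextmanager


@contextmanager
def extend_sys_path(path):
    """Temporarily put `path` at the front of sys.path."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Remove the entry again even if the body raised
        sys.path.remove(path)


before = list(sys.path)
with extend_sys_path("/hypothetical/tests/dir"):
    inside = list(sys.path)
after = list(sys.path)
print(inside[0], before == after)
```

The try/finally guarantees that sys.path is restored, so a failing import inside the block doesn't leak the extra entry into other tests.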

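Skipping tests

This is useful when a bug is found and a new test is written, yet the bug is not fixed yet. In order to be able to commit it to the main repository we need to make sure it's skipped during make test.

Methods:

A skip means that you expect your test to pass only if some conditions are met, otherwise pytest should skip running the test altogether. Common examples are skipping Windows-only tests on non-Windows platforms, or skipping tests that depend on an external resource which is not available at the moment (for example a database).
An xfail means that you expect a test to fail for some reason. A common example is a test for a feature not yet implemented, or a bug not yet fixed. When a test passes despite being expected to fail (marked with pytest.mark.xfail), it's an xpass and will be reported in the test summary.

One of the important differences between the two is that skip doesn't run the test, while xfail does. So if the code that's buggy causes some bad state that will affect other tests, do not use xfail.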
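The difference in execution can be observed with the standard library's closest analogues, unittest.skip and unittest.expectedFailure. This sketch assumes those behave like skip and xfail for the purpose of the illustration; the `executed` list and `run_example` are hypothetical helpers.

```python
import unittest

executed = []  # records which test bodies actually ran


class SkipVsExpectedFailure(unittest.TestCase):
    @unittest.skip("body is never executed")
    def test_skipped(self):
        executed.append("skipped")

    @unittest.expectedFailure
    def test_expected_failure(self):
        # The body runs, so any side effects here do happen
        executed.append("expected_failure")
        self.assertEqual(1, 2)


def run_example():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipVsExpectedFailure)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

After a run, only the expected-failure body has left a trace in `executed`, which is exactly why a test that corrupts shared state should be skipped rather than marked as an expected failure.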

Implementation

Here is how to skip a whole test unconditionally:

@unittest.skip(reason="this bug needs to be fixed")
def test_feature_x():

or via pytest:

@pytest.mark.skip(reason="this bug needs to be fixed")

or the xfail way:

@pytest.mark.xfail
def test_feature_x():

Here's how to skip a test based on internal checks within the test:

def test_feature_x():
    if not has_something():
        pytest.skip("unsupported configuration")

or the whole module:

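import sys

import pytest

# the global `pytest.config` object was removed in pytest 5.0, so this example
# uses a condition that is available at import time instead
if not sys.platform.startswith("win"):
    pytest.skip("skipping windows-only tests", allow_module_level=True)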

or the xfail way:

def test_feature_x():
    pytest.xfail("expected to fail until bug XYZ is fixed")

Here is how to skip all tests in a module if some import is missing:

docutils = pytest.importorskip("docutils", minversion="0.3")

Skip a test based on a condition:

@pytest.mark.skipif(sys.version_info < (3,6), reason="requires python3.6 or higher")
def test_feature_x():

or:

@unittest.skipIf(torch_device == "cpu", "Can't do half precision")
def test_feature_x():

or skip a whole test class:

@pytest.mark.skipif(sys.platform == 'win32', reason="does not run on windows")
class TestClass():
    def test_feature_x(self):

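More details, examples and ways are here.

Slow tests

The library of tests is ever-growing, and some of the tests take minutes to run, therefore we can't afford waiting for an hour for the test suite to complete on CI. Therefore, with some exceptions for essential tests, slow tests should be marked as in the example below: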

from transformers.testing_utils import slow
@slow
def test_integration_foo():

Once a test is marked as @slow, to run such tests set the RUN_SLOW=1 env var, e.g.:

RUN_SLOW=1 pytest tests

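Some decorators like @parameterized rewrite test names, therefore @slow and the rest of the skip decorators @require_* have to be listed last for them to work correctly. Here is an example of the correct usage: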

@parameterized.expand(...)
@slow
def test_integration_foo():

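As explained at the beginning of this document, slow tests get to run on a scheduled basis, rather than in PR CI checks. So it's possible that some problems will be missed during a PR submission and get merged. Such problems will get caught during the next scheduled CI job. But it also means that it's important to run the slow tests on your machine before submitting the PR.

Here is a rough decision-making mechanism for choosing which tests should be marked as slow:

If the test is focused on one of the library's internal components (e.g., modeling files, tokenization files, pipelines), then we should run that test in the non-slow test suite. If it's focused on another aspect of the library, such as the documentation or the examples, then we should run these tests in the slow test suite. And then, to refine this approach we should have exceptions:

All tests that need to download a heavy set of weights or a dataset that is larger than ~50MB (e.g., model or tokenizer integration tests, pipeline integration tests) should be set to slow. If you're adding a new model, you should create and upload to the hub a tiny version of it (with random weights) for integration tests. This is discussed in the following paragraphs.
All tests that need to do a training not specifically optimized to be fast should be set to slow.
We can introduce exceptions if some of these should-be-non-slow tests are excruciatingly slow, and set them to @slow. Auto-modeling tests, which save and load large files to disk, are a good example of tests that are marked as @slow.
If a test completes under 1 second on CI (including downloads if any) then it should be a normal test regardless.

Collectively, all the non-slow tests need to cover entirely the different internals, while remaining fast. For example, a significant coverage can be achieved by testing with specially created tiny models with random weights. Such models have the very minimal number of layers (e.g., 2), vocab size (e.g., 1000), etc. Then the @slow tests can use large slow models to do qualitative testing. To see the use of these simply look for tiny models with: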

grep tiny tests examples

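Here is an example of a script that creates the tiny model stas/tiny-wmt19-en-de. You can easily adjust it to your specific model's architecture.

It's easy to measure the run-time incorrectly: if, for example, downloading a huge model adds overhead on CI, a local run will not show it, because the downloaded files are cached locally and the download time is not measured. Therefore, check the execution speed report in the CI logs instead (the output of pytest --durations=0 tests).

That report is also useful for finding slow outliers that are not marked as such, or that need to be rewritten to be fast. If you notice that the test suite starts getting slow on CI, the top listing of this report will show the slowest tests.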

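Testing the stdout/stderr output

In order to test functions that write to stdout and/or stderr, the test can access those streams using pytest's capsys system. Here is how this is accomplished: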

import sys


def print_to_stdout(s):
    print(s)


def print_to_stderr(s):
    sys.stderr.write(s)


def test_result_and_stdout(capsys):
    msg = "Hello"
    print_to_stdout(msg)
    print_to_stderr(msg)
    out, err = capsys.readouterr()  # consume the captured output streams
    # optional: if you want to replay the consumed streams:
    sys.stdout.write(out)
    sys.stderr.write(err)
    # test:
    assert msg in out
    assert msg in err

And, of course, most of the time, stderr will come as a part of an exception, so try/except has to be used in such a case:

def raise_exception(msg):
    raise ValueError(msg)


def test_something_exception():
    msg = "Not a good value"
    error = ""
    try:
        raise_exception(msg)
    except Exception as e:
        error = str(e)
    # checked outside the except block so the test also fails if nothing was raised
    assert msg in error, f"{msg} is not in the exception:\n{error}"

Another approach to capturing stdout is via contextlib.redirect_stdout:

import sys
from io import StringIO
from contextlib import redirect_stdout


def print_to_stdout(s):
    print(s)


def test_result_and_stdout():
    msg = "Hello"
    buffer = StringIO()
    with redirect_stdout(buffer):
        print_to_stdout(msg)
    out = buffer.getvalue()
    # optional: if you want to replay the consumed streams:
    sys.stdout.write(out)
    # test:
    assert msg in out

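An important potential issue with capturing stdout is that it may contain \r characters that in normal print reset everything that has been printed so far. There is no problem with pytest, but with pytest -s these characters get included in the buffer, so to be able to have the test run with and without -s, you have to make an extra cleanup to the captured output, using re.sub(r'^.*\r', '', buf, 0, re.M).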
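The cleanup can be tried in isolation. The sketch below assumes the pattern is anchored at the start of each line with ^ and passes the flags by keyword; `strip_carriage_return_resets` is a hypothetical helper name.

```python
import re


def strip_carriage_return_resets(buf):
    """Drop everything on a line up to and including its last carriage return."""
    return re.sub(r"^.*\r", "", buf, flags=re.M)


# What a terminal would display as just "Hello World"
captured = "Secret message\rHello World\n"
cleaned = strip_carriage_return_resets(captured)
print(repr(cleaned))
```

Since the match is greedy, a line with several \r characters is reduced to the text after the last one, which mirrors what a terminal would end up showing.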

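But, then we have a helper context manager wrapper to automatically take care of it all, regardless of whether it has some \r's in it or not, so it's a simple: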

from transformers.testing_utils import CaptureStdout

with CaptureStdout() as cs:
    function_that_writes_to_stdout()
print(cs.out)

Here is a full test example:

from transformers.testing_utils import CaptureStdout

msg = "Secret message\r"
final = "Hello World"
with CaptureStdout() as cs:
    print(msg + final)
assert cs.out == final + "\n", f"captured: {cs.out}, expecting {final}"

If you'd like to capture stderr, use the CaptureStderr class instead:

from transformers.testing_utils import CaptureStderr

with CaptureStderr() as cs:
    function_that_writes_to_stderr()
print(cs.err)

If you need to capture both streams at once, use the parent CaptureStd class:

from transformers.testing_utils import CaptureStd

with CaptureStd() as cs:
    function_that_writes_to_stdout_and_stderr()
print(cs.err, cs.out)

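Also, to aid debugging test issues, by default these context managers automatically replay the captured streams on exit from the context.

Capturing logger stream

If you need to validate the output of a logger, you can use CaptureLogger: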

from transformers import logging
from transformers.testing_utils import CaptureLogger

msg = "Testing 1, 2, 3"
logging.set_verbosity_info()
logger = logging.get_logger("transformers.models.bart.tokenization_bart")
with CaptureLogger(logger) as cl:
    logger.info(msg)
assert cl.out == msg + "\n"

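Testing with environment variables

If you want to test the impact of environment variables for a specific test you can use a helper decorator transformers.testing_utils.mockenv

import os
import unittest

from transformers.testing_utils import mockenv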


class HfArgumentParserTest(unittest.TestCase):
    @mockenv(TRANSFORMERS_VERBOSITY="error")
    def test_env_override(self):
        env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None)
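A comparable effect can be had with the standard library's unittest.mock.patch.dict, which sets variables for the duration of a test and restores the previous environment afterwards. The sketch below is a stdlib-only illustration of the technique, not a description of how mockenv is implemented; `EnvPatchExampleTest` and `run_example` are hypothetical names.

```python
import os
import unittest
from unittest import mock


class EnvPatchExampleTest(unittest.TestCase):
    @mock.patch.dict(os.environ, {"TRANSFORMERS_VERBOSITY": "error"})
    def test_env_override(self):
        # Inside the test the patched value is visible
        self.assertEqual(os.getenv("TRANSFORMERS_VERBOSITY"), "error")


def run_example():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EnvPatchExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Once the test returns, os.environ is back to whatever it held before, so the override can't leak into other tests.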

At times an external program needs to be called, which requires setting PYTHONPATH in os.environ to include multiple local paths. A helper class transformers.testing_utils.TestCasePlus comes to help:

from transformers.testing_utils import TestCasePlus


class EnvExampleTest(TestCasePlus):
    def test_external_prog(self):
        env = self.get_env()
        # now call the external program, passing `env` to it

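Depending on whether the test file was under the tests test suite or examples, it'll correctly set up env[PYTHONPATH] to include one of these two directories, and also the src directory to ensure the testing is done against the current repo, and finally with whatever env[PYTHONPATH] was already set to before the test was called, if anything.

This helper method creates a copy of the os.environ object, so the original remains intact.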
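The copy-then-extend behaviour described above can be sketched as follows. This is a simplified illustration, not the code of get_env; `build_env` and the example paths are hypothetical.

```python
import os


def build_env(extra_paths):
    """Return a copy of os.environ with extra_paths prepended to PYTHONPATH."""
    env = os.environ.copy()  # the original os.environ is left untouched
    paths = list(extra_paths)
    existing = env.get("PYTHONPATH", "")
    if existing:
        # keep whatever was already set, after the new entries
        paths.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(paths)
    return env


# Hypothetical repository paths, for illustration only
env = build_env(["/repo/src", "/repo/tests"])
print(env["PYTHONPATH"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the external program.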

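Getting reproducible results

In some situations you may want to remove randomness for your tests. To get identical, reproducible result sets, you will need to fix the seed: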

seed = 42

# python RNG
import random

random.seed(seed)

# pytorch RNGs
import torch

torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

# numpy RNG
import numpy as np

np.random.seed(seed)

# tf RNG
import tensorflow as tf

tf.random.set_seed(seed)

Debugging tests

To start a debugger at the point of the warning, do this:

pytest tests/utils/test_logging.py -W error::UserWarning --pdb

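Working with GitHub Actions workflows

To trigger a self-push workflow CI job, you must:

Create a new branch on the transformers origin (not a fork!).
The branch name has to start with either ci_ or ci- (main triggers it too, but we can't do PRs on main). It also gets triggered only for specific paths - you can find the up-to-date definition, in case it changed since this document was written, here under push:
Create a PR from this branch.
Then you can see the job appear here. It may not run right away if there is a backlog.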

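Testing experimental CI features

Testing CI features can be potentially problematic as it can interfere with the normal CI functioning. Therefore if a new CI feature is to be added, it should be done as follows.

Create a new dedicated job that tests what needs to be tested
The new job must always succeed so that it gives us a green ✓ (details below).
Let it run for some days to see that a variety of different PR types get to run on it (user fork branches, non-forked branches, branches originating from github.com UI direct file edit, various forced pushes, etc. - there are so many) while monitoring the experimental job's logs (not the overall job green as it's purposefully always green)
When it's clear that everything is solid, then merge the new changes into existing jobs.

That way experiments on CI functionality itself won't interfere with the normal workflow.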

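Now how can we make the job always succeed while the new CI feature is being developed?

Some CIs, like TravisCI, support ignore-step-failure and will report the overall job as successful, but CircleCI and GitHub Actions as of this writing don't support that.

So the following workaround can be used:

set +euo pipefail at the beginning of the run command to suppress most potential failures in the bash script.
the last command must be a success: echo "done" or just true will do

Here is an example: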

- run:
    name: run CI experiment
    command: |
        set +euo pipefail
        echo "setting run-all-despite-any-errors-mode"
        this_command_will_fail
        echo "but bash continues to run"
        # emulate another failure
        false
        # but the last command must be a success
        echo "during experiment do not remove: reporting success to CI, even if there were failures"

For simple commands you could also do:

cmd_that_may_fail || true

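Of course, once satisfied with the results, integrate the experimental step or job with the rest of the normal jobs, while removing set +euo pipefail or any other things you may have added to ensure that the experimental job doesn't interfere with the normal CI functioning.

This whole process would have been much easier if we only could set something like allow-failure for the experimental step, and let it fail without impacting the overall status of PRs. But as mentioned earlier CircleCI and GitHub Actions don't support it at the moment.

You can vote for this feature and see where it is at these CI-specific threads: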

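DeepSpeed integration

For a PR that involves the DeepSpeed integration, keep in mind our CircleCI PR CI setup doesn't have GPUs. Tests requiring GPUs are run on a different CI nightly. So if you get a passing CI report in your PR, it doesn't mean the DeepSpeed tests pass.

To run DeepSpeed tests: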

RUN_SLOW=1 pytest tests/deepspeed/test_deepspeed.py

Any changes to the modeling or PyTorch examples code require running the model zoo tests as well.

RUN_SLOW=1 pytest tests/deepspeed
