
Handling failed trials in KerasTuner

Author: Haifeng Jin
Date created: 2023/02/28
Last modified: 2023/02/28
Description: The basics of fault tolerance configurations in KerasTuner.

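Introduction

A KerasTuner program can take a long time to run because each model may take a long time to train. We do not want the whole program to fail just because a few trials failed randomly.

In this guide, we show how to handle failed trials in KerasTuner, including:

How to tolerate failed trials during the search
How to mark a trial as failed while building and evaluating the model
How to terminate the search by raising a FatalError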

Setup

!pip install keras-tuner -q

import keras
from keras import layers
import keras_tuner
import numpy as np

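Tolerate failed trials

We will use the max_retries_per_trial and max_consecutive_failed_trials arguments when initializing the tuner.

max_retries_per_trial controls the maximum number of retries to run if a trial keeps failing. For example, if it is set to 3, the trial may run 4 times (1 failed run + 3 failed retries) before it is finally marked as failed. The default value of max_retries_per_trial is 0.

max_consecutive_failed_trials controls how many consecutive failed trials (a failed trial here means a trial whose retries all failed) may occur before the search is terminated. For example, if it is set to 3 and Trial 2, Trial 3, and Trial 4 all fail, the search is terminated. However, if it is set to 3 and only Trial 2, Trial 3, Trial 5, and Trial 6 fail, the search is not terminated, because the failed trials are not consecutive. The default value of max_consecutive_failed_trials is 3.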
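
The counting rules described above can be illustrated with a small, self-contained sketch. It is a simplified model of the documented behavior, not KerasTuner's actual implementation; the helper names max_runs_per_trial and search_is_terminated are hypothetical and exist only for this illustration.

```python
def max_runs_per_trial(max_retries_per_trial: int) -> int:
    """Upper bound on executions of one trial: the first run plus every retry."""
    return 1 + max_retries_per_trial


def search_is_terminated(
    failed_trials: set[int],
    num_trials: int,
    max_consecutive_failed_trials: int,
) -> bool:
    """Return True if a streak of consecutive failed trials reaches the limit."""
    streak = 0
    for trial_id in range(num_trials):
        if trial_id in failed_trials:
            streak += 1
            if streak >= max_consecutive_failed_trials:
                return True
        else:
            # A successful trial resets the streak.
            streak = 0
    return False


print(max_runs_per_trial(3))  # 4
print(search_is_terminated({2, 3, 4}, num_trials=8, max_consecutive_failed_trials=3))  # True
print(search_is_terminated({2, 3, 5, 6}, num_trials=8, max_consecutive_failed_trials=3))  # False
```

The two calls at the end reproduce the examples in the text: three failures in a row stop the search, while four failures interrupted by a successful Trial 4 do not.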

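The following code shows these two arguments in action.

We define a search space with 2 hyperparameters for the number of units in the 2 dense layers.
When the resulting model has more than 1200 parameters, we raise a ValueError because the model is too large.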

def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        raise ValueError(f"Model too large! It contains {num_params} params.")
    return model
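
To see ahead of time which hyperparameter combinations exceed the 1200-parameter limit, the parameter count can be worked out by hand. The sketch below assumes each Dense layer keeps its default bias term, so a layer contributes inputs * units + units parameters; the helper names dense_params and count_params are introduced only for this illustration.

```python
from itertools import product

INPUT_DIM = 20  # matches input_shape=(20,) in build_model
LIMIT = 1200    # threshold checked in build_model


def dense_params(inputs: int, units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units


def count_params(units_1: int, units_2: int) -> int:
    """Total parameters of the three stacked Dense layers."""
    return (
        dense_params(INPUT_DIM, units_1)
        + dense_params(units_1, units_2)
        + dense_params(units_2, 1)
    )


# Same value ranges as hp.Int("units_1", 10, 40, step=10) and
# hp.Int("units_2", 10, 30, step=10).
grid = list(product(range(10, 41, 10), range(10, 31, 10)))
too_large = {
    combo: count_params(*combo) for combo in grid if count_params(*combo) > LIMIT
}

print(len(grid))  # 12
print(too_large)
# {(30, 20): 1271, (30, 30): 1591, (40, 10): 1261, (40, 20): 1681, (40, 30): 2101}
```

Under these assumptions, 5 of the 12 grid points are rejected, and the counts 1271, 1591, and 1261 match the error messages in the results summary.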

We set up the tuner as follows.

We set max_retries_per_trial=3.
We set max_consecutive_failed_trials=8.
We use GridSearch to enumerate all hyperparameter value combinations.

tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()

Trial 12 Complete [00h 00m 00s]

Best val_loss So Far: 0.12375041842460632
Total elapsed time: 00h 00m 08s
Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")

Trial 0003 summary
Hyperparameters:
units_1: 20
units_2: 10
Score: 0.12375041842460632

Trial 0001 summary
Hyperparameters:
units_1: 10
units_2: 20
Score: 0.12741881608963013

Trial 0002 summary
Hyperparameters:
units_1: 10
units_2: 30
Score: 0.13982832431793213

Trial 0000 summary
Hyperparameters:
units_1: 10
units_2: 10
Score: 0.1433391124010086

Trial 0005 summary
Hyperparameters:
units_1: 20
units_2: 30
Score: 0.14747518301010132

Trial 0006 summary
Hyperparameters:
units_1: 30
units_2: 10
Score: 0.15092280507087708

Trial 0004 summary
Hyperparameters:
units_1: 20
units_2: 20
Score: 0.21962997317314148

Trial 0007 summary
Hyperparameters:
units_1: 30
units_2: 20
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1271 params.

Trial 0008 summary
Hyperparameters:
units_1: 30
units_2: 30
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1591 params.

Trial 0009 summary
Hyperparameters:
units_1: 40
units_2: 10
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1261 params.

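Mark a trial as failed

When the model is too large, there is no point in retrying. No matter how many times we try the same hyperparameters, the model is always too large.

We could set max_retries_per_trial=0 to avoid the retries. However, that disables retries for every error, and we may still want to retry after other, unexpected errors. Is there a better way to handle this situation?

We can raise a FailedTrialError to skip the retries. Whenever this error is raised, the trial is not retried. Retries still run when any other error occurs. Here is an example.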

def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, the retries are skipped.
        raise keras_tuner.errors.FailedTrialError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)

# Print the results.
tuner.results_summary()

Trial 12 Complete [00h 00m 00s]

Best val_loss So Far: 0.08265472948551178
Total elapsed time: 00h 00m 05s
Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")

Trial 0002 summary
Hyperparameters:
units_1: 10
units_2: 30
Score: 0.08265472948551178

Trial 0005 summary
Hyperparameters:
units_1: 20
units_2: 30
Score: 0.11731438338756561

Trial 0006 summary
Hyperparameters:
units_1: 30
units_2: 10
Score: 0.13600358366966248

Trial 0004 summary
Hyperparameters:
units_1: 20
units_2: 20
Score: 0.1465979516506195

Trial 0000 summary
Hyperparameters:
units_1: 10
units_2: 10
Score: 0.15967626869678497

Trial 0001 summary
Hyperparameters:
units_1: 10
units_2: 20
Score: 0.1646396517753601

Trial 0003 summary
Hyperparameters:
units_1: 20
units_2: 10
Score: 0.1696309596300125

Trial 0007 summary
Hyperparameters:
units_1: 30
units_2: 20
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1271 params.

Trial 0008 summary
Hyperparameters:
units_1: 30
units_2: 30
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1591 params.

Trial 0009 summary
Hyperparameters:
units_1: 40
units_2: 10
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1261 params.
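
The difference between an ordinary error and an error that skips retries can be modeled in a few lines of plain Python. This is only a sketch of the behavior described in this section, not KerasTuner's code: SkipRetryError is a hypothetical stand-in for keras_tuner.errors.FailedTrialError, and run_trial_with_retries is an illustrative helper.

```python
class SkipRetryError(Exception):
    """Hypothetical stand-in for keras_tuner.errors.FailedTrialError."""


def run_trial_with_retries(run_once, max_retries_per_trial):
    """Run `run_once` until it succeeds or the retry budget is spent.

    Returns a (status, attempts) tuple.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            run_once()
            return "COMPLETED", attempts
        except SkipRetryError:
            # Retrying cannot help, so fail right away.
            return "FAILED", attempts
        except Exception:
            # Any other error is retried until the budget runs out.
            if attempts > max_retries_per_trial:
                return "FAILED", attempts


def always_too_large():
    raise SkipRetryError("Model too large!")


def always_crashes():
    raise RuntimeError("unexpected error")


calls = {"n": 0}


def flaky():
    # Fails on the first call only.
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient error")


print(run_trial_with_retries(always_too_large, 3))  # ('FAILED', 1)
print(run_trial_with_retries(always_crashes, 3))    # ('FAILED', 4)
print(run_trial_with_retries(flaky, 3))             # ('COMPLETED', 2)
```

With max_retries_per_trial=3, the deterministic "too large" failure costs one run instead of four, while transient errors still get their retries.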

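Terminate the search programmatically

When there is a bug in the code, we should terminate the search immediately and fix the bug. You can terminate the search programmatically when a condition you define is met. Raising a FatalError (or one of its subclasses FatalValueError, FatalTypeError, or FatalRuntimeError) terminates the search immediately, regardless of the max_consecutive_failed_trials argument.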
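
As a rough mental model, a fatal error bypasses the consecutive-failure counter entirely. The sketch below is a simplified illustration under that assumption, not KerasTuner's implementation: StopSearchError is a hypothetical stand-in for keras_tuner.errors.FatalError, and run_search and its log argument are invented for this example.

```python
class StopSearchError(Exception):
    """Hypothetical stand-in for keras_tuner.errors.FatalError."""


def run_search(trials, max_consecutive_failed_trials, log):
    """Run trials in order and record each outcome in `log`."""
    streak = 0
    for name, run_once in trials:
        try:
            run_once()
        except StopSearchError:
            # Fatal errors propagate at once, whatever the failure limit is.
            log.append((name, "FATAL"))
            raise
        except Exception:
            log.append((name, "FAILED"))
            streak += 1
            if streak >= max_consecutive_failed_trials:
                raise RuntimeError("Too many consecutive failed trials.")
        else:
            log.append((name, "COMPLETED"))
            streak = 0


def ok():
    pass


def fails():
    raise ValueError("ordinary failure")


def fatal():
    raise StopSearchError("bug in the code")


trials = [("t0", ok), ("t1", fails), ("t2", fatal), ("t3", ok)]
log = []
try:
    run_search(trials, max_consecutive_failed_trials=8, log=log)
except StopSearchError:
    print("The search is terminated.")

print(log)  # [('t0', 'COMPLETED'), ('t1', 'FAILED'), ('t2', 'FATAL')]
```

Trial t3 never runs even though the failure limit of 8 is far from reached, which is the behavior the paragraph above describes.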

The following example terminates the search when the model is too large.

def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, the search is terminated.
        raise keras_tuner.errors.FatalError(
            f"Model too large! It contains {num_params} params."
        )
    return model


tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)

try:
    # Use random data to train the model.
    tuner.search(
        x=np.random.rand(100, 20),
        y=np.random.rand(100, 1),
        validation_data=(
            np.random.rand(100, 20),
            np.random.rand(100, 1),
        ),
        epochs=10,
    )
except keras_tuner.errors.FatalError:
    print("The search is terminated.")

Trial 7 Complete [00h 00m 01s]
val_loss: 0.14219732582569122

Best val_loss So Far: 0.09755773097276688
Total elapsed time: 00h 00m 04s

Search: Running Trial #8

Value             |Best Value So Far |Hyperparameter
30                |10                |units_1
20                |20                |units_2

The search is terminated.

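Takeaways

In this guide, you learned how to handle failed trials in KerasTuner:

Use max_retries_per_trial to specify the number of retries for a failed trial.
Use max_consecutive_failed_trials to specify the maximum number of consecutive failed trials to tolerate.
Raise FailedTrialError to mark a trial as failed directly and skip the retries.
Raise FatalError, FatalValueError, FatalTypeError, or FatalRuntimeError to terminate the search immediately.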
