Hyperparameter Optimization

The hyper_parameter_optimizer.py example script demonstrates hyperparameter optimization (HPO), which is automated using ClearML.

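Set the Search Strategy for Optimization

An optimization requires a search strategy, and a search strategy optimizer class that implements it.

The following search strategies can be used: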

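Optuna hyperparameter optimization - automation.optuna.OptimizerOptuna. For more information about Optuna, see the Optuna documentation.
BOHB - automation.hpbandster.OptimizerBOHB.

BOHB performs robust and efficient hyperparameter optimization at scale by combining the speed of Hyperband searches with the guidance and convergence guarantees of Bayesian Optimization.

ClearML implements BOHB for automation with HpBandSter's bohb.py. For more information about HpBandSter BOHB, see the HpBandSter documentation.
Random uniform sampling of hyperparameters - automation.RandomSearch.
Full grid sampling of every hyperparameter combination - automation.GridSearch.
Custom - use a custom class that inherits from the ClearML automation base strategy class, SearchStrategy.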

The selected search strategy class is later passed to the automation.HyperParameterOptimizer object.
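To make the difference between full grid sampling and random uniform sampling concrete, here is a minimal sketch in plain Python. It does not use ClearML and is not how GridSearch or RandomSearch are implemented; the search_space dictionary and both helper functions are hypothetical names chosen for illustration.

```python
import itertools
import random

# Hypothetical search space, used only for this illustration.
search_space = {
    "layer_1": [128, 256, 384, 512],
    "layer_2": [128, 256, 384, 512],
    "batch_size": [96, 128, 160],
    "epochs": [30],
}


def grid_samples(space):
    """Yield every combination of values (full grid sampling)."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_samples(space, count, seed=0):
    """Yield `count` combinations, each value drawn uniformly at random."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {name: rng.choice(values) for name, values in space.items()}


grid = list(grid_samples(search_space))
sampled = list(random_samples(search_space, count=10))
print(len(grid), "grid combinations;", len(sampled), "random samples")
```

The grid visits every combination exactly once, so its cost grows with the product of the range sizes. Random sampling lets you fix the number of trials up front, at the price of possibly drawing the same combination twice.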

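The example code tries to import OptimizerOptuna as the search strategy. If clearml.automation.optuna cannot be imported, it tries to import OptimizerBOHB. If clearml.automation.hpbandster cannot be imported either, it uses RandomSearch as the search strategy.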

import logging

from clearml.automation import RandomSearch

try:
    # Preferred strategy: Optuna
    from clearml.automation.optuna import OptimizerOptuna  # noqa
    aSearchStrategy = OptimizerOptuna
except ImportError:
    try:
        # First fallback: BOHB
        from clearml.automation.hpbandster import OptimizerBOHB  # noqa
        aSearchStrategy = OptimizerBOHB
    except ImportError:
        logging.getLogger().warning(
            "Apologies, it seems you do not have 'optuna' or 'hpbandster' installed, "
            "we will be using RandomSearch strategy instead")
        # Last resort: RandomSearch ships with clearml itself
        aSearchStrategy = RandomSearch
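The nested try/except above is one way to express "use the first backend that imports". As a hedged alternative sketch, the same idea can be written as a loop over candidate module names. The helper first_importable is a hypothetical name, and the module names below are standard-library stand-ins so that the snippet runs without ClearML installed.

```python
import importlib


def first_importable(module_names):
    """Return the first module in `module_names` that imports, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Stand-in names: the first one is deliberately a package that does not exist.
module = first_importable(["hypothetical_missing_package", "json"])
print(module.__name__ if module else "nothing importable")
```

A loop like this scales better than nesting when there are more than two optional backends, but the nested form in the example keeps each fallback visible at a glance.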

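Define a Callback

When the optimization starts, it is given a callback that is called each time a job completes. In the script, the job_complete_callback function prints the details of the completed job and, when the job's ID matches top_performance_job_id, reports that the job is the best performer so far.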

def job_complete_callback(
    job_id,                 # type: str
    objective_value,        # type: float
    objective_iteration,    # type: int
    job_parameters,         # type: dict
    top_performance_job_id  # type: str
):
    print('Job completed!', job_id, objective_value, objective_iteration, job_parameters)
    if job_id == top_performance_job_id:
        print('WOOT WOOT we broke the record! Objective reached {}'.format(objective_value))
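To see when the "record broken" branch fires, the callback can be exercised locally with made-up results. The sketch below is an assumption-laden stand-in: the loop plays the role of the optimizer, the job IDs and values are hypothetical, and the callback returns a flag (which the original does not) purely so the outcome is easy to inspect.

```python
def job_complete_callback(job_id, objective_value, objective_iteration,
                          job_parameters, top_performance_job_id):
    # Same signature as the callback above; returns a flag for easy checking.
    is_best = job_id == top_performance_job_id
    print(job_id, objective_value, "new best" if is_best else "")
    return is_best


# Hypothetical results: (job ID, objective value, iteration, parameters).
finished_jobs = [
    ("job-a", 0.81, 30, {"batch_size": 96}),
    ("job-b", 0.78, 30, {"batch_size": 128}),
    ("job-c", 0.86, 30, {"batch_size": 160}),
]

best_id, best_value = None, float("-inf")
record_breakers = []
for job_id, value, iteration, params in finished_jobs:
    # Maximizing: a job becomes the top performer when it beats the best so far.
    if value > best_value:
        best_id, best_value = job_id, value
    if job_complete_callback(job_id, value, iteration, params, best_id):
        record_breakers.append(job_id)
```

In this simulated run, only the jobs that improve on the running best are reported as record breakers; the second job completes normally but does not trigger the message.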

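Initialize the Optimization Task

Initialize the Task, which is stored in ClearML Server when the code runs. After the code runs at least once, it can be reproduced and tuned.

Set the Task type to optimizer, and create a new experiment (and Task object) each time the optimizer runs (reuse_last_task_id=False).

When the code runs, it creates an experiment named Automatic Hyper-Parameter Optimization in the Hyper-Parameter Optimization project, which can be viewed in the ClearML Web UI.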

# Connecting ClearML
task = Task.init(
    project_name='Hyper-Parameter Optimization',
    task_name='Automatic Hyper-Parameter Optimization',
    task_type=Task.TaskTypes.optimizer,
    reuse_last_task_id=False
)

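Set Up the Arguments

Create an arguments dictionary containing the ID of the Task to optimize, and a Boolean indicating whether the optimizer will run as a service (see Running as a Service).

In this example, an experiment named Keras HP optimization base is being optimized. The experiment must have run at least once so that it is stored in ClearML Server and can therefore be cloned.

Since the arguments dictionary is connected to the Task, after the code runs once, template_task_id can be changed to optimize a different experiment.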

# experiment template to optimize in the hyperparameter optimization
args = {
    'template_task_id': None,
    'run_as_service': False,
}
args = task.connect(args)
    
# Get the template task experiment that we want to optimize
if not args['template_task_id']:
    args['template_task_id'] = Task.get_task(
        project_name='examples', task_name='Keras HP optimization base').id

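Create the Optimizer Object

Initialize an automation.HyperParameterOptimizer object, setting the following optimization parameters:

The ID of the ClearML Task to optimize. This Task will be cloned, and each clone will sample a different set of hyperparameter values: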

an_optimizer = HyperParameterOptimizer(
    # This is the experiment we want to optimize
    base_task_id=args['template_task_id'],

The hyperparameter ranges to sample, instantiated as ClearML automation objects using automation.UniformIntegerParameterRange and automation.DiscreteParameterRange:

    hyper_parameters=[
        UniformIntegerParameterRange('layer_1', min_value=128, max_value=512, step_size=128),
        UniformIntegerParameterRange('layer_2', min_value=128, max_value=512, step_size=128),
        DiscreteParameterRange('batch_size', values=[96, 128, 160]),
        DiscreteParameterRange('epochs', values=[30]),
    ],

The metric to optimize and the optimization objective:
```
    objective_metric_title='val_acc',
    objective_metric_series='val_acc',
    objective_metric_sign='max',
```
Multi-objective Optimization
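If you are using the Optuna framework (see Set the Search Strategy for Optimization), you can list multiple optimization objectives. When doing so, make sure the objective_metric_title, objective_metric_series, and objective_metric_sign lists are the same length. Each title is matched with its respective series and sign.
For example, the code below sets two objectives: minimize the validation/loss metric and maximize the validation/accuracy metric:
    objective_metric_title=["validation", "validation"],
    objective_metric_series=["loss", "accuracy"],
    objective_metric_sign=["min", "max"],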
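The pairing rule in the note above (same-length lists, matched position by position) can be sketched in plain Python. This is only an illustration of the rule, not ClearML's own validation; pair_objectives is a hypothetical helper name.

```python
def pair_objectives(titles, series, signs):
    """Pair each title with its series and sign, checking the lists line up."""
    if not (len(titles) == len(series) == len(signs)):
        raise ValueError("title, series and sign lists must have the same length")
    return list(zip(titles, series, signs))


objectives = pair_objectives(
    ["validation", "validation"],
    ["loss", "accuracy"],
    ["min", "max"],
)
for title, series_name, sign in objectives:
    print("{} {}/{}".format(sign, title, series_name))
```

Running a check like this before building the optimizer catches a mismatched list early, instead of after jobs have already been launched.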
The number of concurrent Tasks:
```
    max_number_of_concurrent_tasks=2,
```
The optimization strategy (see Set the Search Strategy for Optimization):
```
    optimizer_class=aSearchStrategy,
```
The queue to use for remote execution. This setting is overridden if the optimizer runs as a service.
```
    execution_queue='1xGPU',
```

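The remaining parameters, including the time limit per Task (in minutes), the period for checking the optimization (in minutes), the maximum number of jobs to launch, and the minimum and maximum number of iterations per Task: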

    # Optional: Limit the execution time of a single experiment, in minutes.
    # (this is optional, and if using OptimizerBOHB, it is ignored)
    time_limit_per_job=10.,
    # Check the experiments every 6 seconds is way too often, we should probably set it to 5 min,
    # assuming a single experiment is usually hours...
    pool_period_min=0.1,
    # set the maximum number of jobs to launch for the optimization, default (None) unlimited
    # If OptimizerBOHB is used, it defines the maximum budget in terms of full jobs
    # basically the cumulative number of iterations will not exceed total_max_jobs * max_iteration_per_job
    total_max_jobs=10,
    # This is only applicable for OptimizerBOHB and ignored by the rest
    # set the minimum number of iterations for an experiment, before early stopping
    min_iteration_per_job=10,
    # Set the maximum number of iterations for an experiment to execute
    # (This is optional, unless using OptimizerBOHB where this is a must)
    max_iteration_per_job=30,
    
)  # done creating HyperParameterOptimizer
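The numeric settings above are easier to reason about once converted into concrete quantities. The following back-of-the-envelope sketch uses the values from the example; the grid size assumes that each integer range includes its maximum value, and the wall-clock figure is a rough upper bound that ignores queueing and startup time.

```python
# Values copied from the example above.
pool_period_min = 0.1
total_max_jobs = 10
max_iteration_per_job = 30
time_limit_per_job = 10.0
max_number_of_concurrent_tasks = 2

# Search space size, assuming each integer range includes its maximum:
# 128, 256, 384, 512 for both layers, three batch sizes, one epochs value.
grid_size = 4 * 4 * 3 * 1

# The polling period is given in minutes; convert it to seconds.
poll_seconds = pool_period_min * 60

# Cumulative iteration budget described in the comment on total_max_jobs.
iteration_budget = total_max_jobs * max_iteration_per_job

# Rough upper bound on execution minutes if every job uses its full time
# limit and jobs run in groups of `max_number_of_concurrent_tasks`.
batches = -(-total_max_jobs // max_number_of_concurrent_tasks)  # ceiling division
worst_case_minutes = batches * time_limit_per_job

print(poll_seconds, iteration_budget, grid_size, worst_case_minutes)
```

Under these assumptions, the job limit covers only part of the full grid, which is one reason to prefer a guided or random strategy over an exhaustive one for this configuration.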

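Running as a Service

To run the optimization as a service, set the run_as_service argument to True. For more information about running as a service, see Services Mode.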

# if we are running as a service, just enqueue ourselves into the services queue and let it run the optimization
if args['run_as_service']:
    # if this code is executed by `clearml-agent` the function call does nothing.
    # if executed locally, the local process will be terminated, and a remote copy will be executed instead
    task.execute_remotely(queue_name='services', exit_process=True)

Optimize

The optimizer is ready. Set the report period and start it, providing the callback method to report the best performance:

# report every 12 seconds, this is way too often, but we are testing here
an_optimizer.set_report_period(0.2)
# start the optimization process, callback function to be called every time an experiment is completed
# this function returns immediately
an_optimizer.start(job_complete_callback=job_complete_callback)

Now that it is running:

Set a time limit for the optimization
Wait
Get the best performance
Print the best performance
Stop the optimizer.

# set the time limit for the optimization process (90 minutes)
an_optimizer.set_time_limit(in_minutes=90.0)
# wait until process is done (notice we are controlling the optimization process in the background)
an_optimizer.wait()
# optimization is completed, print the top performing experiments id
top_exp = an_optimizer.get_top_experiments(top_k=3)
print([t.id for t in top_exp])
# make sure background optimization stopped
an_optimizer.stop()

print('We are done, good bye')
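Conceptually, retrieving the top experiments amounts to ranking completed runs by their objective value in the direction given by the objective sign. The sketch below illustrates that idea in plain Python with hypothetical results; it is not the implementation behind get_top_experiments, and top_experiments is an illustrative helper name.

```python
import heapq

# Hypothetical (experiment ID, objective value) pairs.
results = [("exp-1", 0.81), ("exp-2", 0.78), ("exp-3", 0.86), ("exp-4", 0.84)]


def top_experiments(results, top_k, sign="max"):
    """Return the IDs of the `top_k` best results for the given sign."""
    pick = heapq.nlargest if sign == "max" else heapq.nsmallest
    best = pick(top_k, results, key=lambda item: item[1])
    return [experiment_id for experiment_id, _ in best]


print(top_experiments(results, top_k=3))
```

With a maximized metric such as validation accuracy, the highest values come first; passing sign="min" reverses the ranking for metrics such as loss.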
