Using Comet with Tune

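Comet is a tool to manage and optimize the entire ML lifecycle, from experiment tracking, model optimization and dataset versioning to model production monitoring.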


Example

To illustrate logging your trial results to Comet, we'll define a simple training function that simulates a loss metric:

import numpy as np
from ray import train, tune


def train_function(config):
    for i in range(30):
        loss = config["mean"] + config["sd"] * np.random.randn()
        train.report({"loss": loss})
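Each call to train.report() emits one sample drawn from a normal distribution with mean config["mean"] and standard deviation config["sd"], so a trial's reported loss fluctuates around its configured mean. The sketch below reproduces that simulation with only the standard library (random.gauss in place of np.random.randn) so you can inspect the values without starting Ray; the helper name simulate_losses and the fixed seed are illustrative choices, not part of the Tune example.

```python
import random
import statistics


def simulate_losses(mean, sd, iterations=30, seed=0):
    """Mimic train_function: one noisy loss value per reported iteration."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [mean + sd * rng.gauss(0.0, 1.0) for _ in range(iterations)]


for mean in (1, 2, 3):
    losses = simulate_losses(mean, sd=0.5)
    # The average of the samples should land close to the configured mean,
    # while the last value (what Tune compares by default) is a single noisy draw.
    print(mean, round(statistics.mean(losses), 3), round(losses[-1], 3))
```

Because the noise is small relative to the gap between means, the trial with mean=1 is expected to end with the lowest loss.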

First, provide your Comet API key and project name:

api_key = "YOUR_COMET_API_KEY"
project_name = "YOUR_COMET_PROJECT_NAME"

Then add the Comet logger by passing it in the callbacks argument of your RunConfig():

from ray.air.integrations.comet import CometLoggerCallback

tuner = tune.Tuner(
    train_function,
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
    ),
    run_config=train.RunConfig(
        callbacks=[
            CometLoggerCallback(
                api_key=api_key, project_name=project_name, tags=["comet_example"]
            )
        ],
    ),
    param_space={"mean": tune.grid_search([1, 2, 3]), "sd": tune.uniform(0.2, 0.8)},
)
results = tuner.fit()

print(results.get_best_result().config)
2022-07-22 15:41:21,477	INFO services.py:1483 -- View the Ray dashboard at http://127.0.0.1:8267
== Status ==
Current time: 2022-07-22 15:41:31 (running for 00:00:06.73)
Memory usage on this node: 9.9/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.5 GiB heap, 0.0/2.0 GiB objects
Current best trial: 5bf98_00000 with loss=1.0234101880766688 and parameters={'mean': 1, 'sd': 0.40575843135279466}
Result logdir: /Users/kai/ray_results/train_function_2022-07-22_15-41-18
Number of trials: 3/3 (3 TERMINATED)
Trial name                   status      loc              mean        sd   iter   total time (s)      loss
train_function_5bf98_00000   TERMINATED  127.0.0.1:48140     1  0.405758     30        2.11758     1.02341
train_function_5bf98_00001   TERMINATED  127.0.0.1:48147     2  0.647335     30        0.0770731   1.53993
train_function_5bf98_00002   TERMINATED  127.0.0.1:48151     3  0.256568     30        0.0728431   3.0393


2022-07-22 15:41:24,693	INFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged 
For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged 
For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged 
For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/
Result for train_function_5bf98_00000:
  date: 2022-07-22_15-41-27
  done: false
  experiment_id: c94e6cdedd4540e4b40e4a34fbbeb850
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 1.1009860426725162
  node_ip: 127.0.0.1
  pid: 48140
  time_since_restore: 0.000125885009765625
  time_this_iter_s: 0.000125885009765625
  time_total_s: 0.000125885009765625
  timestamp: 1658500887
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 5bf98_00000
  warmup_time: 0.0029532909393310547
  
Result for train_function_5bf98_00000:
  date: 2022-07-22_15-41-29
  done: true
  experiment_id: c94e6cdedd4540e4b40e4a34fbbeb850
  experiment_tag: 0_mean=1,sd=0.4058
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 30
  loss: 1.0234101880766688
  node_ip: 127.0.0.1
  pid: 48140
  time_since_restore: 2.1175789833068848
  time_this_iter_s: 0.0022211074829101562
  time_total_s: 2.1175789833068848
  timestamp: 1658500889
  timesteps_since_restore: 0
  training_iteration: 30
  trial_id: 5bf98_00000
  warmup_time: 0.0029532909393310547
  
Result for train_function_5bf98_00001:
  date: 2022-07-22_15-41-30
  done: false
  experiment_id: ba865bc613d94413a37fe027123ba031
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 2.3754716847171182
  node_ip: 127.0.0.1
  pid: 48147
  time_since_restore: 0.0001590251922607422
  time_this_iter_s: 0.0001590251922607422
  time_total_s: 0.0001590251922607422
  timestamp: 1658500890
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 5bf98_00001
  warmup_time: 0.0036537647247314453
  
Result for train_function_5bf98_00001:
  date: 2022-07-22_15-41-30
  done: true
  experiment_id: ba865bc613d94413a37fe027123ba031
  experiment_tag: 1_mean=2,sd=0.6473
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 30
  loss: 1.5399275480220707
  node_ip: 127.0.0.1
  pid: 48147
  time_since_restore: 0.0770730972290039
  time_this_iter_s: 0.002664804458618164
  time_total_s: 0.0770730972290039
  timestamp: 1658500890
  timesteps_since_restore: 0
  training_iteration: 30
  trial_id: 5bf98_00001
  warmup_time: 0.0036537647247314453
  
Result for train_function_5bf98_00002:
  date: 2022-07-22_15-41-31
  done: false
  experiment_id: 2efb6f3c4d954bcab1ea4083f138008e
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 3.204653294422825
  node_ip: 127.0.0.1
  pid: 48151
  time_since_restore: 0.00014400482177734375
  time_this_iter_s: 0.00014400482177734375
  time_total_s: 0.00014400482177734375
  timestamp: 1658500891
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 5bf98_00002
  warmup_time: 0.0030150413513183594
  
Result for train_function_5bf98_00002:
  date: 2022-07-22_15-41-31
  done: true
  experiment_id: 2efb6f3c4d954bcab1ea4083f138008e
  experiment_tag: 2_mean=3,sd=0.2566
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 30
  loss: 3.0393011150182865
  node_ip: 127.0.0.1
  pid: 48151
  time_since_restore: 0.07284307479858398
  time_this_iter_s: 0.0020139217376708984
  time_total_s: 0.07284307479858398
  timestamp: 1658500891
  timesteps_since_restore: 0
  training_iteration: 30
  trial_id: 5bf98_00002
  warmup_time: 0.0030150413513183594
  
2022-07-22 15:41:31,290	INFO tune.py:738 -- Total run time: 7.36 seconds (6.72 seconds for the tuning loop).
{'mean': 1, 'sd': 0.40575843135279466}
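With metric="loss" and mode="min", the best trial is the one whose last reported loss is lowest, since get_best_result() compares the last result by default (scope="last"). The sketch below replays that rule in plain Python on the final losses printed in the output above; it illustrates the selection rule and is not Tune's own implementation. The helper name best_trial is hypothetical.

```python
def best_trial(results, mode="min"):
    """Pick the trial whose last reported loss is best (like scope="last")."""
    pick = min if mode == "min" else max
    return pick(results.items(), key=lambda item: item[1]["loss"])


# Last reported result per trial, copied from the run output
# (sd values are rounded as in the status table).
final_results = {
    "train_function_5bf98_00000": {"loss": 1.0234101880766688, "mean": 1, "sd": 0.405758},
    "train_function_5bf98_00001": {"loss": 1.5399275480220707, "mean": 2, "sd": 0.647335},
    "train_function_5bf98_00002": {"loss": 3.0393011150182865, "mean": 3, "sd": 0.256568},
}

name, result = best_trial(final_results)
print(name, result["loss"])  # train_function_5bf98_00000 1.0234101880766688
```

This matches the "Current best trial: 5bf98_00000" line in the status output.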

Tune Comet Logger

Ray Tune offers an integration with Comet through the CometLoggerCallback, which automatically logs the metrics and parameters reported to Tune to the Comet UI.
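Conceptually, the callback opens one Comet experiment per trial, records the trial's config as parameters when the trial starts, and forwards each reported result as metrics. The stdlib-only sketch below models that flow. RecordingExperiment is a hypothetical stand-in whose log_parameters/log_metrics method names mirror comet_ml.Experiment, and SketchCometCallback is not the real CometLoggerCallback, whose exact bookkeeping (which result keys it logs, how it names experiments) may differ.

```python
class RecordingExperiment:
    """Hypothetical stand-in for comet_ml.Experiment that just stores calls."""

    def __init__(self, tags=None):
        self.tags = list(tags or [])
        self.parameters = {}
        self.metrics = []  # list of (step, metrics dict)

    def log_parameters(self, params):
        self.parameters.update(params)

    def log_metrics(self, metrics, step=None):
        self.metrics.append((step, dict(metrics)))


class SketchCometCallback:
    """Illustrative callback: one experiment per trial, params once, metrics per result."""

    def __init__(self, tags=None):
        self.tags = tags
        self.experiments = {}

    def on_trial_start(self, trial_id, config):
        experiment = RecordingExperiment(tags=self.tags)
        experiment.log_parameters(config)  # hyperparameters are logged once
        self.experiments[trial_id] = experiment

    def on_trial_result(self, trial_id, result):
        # Keep numeric values only and use training_iteration as the step.
        step = result.get("training_iteration")
        metrics = {
            key: value
            for key, value in result.items()
            if isinstance(value, (int, float)) and key != "training_iteration"
        }
        self.experiments[trial_id].log_metrics(metrics, step=step)


callback = SketchCometCallback(tags=["comet_example"])
callback.on_trial_start("trial_0", {"mean": 1, "sd": 0.4})
for i in range(1, 4):
    callback.on_trial_result("trial_0", {"loss": 1.0 + 0.1 * i, "training_iteration": i})
print(callback.experiments["trial_0"].parameters)  # {'mean': 1, 'sd': 0.4}
```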

The full API of the callback is shown below:

class ray.air.integrations.comet.CometLoggerCallback(online: bool = True, tags: List[str] = None, save_checkpoints: bool = False, **experiment_kwargs)

CometLoggerCallback for logging Tune results to Comet.

Comet (https://comet.ml/site/) is a tool to manage and optimize the entire ML lifecycle, from experiment tracking, model optimization and dataset versioning to model production monitoring.

This Ray Tune LoggerCallback sends metrics and parameters to Comet for tracking.

To use the CometLoggerCallback, you must first install Comet:

pip install comet_ml

Then set the following environment variable:

export COMET_API_KEY=<Your API Key>

Alternatively, you can pass your API key as an argument to the CometLoggerCallback constructor:

CometLoggerCallback(api_key="<Your API Key>")
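The two options can be combined so the key never has to be hard-coded in a script. A minimal sketch, assuming an explicitly passed key should win over the COMET_API_KEY environment variable; resolve_comet_api_key is a hypothetical helper, not part of Ray or comet_ml, and Comet's SDK may also read keys from other configuration sources such as config files.

```python
import os


def resolve_comet_api_key(explicit_key=None, env=None):
    """Hypothetical helper: prefer an explicit key, else fall back to COMET_API_KEY."""
    env = os.environ if env is None else env
    key = explicit_key or env.get("COMET_API_KEY")
    if not key:
        raise ValueError("No Comet API key: pass api_key=... or set COMET_API_KEY.")
    return key


# Usage (requires Ray and comet_ml):
# CometLoggerCallback(api_key=resolve_comet_api_key(), project_name="my_project_name")
```

Failing early with a clear error avoids runs like the one above, where an invalid key only surfaces as "COMET ERROR" lines after the trials have started.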

Parameters:
  • online – Whether to make use of an Online or Offline Experiment. Defaults to True.

  • tags – Tags to add to the logged Experiment. Defaults to None.

  • save_checkpoints – If True, model checkpoints will be saved to Comet ML as artifacts. Defaults to False.

  • **experiment_kwargs – Other keyword arguments will be passed to the constructor for comet_ml.Experiment (or OfflineExperiment if online=False).

Please consult the Comet ML documentation for more information on the Experiment and OfflineExperiment classes: https://comet.ml/site/
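The online flag and **experiment_kwargs work together: online decides which Comet class is constructed, and every extra keyword argument (for example workspace or project_name) is passed straight to that class's constructor. The sketch below models this forwarding pattern with two hypothetical stand-in classes instead of importing comet_ml; make_experiment is an illustrative helper, not the callback's actual source.

```python
class Experiment:
    """Stand-in for comet_ml.Experiment (hypothetical, records its kwargs)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class OfflineExperiment(Experiment):
    """Stand-in for comet_ml.OfflineExperiment."""


def make_experiment(online=True, **experiment_kwargs):
    # Choose the class from the `online` flag, then forward all other kwargs unchanged.
    experiment_cls = Experiment if online else OfflineExperiment
    return experiment_cls(**experiment_kwargs)


exp = make_experiment(online=False, workspace="my_workspace", project_name="my_project_name")
print(type(exp).__name__, exp.kwargs)
# OfflineExperiment {'workspace': 'my_workspace', 'project_name': 'my_project_name'}
```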

Example:

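from ray import tune
from ray.air.integrations.comet import CometLoggerCallback

# `train` is your trainable function and `config` its search space,
# both defined elsewhere (e.g. train_function and the param_space above).
tune.run(
    train,
    config=config,
    callbacks=[
        CometLoggerCallback(
            online=True,
            tags=["tag1", "tag2"],
            workspace="my_workspace",
            project_name="my_project_name",
        )
    ],
)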