使用 Comet 和 Tune#
Comet 是一个管理和优化整个机器学习生命周期的工具,包括实验跟踪、模型优化、数据集版本管理和模型生产监控。
示例#
为了说明如何将你的试验结果记录到 Comet,我们将定义一个简单的训练函数,模拟一个 loss
指标:
import numpy as np
from ray import train, tune
def train_function(config):
for i in range(30):
loss = config["mean"] + config["sd"] * np.random.randn()
train.report({"loss": loss})
现在,给定您提供您的Comet API密钥和您的项目名称,如下所示:
api_key = "YOUR_COMET_API_KEY"
project_name = "YOUR_COMET_PROJECT_NAME"
您可以通过相应地在您的 RunConfig()
中指定 callbacks
参数来添加 Comet 日志记录器:
from ray.air.integrations.comet import CometLoggerCallback
tuner = tune.Tuner(
train_function,
tune_config=tune.TuneConfig(
metric="loss",
mode="min",
),
run_config=train.RunConfig(
callbacks=[
CometLoggerCallback(
api_key=api_key, project_name=project_name, tags=["comet_example"]
)
],
),
param_space={"mean": tune.grid_search([1, 2, 3]), "sd": tune.uniform(0.2, 0.8)},
)
results = tuner.fit()
print(results.get_best_result().config)
2022-07-22 15:41:21,477 INFO services.py:1483 -- View the Ray dashboard at http://127.0.0.1:8267
/Users/kai/coding/ray/python/ray/tune/trainable/function_trainable.py:643: DeprecationWarning: `checkpoint_dir` in `func(config, checkpoint_dir)` is being deprecated. To save and load checkpoint in trainable functions, please use the `ray.air.session` API:
from ray.air import session
def train(config):
# ...
session.report({"metric": metric}, checkpoint=checkpoint)
For more information please see https://docs.ray.io/en/master/ray-air/key-concepts.html#session
DeprecationWarning,
Current time: 2022-07-22 15:41:31 (running for 00:00:06.73)
Memory usage on this node: 9.9/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.5 GiB heap, 0.0/2.0 GiB objects
Current best trial: 5bf98_00000 with loss=1.0234101880766688 and parameters={'mean': 1, 'sd': 0.40575843135279466}
Result logdir: /Users/kai/ray_results/train_function_2022-07-22_15-41-18
Number of trials: 3/3 (3 TERMINATED)
Trial name | status | loc | mean | sd | iter | total time (s) | loss |
---|---|---|---|---|---|---|---|
train_function_5bf98_00000 | TERMINATED | 127.0.0.1:48140 | 1 | 0.405758 | 30 | 2.11758 | 1.02341 |
train_function_5bf98_00001 | TERMINATED | 127.0.0.1:48147 | 2 | 0.647335 | 30 | 0.0770731 | 1.53993 |
train_function_5bf98_00002 | TERMINATED | 127.0.0.1:48151 | 3 | 0.256568 | 30 | 0.0728431 | 3.0393 |
2022-07-22 15:41:24,693 INFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged
For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged
For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged
For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/
Result for train_function_5bf98_00000:
date: 2022-07-22_15-41-27
done: false
experiment_id: c94e6cdedd4540e4b40e4a34fbbeb850
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 1
loss: 1.1009860426725162
node_ip: 127.0.0.1
pid: 48140
time_since_restore: 0.000125885009765625
time_this_iter_s: 0.000125885009765625
time_total_s: 0.000125885009765625
timestamp: 1658500887
timesteps_since_restore: 0
training_iteration: 1
trial_id: 5bf98_00000
warmup_time: 0.0029532909393310547
Result for train_function_5bf98_00000:
date: 2022-07-22_15-41-29
done: true
experiment_id: c94e6cdedd4540e4b40e4a34fbbeb850
experiment_tag: 0_mean=1,sd=0.4058
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 30
loss: 1.0234101880766688
node_ip: 127.0.0.1
pid: 48140
time_since_restore: 2.1175789833068848
time_this_iter_s: 0.0022211074829101562
time_total_s: 2.1175789833068848
timestamp: 1658500889
timesteps_since_restore: 0
training_iteration: 30
trial_id: 5bf98_00000
warmup_time: 0.0029532909393310547
Result for train_function_5bf98_00001:
date: 2022-07-22_15-41-30
done: false
experiment_id: ba865bc613d94413a37fe027123ba031
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 1
loss: 2.3754716847171182
node_ip: 127.0.0.1
pid: 48147
time_since_restore: 0.0001590251922607422
time_this_iter_s: 0.0001590251922607422
time_total_s: 0.0001590251922607422
timestamp: 1658500890
timesteps_since_restore: 0
training_iteration: 1
trial_id: 5bf98_00001
warmup_time: 0.0036537647247314453
Result for train_function_5bf98_00001:
date: 2022-07-22_15-41-30
done: true
experiment_id: ba865bc613d94413a37fe027123ba031
experiment_tag: 1_mean=2,sd=0.6473
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 30
loss: 1.5399275480220707
node_ip: 127.0.0.1
pid: 48147
time_since_restore: 0.0770730972290039
time_this_iter_s: 0.002664804458618164
time_total_s: 0.0770730972290039
timestamp: 1658500890
timesteps_since_restore: 0
training_iteration: 30
trial_id: 5bf98_00001
warmup_time: 0.0036537647247314453
Result for train_function_5bf98_00002:
date: 2022-07-22_15-41-31
done: false
experiment_id: 2efb6f3c4d954bcab1ea4083f138008e
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 1
loss: 3.204653294422825
node_ip: 127.0.0.1
pid: 48151
time_since_restore: 0.00014400482177734375
time_this_iter_s: 0.00014400482177734375
time_total_s: 0.00014400482177734375
timestamp: 1658500891
timesteps_since_restore: 0
training_iteration: 1
trial_id: 5bf98_00002
warmup_time: 0.0030150413513183594
Result for train_function_5bf98_00002:
date: 2022-07-22_15-41-31
done: true
experiment_id: 2efb6f3c4d954bcab1ea4083f138008e
experiment_tag: 2_mean=3,sd=0.2566
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 30
loss: 3.0393011150182865
node_ip: 127.0.0.1
pid: 48151
time_since_restore: 0.07284307479858398
time_this_iter_s: 0.0020139217376708984
time_total_s: 0.07284307479858398
timestamp: 1658500891
timesteps_since_restore: 0
training_iteration: 30
trial_id: 5bf98_00002
warmup_time: 0.0030150413513183594
2022-07-22 15:41:31,290 INFO tune.py:738 -- Total run time: 7.36 seconds (6.72 seconds for the tuning loop).
{'mean': 1, 'sd': 0.40575843135279466}
调整 Comet 日志记录器#
Ray Tune通过 CometLoggerCallback
提供了与 Comet 的集成,该集成自动将报告给 Tune 的指标和参数记录到 Comet 用户界面中。
点击以下下拉菜单以详细查看该回调 API:
- class ray.air.integrations.comet.CometLoggerCallback(online: bool = True, tags: List[str] = None, save_checkpoints: bool = False, **experiment_kwargs)[源代码]
CometLoggerCallback for logging Tune results to Comet.
Comet (https://comet.ml/site/) is a tool to manage and optimize the entire ML lifecycle, from experiment tracking, model optimization and dataset versioning to model production monitoring.
This Ray Tune
LoggerCallback
sends metrics and parameters to Comet for tracking.In order to use the CometLoggerCallback you must first install Comet via
pip install comet_ml
Then set the following environment variables
export COMET_API_KEY=<Your API Key>
Alternatively, you can also pass in your API Key as an argument to the CometLoggerCallback constructor.
CometLoggerCallback(api_key=<Your API Key>)
- 参数:
online – Whether to make use of an Online or Offline Experiment. Defaults to True.
tags – Tags to add to the logged Experiment. Defaults to None.
save_checkpoints – If
True
, model checkpoints will be saved to Comet ML as artifacts. Defaults toFalse
.**experiment_kwargs – Other keyword arguments will be passed to the constructor for comet_ml.Experiment (or OfflineExperiment if online=False).
Please consult the Comet ML documentation for more information on the Experiment and OfflineExperiment classes: https://comet.ml/site/
Example:
from ray.air.integrations.comet import CometLoggerCallback tune.run( train, config=config callbacks=[CometLoggerCallback( True, ['tag1', 'tag2'], workspace='my_workspace', project_name='my_project_name' )] )