ray.rllib.algorithms.algorithm_config.AlgorithmConfig

class ray.rllib.algorithms.algorithm_config.AlgorithmConfig(algo_class: type | None = None)

Bases: _Config

An RLlib AlgorithmConfig builds an RLlib Algorithm from a given configuration.

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks
# Construct a generic config object, specifying values within different
# sub-categories, e.g. "training".
config = (
    PPOConfig()
    .training(gamma=0.9, lr=0.01)
    .environment(env="CartPole-v1")
    .resources(num_gpus=0)
    .env_runners(num_env_runners=0)
    .callbacks(MemoryTrackingCallbacks)
)
# A config object can be used to construct the respective Algorithm.
rllib_algo = config.build()

from ray.rllib.algorithms.ppo import PPOConfig
from ray import tune
# In combination with a tune.grid_search:
config = PPOConfig()
config.training(lr=tune.grid_search([0.01, 0.001]))
# Use `to_dict()` method to get the legacy plain python config dict
# for usage with `tune.Tuner().fit()`.
tune.Tuner("PPO", param_space=config.to_dict())

Methods

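`__init__`	Initializes an AlgorithmConfig instance.
`api_stack`	Sets the config's API stack settings.
`build`	Builds an Algorithm from this AlgorithmConfig (or a copy thereof).
`build_learner`	Builds and returns a new Learner object based on the settings in `self`.
`build_learner_group`	Builds and returns a new LearnerGroup object based on the settings in `self`.
`callbacks`	Sets the callbacks configuration.
`checkpointing`	Sets the config's checkpointing settings.
`copy`	Creates a deep copy of this config and (un)freezes it if necessary.
`debugging`	Sets the config's debugging settings.
`env_runners`	Sets the rollout worker configuration.
`environment`	Sets the config's RL-environment settings.
`evaluation`	Sets the config's evaluation settings.
`experimental`	Sets the config's experimental settings.
`fault_tolerance`	Sets the config's fault tolerance settings.
`framework`	Sets the config's deep learning framework settings.
`freeze`	Freezes this config object so that no attributes can be set anymore.
`from_dict`	Creates an AlgorithmConfig from a legacy Python config dict.
`get`	Shim method that helps the config pretend to be a dict.
`get_config_for_module`	Returns an AlgorithmConfig object specific to the given module ID.
`get_default_learner_class`	Returns the Learner class to use for this algorithm.
`get_default_rl_module_spec`	Returns the RLModule spec to use for this algorithm.
`get_evaluation_config_object`	Creates a full AlgorithmConfig object from `self.evaluation_config`.
`get_multi_agent_setup`	Compiles the complete multi-agent config (dict) from the information in `self`.
`get_multi_rl_module_spec`	Returns the MultiRLModuleSpec based on the given env/spaces.
`get_rl_module_spec`	Returns the RLModuleSpec based on the given env/spaces.
`get_rollout_fragment_length`	Automatically infers a proper rollout_fragment_length setting if it is set to "auto".
`get_torch_compile_worker_config`	Returns the TorchCompileConfig to use on workers.
`is_multi_agent`	Returns whether this config specifies a multi-agent setup.
`items`	Shim method that helps the config pretend to be a dict.
`keys`	Shim method that helps the config pretend to be a dict.
`learners`	Sets the LearnerGroup and Learner worker related configuration.
`multi_agent`	Sets the config's multi-agent settings.
`offline_data`	Sets the config's offline data settings.
`overrides`	Generates and validates a set of config key/value pairs (passed via kwargs).
`pop`	Shim method that helps the config pretend to be a dict.
`python_environment`	Sets the config's Python environment settings.
`reporting`	Sets the config's reporting settings.
`resources`	Specifies the resources allocated for an Algorithm and its Ray actors/workers.
`rl_module`	Sets the config's RLModule settings.
`serialize`	Returns a mapping from str to JSON-able values representing this config.
`to_dict`	Converts all settings into a legacy config dict for backward compatibility.
`training`	Sets the training-related configuration.
`update_from_dict`	Modifies this AlgorithmConfig via the provided Python config dict.
`validate`	Validates all values in this config.
`validate_train_batch_size_vs_rollout_fragment_length`	Detects mismatches between `train_batch_size` and `rollout_fragment_length`.
`values`	Shim method that helps the config pretend to be a dict.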
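
The setter methods listed above (`training`, `environment`, and so on) can be chained, while `freeze`, `copy`, and the dict shim methods change how the object can be used afterwards. The following is a minimal pure-Python sketch of that pattern, not RLlib's implementation: `ToyConfig` and its three fields are hypothetical stand-ins, chosen so the snippet runs without Ray installed.

```python
import copy


class ToyConfig:
    """Hypothetical stand-in that mimics chaining, freezing and dict shims."""

    def __init__(self):
        self._is_frozen = False
        self.lr = 0.001
        self.gamma = 0.99
        self.env = None

    def __setattr__(self, key, value):
        # Once frozen, reject every attribute write except the flag itself.
        if getattr(self, "_is_frozen", False) and key != "_is_frozen":
            raise AttributeError(f"Cannot set attribute ({key}) of a frozen config")
        super().__setattr__(key, value)

    def training(self, *, lr=None, gamma=None):
        # Only overwrite values that were passed, then return `self`
        # so that further setter calls can be chained.
        if lr is not None:
            self.lr = lr
        if gamma is not None:
            self.gamma = gamma
        return self

    def environment(self, *, env=None):
        if env is not None:
            self.env = env
        return self

    def freeze(self):
        self._is_frozen = True

    def copy(self, copy_frozen=None):
        # Deep-copy first, then keep (None) or override the frozen flag.
        new = copy.deepcopy(self)
        if copy_frozen is not None:
            new._is_frozen = copy_frozen
        return new

    def to_dict(self):
        # Plain dict view of the settings, without the internal flag.
        return {k: v for k, v in vars(self).items() if k != "_is_frozen"}

    def get(self, key, default=None):
        return self.to_dict().get(key, default)

    def keys(self):
        return self.to_dict().keys()


config = ToyConfig().training(lr=0.01).environment(env="CartPole-v1")
config.freeze()

try:
    config.training(lr=0.5)
    write_rejected = False
except AttributeError:
    write_rejected = True

# An unfrozen deep copy can be changed without touching the frozen original.
editable = config.copy(copy_frozen=False).training(gamma=0.9)
```

Returning `self` from each setter is what allows the chained style, and routing every write through `__setattr__` gives `freeze` a single place to enforce immutability.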

Attributes

`custom_resources_per_worker`
`delay_between_worker_restarts_s`
`evaluation_num_workers`
`ignore_worker_failures`
`is_atari`	True if the specified env is an Atari env.
`learner_class`	Returns the Learner subclass used by this Algorithm.
`max_num_worker_restarts`
`model_config`	Defines the model configuration used.
`num_consecutive_worker_failures_tolerance`
`num_cpus_for_local_worker`
`num_cpus_per_learner_worker`
`num_cpus_per_worker`
`num_envs_per_worker`
`num_gpus_per_learner_worker`
`num_gpus_per_worker`
`num_learner_workers`
`num_rollout_workers`
`recreate_failed_workers`
`rl_module_spec`
`total_train_batch_size`
`uses_new_env_runners`
`validate_workers_after_construction`
`worker_health_probe_timeout_s`
`worker_restore_timeout_s`