ray.rllib.execution.train_ops.train_one_step
- ray.rllib.execution.train_ops.train_one_step(algorithm, train_batch, policies_to_train=None) → Dict
Function that improves all policies in train_batch on the local worker.

from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step

algo = [...]
train_batch = synchronous_parallel_sample(worker_set=algo.env_runner_group)
# This trains the policy on one batch.
print(train_one_step(algo, train_batch))
{"default_policy": ...}
Updates the NUM_ENV_STEPS_TRAINED and NUM_AGENT_STEPS_TRAINED counters, as well as the LEARN_ON_BATCH_TIMER timer, of the algorithm object.
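The bookkeeping described above can be illustrated with a small, Ray-free sketch. This is not RLlib's implementation: ToyAlgorithm, ToyBatch, toy_train_one_step, and the string values of the counter and timer keys are all hypothetical stand-ins chosen for illustration. It only shows the general idea of training the selected policies, timing the step, and incrementing the trained-step counters.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical key values; only the constant names come from the text above.
NUM_ENV_STEPS_TRAINED = "num_env_steps_trained"
NUM_AGENT_STEPS_TRAINED = "num_agent_steps_trained"
LEARN_ON_BATCH_TIMER = "learn_on_batch"


@dataclass
class ToyBatch:
    """Hypothetical multi-policy batch: env step count plus rows per policy."""
    env_steps: int
    policy_batches: Dict[str, List[float]] = field(default_factory=dict)

    def agent_steps(self) -> int:
        # In this toy, every row counts as one agent step.
        return sum(len(rows) for rows in self.policy_batches.values())


class ToyAlgorithm:
    """Hypothetical stand-in for an algorithm object; not an RLlib class."""

    def __init__(self) -> None:
        self.counters = defaultdict(int)
        self.timers = defaultdict(float)


def toy_train_one_step(
    algorithm: ToyAlgorithm,
    train_batch: ToyBatch,
    policies_to_train: Optional[List[str]] = None,
) -> Dict[str, dict]:
    """Pretend to train each selected policy and update the bookkeeping."""
    start = time.perf_counter()
    results = {}
    for policy_id, rows in train_batch.policy_batches.items():
        if policies_to_train is not None and policy_id not in policies_to_train:
            continue  # Skip policies that were not selected for training.
        # A real learner would run a gradient update here.
        results[policy_id] = {"num_rows_seen": len(rows)}
    # Accumulate the elapsed time and the step counters on the algorithm.
    algorithm.timers[LEARN_ON_BATCH_TIMER] += time.perf_counter() - start
    algorithm.counters[NUM_ENV_STEPS_TRAINED] += train_batch.env_steps
    algorithm.counters[NUM_AGENT_STEPS_TRAINED] += train_batch.agent_steps()
    return results
```

In this sketch the counters grow by the size of the whole batch even when policies_to_train selects only a subset; that is a simplifying assumption of the toy, so check the RLlib source for the actual behavior.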