ray.rllib.execution.train_ops.train_one_step

ray.rllib.execution.train_ops.train_one_step(algorithm, train_batch, policies_to_train=None) -> Dict

Function that improves all policies in train_batch on the local worker.
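The following is a minimal pure-Python sketch of "update every policy in the batch, optionally restricted by policies_to_train". It does not use RLlib. ToyBatch, LearnFn, select_policies and improve_policies are hypothetical names. Treating policies_to_train=None as "every policy that has data in the batch" is also a simplifying assumption, because RLlib may instead derive the set of trainable policies from the local worker's configuration.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical stand-in for a multi-agent batch: policy ID -> list of samples.
ToyBatch = Dict[str, List[dict]]
# Hypothetical per-policy update function: samples -> info dict.
LearnFn = Callable[[List[dict]], dict]


def select_policies(
    batch: ToyBatch, policies_to_train: Optional[Iterable[str]] = None
) -> List[str]:
    """Return the IDs of the policies to update on this batch.

    Simplification (not RLlib's rule): None means every policy with data here.
    """
    present = list(batch)
    if policies_to_train is None:
        return present
    wanted = set(policies_to_train)
    # Keep batch order and skip requested policies that have no data in this batch.
    return [pid for pid in present if pid in wanted]


def improve_policies(
    batch: ToyBatch,
    learn_fns: Dict[str, LearnFn],
    policies_to_train: Optional[Iterable[str]] = None,
) -> Dict[str, dict]:
    """Run one update per selected policy and return the per-policy info dicts."""
    return {
        pid: learn_fns[pid](batch[pid])
        for pid in select_policies(batch, policies_to_train)
    }


batch = {"default_policy": [{"obs": 0}], "opponent": [{"obs": 1}, {"obs": 2}]}
# Placeholder "learning": each policy just reports how many samples it received.
learn = {pid: (lambda samples: {"num_samples": len(samples)}) for pid in batch}
print(improve_policies(batch, learn))
# {'default_policy': {'num_samples': 1}, 'opponent': {'num_samples': 2}}
print(improve_policies(batch, learn, policies_to_train=["opponent"]))
# {'opponent': {'num_samples': 2}}
```

The return value mirrors the shape of the result in the RLlib example, which is a dict keyed by policy ID. The per-policy values here are only placeholders.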

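from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step

algo = [...]  # An already-built RLlib Algorithm (construction elided).
# synchronous_parallel_sample takes keyword-only arguments.
train_batch = synchronous_parallel_sample(worker_set=algo.env_runner_group)
# This trains the policy on one batch.
print(train_one_step(algo, train_batch))
# {"default_policy": ...}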

Updates the NUM_ENV_STEPS_TRAINED and NUM_AGENT_STEPS_TRAINED counters, as well as the LEARN_ON_BATCH_TIMER timer, of the algorithm object.
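This bookkeeping can be sketched without RLlib. The example below is a hypothetical illustration, not RLlib's implementation. ToyMultiAgentBatch, ToyAlgorithm and toy_train_one_step are made-up names. The three keys borrow only their names from the text, and their string values are invented for this sketch. The "learning" step is a placeholder. The sketch shows the distinction between the two counters: env steps count environment transitions, while agent steps count per-agent samples. In a multi-agent batch, the agent-step counter can therefore grow faster.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

# Stand-in keys named after the counters/timer in the text; the string
# values are chosen for this sketch, not taken from RLlib.
NUM_ENV_STEPS_TRAINED = "num_env_steps_trained"
NUM_AGENT_STEPS_TRAINED = "num_agent_steps_trained"
LEARN_ON_BATCH_TIMER = "learn_on_batch"


@dataclass
class ToyMultiAgentBatch:
    """Hypothetical batch: env steps taken, plus per-policy agent samples."""

    env_steps: int
    per_policy: Dict[str, List[dict]]

    def agent_steps(self) -> int:
        # Each sample is one agent step; one env step may contain several.
        return sum(len(samples) for samples in self.per_policy.values())


@dataclass
class ToyAlgorithm:
    """Hypothetical holder for the counters and timers being updated."""

    counters: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    timers: Dict[str, float] = field(default_factory=lambda: defaultdict(float))


def toy_train_one_step(algo: ToyAlgorithm, batch: ToyMultiAgentBatch) -> Dict[str, dict]:
    start = time.perf_counter()
    # Placeholder "learning": report how many samples each policy saw.
    info = {pid: {"num_samples": len(s)} for pid, s in batch.per_policy.items()}
    # Accumulate wall-clock seconds spent in the learning step.
    algo.timers[LEARN_ON_BATCH_TIMER] += time.perf_counter() - start
    # Count env steps and agent steps separately.
    algo.counters[NUM_ENV_STEPS_TRAINED] += batch.env_steps
    algo.counters[NUM_AGENT_STEPS_TRAINED] += batch.agent_steps()
    return info


algo = ToyAlgorithm()
# Two env steps with two agents acting in each: 2 env steps, 4 agent steps.
batch = ToyMultiAgentBatch(env_steps=2, per_policy={"p0": [{}, {}], "p1": [{}, {}]})
toy_train_one_step(algo, batch)
toy_train_one_step(algo, batch)
print(dict(algo.counters))
# {'num_env_steps_trained': 4, 'num_agent_steps_trained': 8}
```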