
ray.rllib.policy.Policy.learn_on_batch_from_replay_buffer

Policy.learn_on_batch_from_replay_buffer(replay_actor: ActorHandle, policy_id: str) → Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]

Samples a batch from the given replay buffer actor and performs an update on this policy.

Parameters:

replay_actor – The replay buffer actor to sample the batch from.
policy_id – The ID of this policy.

Returns:

A dictionary of extra metadata from compute_gradients().
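To illustrate the documented control flow (sample a batch for a given policy ID, run an update, return a metadata dict), here is a minimal plain-Python sketch that runs without Ray. Everything in it is hypothetical: `HypotheticalReplayActor`, its `add`/`replay` methods, and `SketchPolicy` are stand-ins, not RLlib classes. With a real Ray actor, `replay_actor` would be an `ActorHandle`, so the call would go through `.remote()` and be resolved with `ray.get()`. Returning an empty dict when no batch is available is also an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


class HypotheticalReplayActor:
    """Stand-in for a replay buffer actor (hypothetical, not a Ray/RLlib class)."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size
        self.storage: Dict[str, List[float]] = {}

    def add(self, policy_id: str, sample: float) -> None:
        # Store samples per policy ID.
        self.storage.setdefault(policy_id, []).append(sample)

    def replay(self, policy_id: str) -> Optional[List[float]]:
        # Return None while there is not yet enough data for a full batch.
        samples = self.storage.get(policy_id, [])
        if len(samples) < self.batch_size:
            return None
        return samples[-self.batch_size:]


class SketchPolicy:
    """Toy policy with one scalar weight (hypothetical)."""

    def __init__(self, lr: float = 0.1) -> None:
        self.weight = 0.0
        self.lr = lr

    def learn_on_batch(self, batch: List[float]) -> Dict[str, Any]:
        # Mean gradient of the loss 0.5 * (w - x)^2 over the batch.
        grad = sum(self.weight - x for x in batch) / len(batch)
        self.weight -= self.lr * grad
        # Extra metadata, analogous to the stats returned by compute_gradients().
        return {"grad": grad, "batch_count": len(batch)}

    def learn_on_batch_from_replay_buffer(self, replay_actor, policy_id: str) -> Dict[str, Any]:
        # 1. Sample a batch for this policy from the replay actor.
        batch = replay_actor.replay(policy_id=policy_id)
        # 2. Assumed behavior: nothing to learn from yet, so return empty metadata.
        if batch is None:
            return {}
        # 3. Update the policy and pass its metadata dict through.
        return self.learn_on_batch(batch)


if __name__ == "__main__":
    actor = HypotheticalReplayActor(batch_size=2)
    policy = SketchPolicy(lr=0.5)
    print(policy.learn_on_batch_from_replay_buffer(actor, "default_policy"))  # {}
    actor.add("default_policy", 1.0)
    actor.add("default_policy", 3.0)
    print(policy.learn_on_batch_from_replay_buffer(actor, "default_policy"))
```

Keeping sampling inside the policy method means the caller only passes a handle and an ID rather than the batch itself, which is the shape the signature above suggests.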