
ray.rllib.policy.torch_policy_v2.TorchPolicyV2.extra_action_out

TorchPolicyV2.extra_action_out(input_dict: Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], state_batches: List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], model: TorchModelV2, action_dist: TorchDistributionWrapper) → Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]

Returns a dict of extra information to include in the experience batch.

Parameters:

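input_dict – Dict of model input tensors.
state_batches – List of state tensors.
model – Reference to the model object.
action_dist – Torch action distribution object, used to get log-probabilities (e.g., for already sampled actions).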

Returns:

Extra outputs to return from a compute_actions_from_input_dict() call (the third return value).
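To make the contract concrete, here is a minimal, dependency-free sketch. It does not use Ray or torch: ToyModel and ToyPolicy are hypothetical stand-ins for TorchModelV2 and TorchPolicyV2, and the greedy action choice replaces real sampling from action_dist. The sketch only shows the data flow described above: an extra_action_out override returns a dict (here, value estimates taken from model.value_function()), and that dict comes back as the third element of compute_actions_from_input_dict(). The "vf_preds" key is chosen for illustration.

```python
from typing import Any, Dict, List, Tuple


class ToyModel:
    """Hypothetical stand-in for a TorchModelV2 (no Ray or torch needed)."""

    def __init__(self) -> None:
        self._values: List[float] = []

    def forward(self, obs: List[List[float]]) -> List[List[float]]:
        # Treat each observation row as action logits and cache a value
        # estimate (here simply the row sum) for value_function().
        self._values = [sum(row) for row in obs]
        return obs

    def value_function(self) -> List[float]:
        # Value estimates from the most recent forward() call.
        return self._values


class ToyPolicy:
    """Hypothetical policy mirroring the extra_action_out contract."""

    def __init__(self) -> None:
        self.model = ToyModel()

    def extra_action_out(self, input_dict, state_batches, model, action_dist) -> Dict[str, Any]:
        # Anything returned here is meant to be stored in the experience batch.
        return {"vf_preds": model.value_function()}

    def compute_actions_from_input_dict(
        self, input_dict: Dict[str, Any]
    ) -> Tuple[List[int], List[Any], Dict[str, Any]]:
        logits = self.model.forward(input_dict["obs"])
        # Greedy action per row stands in for sampling from action_dist.
        actions = [max(range(len(row)), key=row.__getitem__) for row in logits]
        state_batches: List[Any] = []
        extra = self.extra_action_out(input_dict, state_batches, self.model, action_dist=None)
        # The extra-outputs dict is the third return value.
        return actions, state_batches, extra


policy = ToyPolicy()
actions, state_outs, extra = policy.compute_actions_from_input_dict(
    {"obs": [[1.0, 3.0], [2.0, 0.5]]}
)
print(actions)  # [1, 0]
print(extra)    # {'vf_preds': [4.0, 2.5]}
```

In an actual RLlib policy you would instead subclass TorchPolicyV2 and override only extra_action_out, for example returning {SampleBatch.VF_PREDS: model.value_function()}. This is roughly what RLlib's value-function mixin for PPO does, as far as we know. The framework's own action-computation path may add further keys (such as action log-probabilities) to the same dict.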