
ray.rllib.policy.torch_policy_v2.TorchPolicyV2.action_sampler_fn

TorchPolicyV2.action_sampler_fn(model: ModelV2, *, obs_batch: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, state_batches: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, **kwargs) → Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]

Custom function for sampling new actions, given the policy.

Parameters:

model – The underlying model.
obs_batch – Batch of observation tensors.
state_batches – Batch of action-sampling states.

Returns:

A 4-tuple of:
Sampled actions
Log-likelihoods of the sampled actions
Action distribution inputs
Updated state
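To make the return contract concrete, here is a minimal standalone sketch of a sampler that follows the same shape: it takes a model, a keyword-only `obs_batch` and `state_batches`, and returns (sampled actions, log-likelihoods, action distribution inputs, updated state). It is not RLlib code. In RLlib this logic would presumably live in a method overridden on a `TorchPolicyV2` subclass and operate on a real `ModelV2` and torch tensors. Here, assumptions: `ToyLinearModel` is a hypothetical stand-in for `ModelV2`, the action space is assumed discrete (categorical over logits), the `rng` keyword is a hypothetical extra argument for reproducibility, and the model is assumed stateless, so the state is passed through unchanged.

```python
from typing import List, Optional, Tuple

import numpy as np


class ToyLinearModel:
    """Hypothetical stand-in for a ModelV2: maps observations to action logits."""

    def __init__(self, obs_dim: int, num_actions: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(obs_dim, num_actions))

    def __call__(self, obs_batch: np.ndarray) -> np.ndarray:
        # One row of logits per observation.
        return obs_batch @ self.weights


def action_sampler_fn(
    model: ToyLinearModel,
    *,
    obs_batch: np.ndarray,
    state_batches: List[np.ndarray],
    rng: Optional[np.random.Generator] = None,  # hypothetical extra kwarg
    **kwargs,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, List[np.ndarray]]:
    if rng is None:
        rng = np.random.default_rng()

    # Action distribution inputs: the raw logits from the model.
    dist_inputs = model(obs_batch)

    # Numerically stable log-softmax over the action dimension.
    shifted = dist_inputs - dist_inputs.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)

    # Sample one categorical action per observation.
    actions = np.array([rng.choice(len(p), p=p) for p in probs])

    # Log-likelihood of each sampled action under its distribution.
    logp = log_probs[np.arange(len(actions)), actions]

    # Stateless model (assumption): pass the state through unchanged.
    state_out = list(state_batches)
    return actions, logp, dist_inputs, state_out


# Usage: a batch of 5 four-dimensional observations, 3 discrete actions.
model = ToyLinearModel(obs_dim=4, num_actions=3)
obs = np.ones((5, 4))
actions, logp, dist_inputs, state_out = action_sampler_fn(
    model, obs_batch=obs, state_batches=[], rng=np.random.default_rng(42)
)
```

Returning the distribution inputs alongside the sampled actions lets the caller rebuild the action distribution later (for example, to recompute log-likelihoods during training) without calling the model again.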