
ray.rllib.policy.eager_tf_policy_v2.EagerTFPolicyV2.postprocess_trajectory

EagerTFPolicyV2.postprocess_trajectory(sample_batch: SampleBatch, other_agent_batches: SampleBatch | None = None, episode: Episode | None = None) [source]

Postprocesses a trajectory in the format of a SampleBatch.
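A typical postprocessing step reads the columns of a single-episode batch and adds new, derived columns before the batch is used for training. The sketch below illustrates that idea without depending on Ray: a plain dict stands in for SampleBatch, and the function name, the "returns" column, and the gamma parameter are hypothetical choices for illustration, not part of the RLlib API. It adds a discounted-return column computed backwards over the episode.

```python
from typing import Dict, List

# Hypothetical stand-in for SampleBatch: a plain dict mapping column names to lists.
Batch = Dict[str, List[float]]


def postprocess_trajectory(sample_batch: Batch, gamma: float = 0.99) -> Batch:
    """Add a discounted-return column to a batch holding at most one episode.

    Illustrative sketch only; "returns" and gamma are assumed names.
    """
    rewards = sample_batch["rewards"]
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Build a new dict instead of mutating the caller's batch.
    out = dict(sample_batch)
    out["returns"] = returns
    return out


batch = {"obs": [0.0, 1.0, 2.0], "rewards": [1.0, 0.0, 1.0]}
print(postprocess_trajectory(batch, gamma=0.5)["returns"])  # [1.25, 0.5, 1.0]
```

Because the batch holds at most one episode trajectory, a single backward pass is enough; no episode boundaries need to be detected inside the loop.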

Parameters:

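sample_batch – Batch of experiences for the policy, which contains at most one episode trajectory.
other_agent_batches – In a multi-agent environment, a mapping from agent IDs to (policy, agent_batch) tuples containing the policies and experiences of the other agents.
episode – An optional multi-agent episode object that provides access to all internal episode state, which may be useful for model-based or multi-agent algorithms.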

Returns:

The postprocessed sample batch.
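The other_agent_batches parameter described above lets a policy read the experiences of other agents while postprocessing its own batch, for example to attach their actions as extra columns (a pattern used by centralized-critic setups). The following is a minimal, Ray-independent sketch under stated assumptions: plain dicts stand in for SampleBatch, the mapping values are (policy, batch) tuples as the parameter description states, and the "<agent_id>_actions" column naming is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-in for SampleBatch: a plain dict mapping column names to lists.
Batch = Dict[str, List[Any]]


def postprocess_trajectory(
    sample_batch: Batch,
    other_agent_batches: Optional[Dict[str, Tuple[Any, Batch]]] = None,
    episode: Optional[Any] = None,  # Unused in this sketch.
) -> Batch:
    """Copy other agents' actions into this agent's batch (illustrative only)."""
    out = dict(sample_batch)
    n = len(sample_batch["actions"])
    if other_agent_batches:
        # Sort agent IDs so the added columns appear in a deterministic order.
        for agent_id in sorted(other_agent_batches):
            _policy, other_batch = other_agent_batches[agent_id]
            # Truncate to this agent's length; the column name is an assumption.
            out[f"{agent_id}_actions"] = list(other_batch["actions"][:n])
    return out


my_batch = {"obs": [0, 1], "actions": [1, 0]}
others = {"agent_1": (None, {"obs": [5, 6], "actions": [2, 2]})}
print(postprocess_trajectory(my_batch, others)["agent_1_actions"])  # [2, 2]
```

In a single-agent setting other_agent_batches is None, and the sketch simply returns an unchanged copy of the batch.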