ray.rllib.evaluation.rollout_worker.RolloutWorker.sample_with_count

RolloutWorker.sample_with_count() → Tuple[SampleBatch | MultiAgentBatch | Dict[str, Any], int]

Same as sample(), but returns the count as a separate value.

Returns: A batch of experiences (e.g., tensors) and the size of the collected batch.

import gymnasium as gym
from ray.rllib.evaluation.rollout_worker import RolloutWorker
from ray.rllib.algorithms.ppo.ppo_tf_policy import PPOTF1Policy

# Build a worker that samples CartPole-v1 with the TF1 PPO policy.
worker = RolloutWorker(
  env_creator=lambda _: gym.make("CartPole-v1"),
  default_policy_class=PPOTF1Policy)
# Prints a (batch, count) tuple.
print(worker.sample_with_count())

(SampleBatch({"obs": [...], "action": [...], ...}), 3)
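
The relationship between sample() and sample_with_count() can be illustrated without installing Ray. The following is a minimal sketch, not RLlib's implementation: ToyWorker is a hypothetical stand-in class, its batch is a plain dict rather than a SampleBatch, and the count is assumed to be the number of collected steps.

```python
from typing import Any, Dict, List, Tuple


class ToyWorker:
    """Hypothetical stand-in for a rollout worker; not part of RLlib."""

    def __init__(self, steps: int) -> None:
        self.steps = steps

    def sample(self) -> Dict[str, List[Any]]:
        # Pretend each step yields one observation and one action.
        return {
            "obs": [float(i) for i in range(self.steps)],
            "action": [i % 2 for i in range(self.steps)],
        }

    def sample_with_count(self) -> Tuple[Dict[str, List[Any]], int]:
        # Same data as sample(), plus the number of collected steps
        # returned as a separate value (assumption for this sketch).
        batch = self.sample()
        return batch, len(batch["obs"])


# Unpack the tuple instead of indexing into it.
batch, count = ToyWorker(steps=3).sample_with_count()
print(count)
```

Unpacking the return value as `batch, count = ...` keeps the caller from having to inspect the batch just to learn its size.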