Wrappers

class gymnasium.vector.VectorWrapper(env: VectorEnv)[source]

Wraps the vectorized environment to allow modular transformations.

This class is the base class for all wrappers of vectorized environments. Subclasses can override some methods to change the behaviour of the original vectorized environment without touching the original code.

Note

Don't forget to call super().__init__(env) if the subclass overrides __init__.

Parameters:

env – The environment to wrap

step(actions: ActType) → tuple[ObsType, ArrayType, ArrayType, ArrayType, dict[str, Any]][source]

Steps through all environments using the actions, returning the batched data.

reset(*, seed: int | list[int] | None = None, options: dict[str, Any] | None = None) → tuple[ObsType, dict[str, Any]][source]

Resets all environments using the seeds and options.

render() → tuple[RenderFrame, ...] | None[source]

Returns the rendered frames from the base vector environment.

close(**kwargs: Any)[source]

Closes all environments.
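
A minimal sketch of a custom subclass (the ClockWrapper name and its step counter are illustrative, not part of the API):

>>> import gymnasium as gym
>>> from gymnasium.vector import VectorWrapper
>>> class ClockWrapper(VectorWrapper):
...     """Hypothetical wrapper counting the number of step calls."""
...     def __init__(self, env):
...         super().__init__(env)  # required when overriding __init__
...         self.num_steps = 0
...     def step(self, actions):
...         self.num_steps += 1  # custom behaviour, then delegate to the wrapped env
...         return self.env.step(actions)
...
>>> envs = ClockWrapper(gym.make_vec("CartPole-v1", num_envs=2))
>>> _ = envs.reset(seed=123)
>>> _ = envs.step(envs.action_space.sample())
>>> envs.num_steps
1
>>> envs.close()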

class gymnasium.vector.VectorObservationWrapper(env: VectorEnv)[source]

Wraps the vectorized environment to allow modular transformations of the observations.

Equivalent of gymnasium.ObservationWrapper for vectorized environments.

Parameters:

env – The environment to wrap

observations(observations: ObsType) → ObsType[source]

Defines the transformation of the vector observations.

Parameters:

observations – The vector observations from the environment

Returns:

The transformed observations
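
A minimal sketch of a subclass overriding observations (NegateObservation is an illustrative name; CartPole's symmetric bounds keep the negated values inside the observation space):

>>> import gymnasium as gym
>>> from gymnasium.vector import VectorObservationWrapper
>>> class NegateObservation(VectorObservationWrapper):
...     """Hypothetical wrapper that negates the batched observations."""
...     def observations(self, observations):
...         return -observations
...
>>> envs = NegateObservation(gym.make_vec("CartPole-v1", num_envs=2))
>>> obs, info = envs.reset(seed=123)
>>> envs.close()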

class gymnasium.vector.VectorActionWrapper(env: VectorEnv)[source]

Wraps the vectorized environment to allow modular transformations of the actions.

Equivalent of gymnasium.ActionWrapper for vectorized environments.

Parameters:

env – The environment to wrap

actions(actions: ActType) → ActType[source]

Transforms the actions before sending them to the environment.

Parameters:

actions (ActType) – The actions to transform

Returns:

ActType – The transformed actions
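
A minimal sketch of a subclass overriding actions (HalveAction is an illustrative name):

>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.vector import VectorActionWrapper
>>> class HalveAction(VectorActionWrapper):
...     """Hypothetical wrapper that scales every batched action by 0.5."""
...     def actions(self, actions):
...         return 0.5 * actions
...
>>> envs = HalveAction(gym.make_vec("MountainCarContinuous-v0", num_envs=2))
>>> _ = envs.reset(seed=123)
>>> _ = envs.step(np.ones((2, 1), dtype=np.float32))
>>> envs.close()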

class gymnasium.vector.VectorRewardWrapper(env: VectorEnv)[source]

Wraps the vectorized environment to allow modular transformations of the rewards.

Equivalent of gymnasium.RewardWrapper for vectorized environments.

Parameters:

env – The environment to wrap

rewards(rewards: ArrayType) → ArrayType[source]

Transforms the rewards before returning them.

Parameters:

rewards (array) – The rewards to transform

Returns:

array – The transformed rewards
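
A minimal sketch of a subclass overriding rewards (SignReward is an illustrative name):

>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.vector import VectorRewardWrapper
>>> class SignReward(VectorRewardWrapper):
...     """Hypothetical wrapper that replaces each batched reward with its sign."""
...     def rewards(self, rewards):
...         return np.sign(rewards)
...
>>> envs = SignReward(gym.make_vec("CartPole-v1", num_envs=2))
>>> _ = envs.reset(seed=123)
>>> _ = envs.step(envs.action_space.sample())
>>> envs.close()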

Vector only wrappers

class gymnasium.wrappers.vector.DictInfoToList(env: VectorEnv)[source]

Converts infos of vectorized environments from dict to List[dict].

This wrapper converts the info format of a vector environment from a dictionary to a list of dictionaries. This wrapper is intended to be used around vectorized environments. If using other wrappers that perform operations on info, such as RecordEpisodeStatistics, this should be the outermost wrapper.

DictInfoToList(RecordEpisodeStatistics(vector_env))

Example

>>> import numpy as np
>>> dict_info = {
...      "k": np.array([0., 0., 0.5, 0.3]),
...      "_k": np.array([False, False, True, True])
...  }
...
>>> list_info = [{}, {}, {"k": 0.5}, {"k": 0.3}]
Example for vector environments:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.spaces import Dict, Box
>>> envs = gym.make_vec("CartPole-v1", num_envs=3)
>>> obs, info = envs.reset(seed=123)
>>> info
{}
>>> envs = DictInfoToList(envs)
>>> obs, info = envs.reset(seed=123)
>>> info
[{}, {}, {}]
Another example for vector environments:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("HalfCheetah-v4", num_envs=3)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> _, _, _, _, infos = envs.step(envs.action_space.sample())
>>> infos
{'x_position': array([0.03332211, 0.10172355, 0.08920531]), '_x_position': array([ True,  True,  True]), 'x_velocity': array([-0.06296527,  0.89345848,  0.37710836]), '_x_velocity': array([ True,  True,  True]), 'reward_run': array([-0.06296527,  0.89345848,  0.37710836]), '_reward_run': array([ True,  True,  True]), 'reward_ctrl': array([-0.24503503, -0.21944423, -0.20672209]), '_reward_ctrl': array([ True,  True,  True])}
>>> envs = DictInfoToList(envs)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> _, _, _, _, infos = envs.step(envs.action_space.sample())
>>> infos
[{'x_position': 0.03332210900362942, 'x_velocity': -0.06296527291998533, 'reward_run': -0.06296527291998533, 'reward_ctrl': -0.2450350284576416}, {'x_position': 0.10172354684460168, 'x_velocity': 0.8934584807363618, 'reward_run': 0.8934584807363618, 'reward_ctrl': -0.21944422721862794}, {'x_position': 0.08920531470057845, 'x_velocity': 0.3771083596080768, 'reward_run': 0.3771083596080768, 'reward_ctrl': -0.20672209262847902}]
Change logs:
  • v0.24.0 - Initially added as VectorListInfo

  • v1.0.0 - Renamed to DictInfoToList

Parameters:

env (Env) – The environment to apply the wrapper to

class gymnasium.wrappers.vector.VectorizeTransformObservation(env: VectorEnv, wrapper: type[TransformObservation], **kwargs: Any)[source]

Vectorizes a single-agent transform observation wrapper for vector environments.

Most of the lambda observation wrappers for single-agent environments have vectorized implementations; it is advised that users simply use those instead by importing them from gymnasium.wrappers.vector... The following example illustrates a use case where a custom lambda observation wrapper is required.

Example - Normal observations:
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]],
      dtype=float32)
Example - Applying a custom lambda observation wrapper that duplicates the observations from the environment:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers import TransformObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> old_space = envs.single_observation_space
>>> new_space = Box(low=np.array([old_space.low, old_space.low]), high=np.array([old_space.high, old_space.high]))
>>> envs = VectorizeTransformObservation(envs, wrapper=TransformObservation, func=lambda x: np.array([x, x]), observation_space=new_space)
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
array([[[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
        [ 0.01823519, -0.0446179 , -0.02796401, -0.03156282]],

       [[ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
        [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598]],

       [[ 0.03517495, -0.000635  , -0.01098382, -0.03203924],
        [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]]],
      dtype=float32)
Parameters:
  • env – The vector environment to wrap.

  • wrapper – The wrapper to vectorize

  • **kwargs – Keyword arguments for the wrapper

class gymnasium.wrappers.vector.VectorizeTransformAction(env: VectorEnv, wrapper: type[TransformAction], **kwargs: Any)[source]

Vectorizes a single-agent transform action wrapper for vector environments.

Example - Without action transformation:
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01,  9.8971417e-05],
       [-4.4488689e-01, -1.9375233e-03],
       [-4.3118435e-01, -1.5342437e-03]], dtype=float32)
Example - Adding a transform that applies a ReLU to the action:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformAction
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = VectorizeTransformAction(envs, wrapper=TransformAction, func=lambda x: (x > 0.0) * x, action_space=envs.single_action_space)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01,  9.8971417e-05],
       [-4.4354835e-01, -5.9898634e-04],
       [-4.3034542e-01, -6.9532328e-04]], dtype=float32)
Parameters:
  • env – The vector environment to wrap

  • wrapper – The wrapper to vectorize

  • **kwargs – Arguments for the LambdaAction wrapper

class gymnasium.wrappers.vector.VectorizeTransformReward(env: VectorEnv, wrapper: type[TransformReward], **kwargs: Any)[source]

Vectorizes a single-agent transform reward wrapper for vector environments.

Example - Applying a ReLU to the reward:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = VectorizeTransformReward(envs, wrapper=TransformReward, func=lambda x: (x > 0.0) * x)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> rew
array([-0., -0., -0.])
Parameters:
  • env – The vector environment to wrap.

  • wrapper – The wrapper to vectorize

  • **kwargs – Keyword arguments for the wrapper

Vectorized common wrappers

class gymnasium.wrappers.vector.RecordEpisodeStatistics(env: VectorEnv, buffer_length: int = 100, stats_key: str = 'episode')[source]

This wrapper will keep track of cumulative rewards and episode lengths.

At the end of any episode within the vectorized env, the statistics of the episode will be added to info using the key episode, with the _episode key used to indicate the environment indices that have a terminated or truncated episode.

>>> infos = {  
...     ...
...     "episode": {
...         "r": "<array of cumulative reward for each done sub-environment>",
...         "l": "<array of episode length for each done sub-environment>",
...         "t": "<array of elapsed time since beginning of episode for each done sub-environment>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }

Moreover, the most recent episode returns and lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively (see the sketch after this list).

Variables:
  • return_queue – The cumulative rewards of the last deque_size-many episodes

  • length_queue – The lengths of the last deque_size-many episodes
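
A minimal sketch of reading these buffers after a rollout (this assumes at least one sub-environment finishes an episode within the 100 sampled steps):

>>> import numpy as np
>>> import gymnasium as gym
>>> envs = RecordEpisodeStatistics(gym.make_vec("CartPole-v1", num_envs=3))
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> for _ in range(100):
...     _ = envs.step(envs.action_space.sample())
...
>>> mean_return = np.mean(envs.return_queue)  # average return over recently finished episodes
>>> envs.close()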

Example

>>> from pprint import pprint
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3)
>>> envs = RecordEpisodeStatistics(envs)
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> end = False
>>> while not end:
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...     end = term.any() or trunc.any()
...
>>> envs.close()
>>> pprint(info) 
{'_episode': array([ True, False, False]),
 '_final_info': array([ True, False, False]),
 '_final_observation': array([ True, False, False]),
 'episode': {'l': array([11,  0,  0], dtype=int32),
             'r': array([11.,  0.,  0.], dtype=float32),
             't': array([0.007812, 0.      , 0.      ], dtype=float32)},
 'final_info': array([{}, None, None], dtype=object),
 'final_observation': array([array([ 0.11448676,  0.9416149 , -0.20946532, -1.7619033 ], dtype=float32),
       None, None], dtype=object)}
Parameters:
  • env (Env) – The environment to apply the wrapper to

  • buffer_length – The size of the buffers return_queue, length_queue and time_queue

  • stats_key – The info key to save the data

Implemented observation wrappers

class gymnasium.wrappers.vector.TransformObservation(env: VectorEnv, func: Callable[[ObsType], Any], observation_space: Space | None = None)[source]

Transforms an observation via a function provided to the wrapper.

This function allows the manual specification of the vector observation function as well as the single observation function. This is desirable when, for example, it is possible to process vector observations in parallel or via other more optimized methods. Otherwise, VectorizeTransformObservation should be used instead, where only single_func needs to be defined.

Example - Without observation transformation:
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]],
      dtype=float32)
>>> envs.close()
Example - With observation transformation:
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> def scale_and_shift(obs):
...     return (obs - 1.0) * 2.0
...
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> new_obs_space = Box(low=envs.observation_space.low, high=envs.observation_space.high)
>>> envs = TransformObservation(envs, func=scale_and_shift, observation_space=new_obs_space)
>>> obs, info = envs.reset(seed=123)
>>> obs
array([[-1.9635296, -2.0892358, -2.055928 , -2.0631256],
       [-1.9429494, -1.9428282, -1.9061728, -1.9503881],
       [-1.9296501, -2.00127  , -2.0219676, -2.0640786]], dtype=float32)
>>> envs.close()
Parameters:
  • env – The vector environment to wrap

  • func – A function that will transform the vector observations. If these transformed observations are outside the observation space of env.observation_space, provide an observation_space.

  • observation_space – The observation space of the wrapper; if None, it is assumed to be the same as env.observation_space.

class gymnasium.wrappers.vector.FilterObservation(env: VectorEnv, filter_keys: Sequence[str | int])[source]

Vector wrapper for filtering dict or tuple observation spaces.

Example - Creating a vectorized environment with a Dict space to demonstrate how to filter keys:
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.spaces import Dict, Box
>>> from gymnasium.wrappers import TransformObservation
>>> from gymnasium.wrappers.vector import VectorizeTransformObservation, FilterObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> make_dict = lambda x: {"obs": x, "junk": np.array([0.0])}
>>> new_space = Dict({"obs": envs.single_observation_space, "junk": Box(low=-1.0, high=1.0)})
>>> envs = VectorizeTransformObservation(env=envs, wrapper=TransformObservation, func=make_dict, observation_space=new_space)
>>> envs = FilterObservation(envs, ["obs"])
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
{'obs': array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924]],
      dtype=float32)}
Parameters:
  • env – The vector environment to wrap

  • filter_keys – The subspaces to include; use a list of strings for Dict spaces and integers for Tuple spaces.

class gymnasium.wrappers.vector.FlattenObservation(env: VectorEnv)[source]

Observation wrapper that flattens the observation.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CarRacing-v2", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = FlattenObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 27648)
>>> envs.close()
Parameters:

env – The vector environment to wrap

class gymnasium.wrappers.vector.GrayscaleObservation(env: VectorEnv, keep_dim: bool = False)[source]

Observation wrapper that converts an RGB image to grayscale.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CarRacing-v2", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = GrayscaleObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96)
>>> envs.close()
Parameters:
  • env – The vector environment to wrap

  • keep_dim – Whether to keep the channel dimension in the observation; if True, each observation keeps a trailing channel axis, otherwise it is dropped

class gymnasium.wrappers.vector.ResizeObservation(env: VectorEnv, shape: tuple[int, ...])[source]

Resizes image observations using OpenCV to a specified shape.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CarRacing-v2", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = ResizeObservation(envs, shape=(28, 28))
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 28, 28, 3)
>>> envs.close()
Parameters:
  • env – The vector environment to wrap

  • shape – The resized observation shape

class gymnasium.wrappers.vector.ReshapeObservation(env: VectorEnv, shape: int | tuple[int, ...])[source]

Reshapes array-based observations to a given shape.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CarRacing-v2", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = ReshapeObservation(envs, shape=(9216, 3))
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 9216, 3)
>>> envs.close()
Parameters:
  • env – The vector environment to wrap

  • shape – The new shape of the observation space

class gymnasium.wrappers.vector.RescaleObservation(env: VectorEnv, min_obs: floating | integer | ndarray, max_obs: floating | integer | ndarray)[source]

Linearly rescales observations to between a minimum and maximum value.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.min()
-0.0446179
>>> obs.max()
0.0469136
>>> envs = RescaleObservation(envs, min_obs=-5.0, max_obs=5.0)
>>> obs, info = envs.reset(seed=123)
>>> obs.min()
-0.33379582
>>> obs.max()
0.55998987
>>> envs.close()
Parameters:
  • env – The vector environment to wrap

  • min_obs – The new minimum observation bound

  • max_obs – The new maximum observation bound

class gymnasium.wrappers.vector.DtypeObservation(env: VectorEnv, dtype: Any)[source]

Observation wrapper for transforming the dtype of an observation.

Example

>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.dtype
dtype('float32')
>>> envs = DtypeObservation(envs, dtype=np.float64)
>>> obs, info = envs.reset(seed=123)
>>> obs.dtype
dtype('float64')
>>> envs.close()
Parameters:
  • env – The vector environment to wrap

  • dtype – The new dtype of the observation

class gymnasium.wrappers.vector.NormalizeObservation(env: VectorEnv, epsilon: float = 1e-8)[source]

This wrapper will normalize observations such that each coordinate is centered with unit variance.

The property _update_running_mean allows to freeze/continue the running mean calculation of the observation statistics. If True (default), the RunningMeanStd will get updated on every step and reset call. If False, the calculated statistics are used but not updated anymore; this may be used during evaluation (see the sketch after the note below).

Note

The normalization depends on past trajectories and observations; the normalization will not be correct if the wrapper was newly instantiated or the policy was changed recently.
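
A minimal sketch of freezing the statistics for evaluation, assuming the public update_running_mean property toggles this flag:

>>> import gymnasium as gym
>>> envs = NormalizeObservation(gym.make_vec("CartPole-v1", num_envs=3))
>>> # ... collect training rollouts here while the statistics update ...
>>> envs.update_running_mean = False  # freeze the running statistics for evaluation
>>> obs, info = envs.reset(seed=123)  # observations are normalized with the frozen statistics
>>> envs.close()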

Example without the normalize observation wrapper:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> for _ in range(100):
...     obs, *_ = envs.step(envs.action_space.sample())
>>> np.mean(obs)
0.024251968
>>> np.std(obs)
0.62259156
>>> envs.close()
Example with the normalize observation wrapper:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> envs = NormalizeObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> for _ in range(100):
...     obs, *_ = envs.step(envs.action_space.sample())
>>> np.mean(obs)
-0.2359734
>>> np.std(obs)
1.1938739
>>> envs.close()
Parameters:
  • env (Env) – The environment to apply the wrapper to

  • epsilon – A stability parameter that is used when scaling the observations.

Implemented action wrappers

class gymnasium.wrappers.vector.TransformAction(env: VectorEnv, func: Callable[[ActType], Any], action_space: Space | None = None)[source]

Transforms an action via a function provided to the wrapper.

The function func will be applied to all vector actions. If the transformed actions are outside the action space of env, provide an action_space that specifies the action space of the vectorized environment.

Example - Without action transformation:
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...
>>> envs.close()
>>> obs
array([[-0.46553135, -0.00142543],
       [-0.498371  , -0.00715587],
       [-0.4651575 , -0.00624371]], dtype=float32)
Example - With action transformation:
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> def shrink_action(act):
...     return act * 0.3
...
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> new_action_space = Box(low=shrink_action(envs.action_space.low), high=shrink_action(envs.action_space.high))
>>> envs = TransformAction(env=envs, func=shrink_action, action_space=new_action_space)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...
>>> envs.close()
>>> obs
array([[-0.48468155, -0.00372536],
       [-0.47599354, -0.00545912],
       [-0.46543318, -0.00615723]], dtype=float32)
Parameters:
  • env – The vector environment to wrap

  • func – A function that will transform the actions. If these transformed actions are outside the action space of env.action_space, provide an action_space.

  • action_space – The action space of the wrapper; if None, it is assumed to be the same as env.action_space.

class gymnasium.wrappers.vector.ClipAction(env: VectorEnv)[source]

Clips continuous actions to the valid bounds of the Box action space.

Example - Passing an out-of-bounds action to the environment, to be clipped:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = ClipAction(envs)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(np.array([5.0, -5.0, 2.0]))
>>> envs.close()
>>> obs
array([[-0.4624777 ,  0.00105192],
       [-0.44504836, -0.00209899],
       [-0.42884544,  0.00080468]], dtype=float32)
Parameters:

env – The vector environment to wrap

class gymnasium.wrappers.vector.RescaleAction(env: VectorEnv, min_action: float | int | ndarray, max_action: float | int | ndarray)[source]

Affinely rescales the continuous action space of the environment to the range [min_action, max_action].

Example - Without action scaling:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> obs
array([[-0.44799727,  0.00266526],
       [-0.4351738 ,  0.00133522],
       [-0.42683297,  0.00048403]], dtype=float32)
Example - With action scaling:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = RescaleAction(envs, 0.0, 1.0)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> obs
array([[-0.48657528, -0.00395268],
       [-0.47377947, -0.00529102],
       [-0.46546045, -0.00614867]], dtype=float32)
Parameters:
  • env (Env) – The vector environment to wrap

  • min_action (float, int or np.ndarray) – The minimum value for each action. This may be a numpy array or a scalar.

  • max_action (float, int or np.ndarray) – The maximum value for each action. This may be a numpy array or a scalar.

Implemented reward wrappers

class gymnasium.wrappers.vector.TransformReward(env: VectorEnv, func: Callable[[ArrayType], ArrayType])[source]

A reward wrapper that allows a custom function to modify the step reward.

Example with a reward transformation:
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> def scale_and_shift(rew):
...     return (rew - 1.0) * 2.0
...
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = TransformReward(env=envs, func=scale_and_shift)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01,  9.8971417e-05],
       [-4.4488689e-01, -1.9375233e-03],
       [-4.3118435e-01, -1.5342437e-03]], dtype=float32)
Parameters:
  • env (Env) – The vector environment to wrap

  • func (Callable) – The function to apply to the reward

class gymnasium.wrappers.vector.ClipReward(env: VectorEnv, min_reward: float | ndarray | None = None, max_reward: float | ndarray | None = None)[source]

A wrapper that clips the rewards of an environment between an upper and a lower bound.

Example with clipped rewards:
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = ClipReward(envs, 0.0, 2.0)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> rew
array([0., 0., 0.])
Parameters:
  • env – The vector environment to wrap

  • min_reward – The minimum reward for each step

  • max_reward – The maximum reward for each step

class gymnasium.wrappers.vector.NormalizeReward(env: VectorEnv, gamma: float = 0.99, epsilon: float = 1e-8)[source]

This wrapper will normalize immediate rewards such that their exponential moving average has a fixed variance.

The exponential moving average will have variance (1 - γ)^2.

The property _update_running_mean allows to freeze/continue the running mean calculation of the reward statistics. If True (default), the RunningMeanStd will get updated every time self.normalize() is called. If False, the calculated statistics are used but not updated anymore; this may be used during evaluation, just as with NormalizeObservation above.

Note

The scaling depends on past trajectories; rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.

Example without the normalize reward wrapper:
>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("MountainCarContinuous-v0", 3)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> episode_rewards = []
>>> for _ in range(100):
...     observation, reward, *_ = envs.step(envs.action_space.sample())
...     episode_rewards.append(reward)
...
>>> envs.close()
>>> np.mean(episode_rewards)
-0.03359492141887935
>>> np.std(episode_rewards)
0.029028230434438706
Example with the normalize reward wrapper:
>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("MountainCarContinuous-v0", 3)
>>> envs = NormalizeReward(envs)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> episode_rewards = []
>>> for _ in range(100):
...     observation, reward, *_ = envs.step(envs.action_space.sample())
...     episode_rewards.append(reward)
...
>>> envs.close()
>>> np.mean(episode_rewards)
-0.1598639586606745
>>> np.std(episode_rewards)
0.27800309628058434
Parameters:
  • env (env) – The environment to apply the wrapper to

  • epsilon (float) – A stability parameter

  • gamma (float) – The discount factor that is used in the exponential moving average.

Implemented data conversion wrappers