random


class RandomPolicy(*, action_space: Space, observation_space: gymnasium.spaces.space.Space | None = None, action_scaling: bool = False, action_bound_method: Optional[Literal['clip', 'tanh']] = 'clip', lr_scheduler: torch.optim.lr_scheduler.LRScheduler | MultipleLRSchedulers | None = None)

A random agent used in multi-agent learning.

It chooses an action at random from the set of legal actions.

forward(batch: ObsBatchProtocol, state: dict | BatchProtocol | numpy.ndarray | None = None, **kwargs: Any) -> ActBatchProtocol

Compute the random action over the given batch data.

The input must contain a boolean mask in batch.obs.mask, where True marks an available action and False an unavailable one. For example, batch.obs.mask == np.array([[False, True, False]]) means that, with batch size 1, action 1 is available while actions 0 and 2 are unavailable.

Returns:

A Batch with an “act” key, containing the randomly chosen action.

See also

Please refer to BasePolicy.forward() for a more detailed explanation.
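
Example

The mask convention described above can be illustrated with a minimal NumPy sketch. It is not taken from the library source: the helper name sample_legal_actions and the use of numpy.random.Generator are assumptions made for illustration. The sketch draws an independent random score for every action, sets the scores of masked-out actions to negative infinity, and takes the per-row argmax, which selects uniformly among the legal actions of each row.

```python
import numpy as np


def sample_legal_actions(mask: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Pick one legal action per row of a boolean mask of shape (batch, n_actions).

    Hypothetical helper for illustration only; not part of the documented API.
    """
    # Draw an independent uniform score for every (row, action) pair.
    scores = rng.random(mask.shape)
    # Unavailable actions get -inf so they can never win the argmax.
    scores[~mask] = -np.inf
    # The highest-scoring available action in each row is the sampled action.
    return scores.argmax(axis=-1)


rng = np.random.default_rng(0)
# Row 0: only action 1 is available. Row 1: actions 0 and 2 are available.
mask = np.array([[False, True, False], [True, False, True]])
actions = sample_legal_actions(mask, rng)
```

With this mask, the first entry of actions is always 1, because it is the only available action in that row, while the second entry is either 0 or 2. The sketch assumes every row has at least one True entry; a row with no available action would silently yield index 0.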

learn(batch: RolloutBatchProtocol, *args: Any, **kwargs: Any) -> dict[str, float]

Since a random agent learns nothing, this method returns an empty dict.