collector
Source code: tianshou/data/collector.py
- class AsyncCollector(policy: BasePolicy, env: BaseVectorEnv, buffer: ReplayBuffer | None = None, preprocess_fn: collections.abc.Callable[[...], RolloutBatchProtocol] | None = None, exploration_noise: bool = False)
AsyncCollector handles asynchronous vector environments.
The arguments are exactly the same as Collector; please refer to Collector for a more detailed explanation.
- collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) → dict[str, Any]
Collect a specified number of steps or episodes in the async env setting.
This function does not collect exactly n_step or n_episode transitions. To support the async setting, it may collect more than the requested n_step or n_episode transitions and save them into the buffer.
- Parameters:
n_step – how many steps you want to collect.
n_episode – how many episodes you want to collect.
random – whether to use random policy for collecting data. Default to False.
render – the sleep time between rendering consecutive frames. Default to None (no rendering).
no_grad – whether to disable gradient tracking in policy.forward(). Default to True (gradients are not retained).
gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments).
Note
One and only one collection number specification is permitted, either n_step or n_episode.
- Returns:
A dict including the following keys:
n/ep – collected number of episodes.
n/st – collected number of steps.
rews – array of episode reward over collected episodes.
lens – array of episode length over collected episodes.
idxs – array of episode start index in buffer over collected episodes.
rew – mean of episodic rewards.
len – mean of episodic lengths.
rew_std – standard error of episodic rewards.
len_std – standard error of episodic lengths.
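The rule that exactly one of n_step or n_episode may be given can be illustrated with a small argument check. This is a minimal sketch only: resolve_collect_target is a hypothetical helper written for this example and is not part of Tianshou, and the positivity check is an added assumption rather than documented behaviour.

```python
def resolve_collect_target(n_step=None, n_episode=None):
    """Return ("step", n) or ("episode", n); hypothetical helper, not Tianshou code."""
    # Exactly one of the two must be given: both None or both set is an error.
    if (n_step is None) == (n_episode is None):
        raise ValueError("specify exactly one of n_step or n_episode")
    kind, count = ("step", n_step) if n_step is not None else ("episode", n_episode)
    # Assumption for this sketch: a non-positive count is treated as invalid.
    if count <= 0:
        raise ValueError(f"n_{kind} must be positive, got {count}")
    return kind, count


print(resolve_collect_target(n_step=100))
print(resolve_collect_target(n_episode=5))
```

Calling the helper with neither argument, or with both, raises ValueError, which mirrors the restriction stated in the note.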
- class Collector(policy: BasePolicy, env: gymnasium.core.Env | BaseVectorEnv, buffer: ReplayBuffer | None = None, preprocess_fn: collections.abc.Callable[[...], RolloutBatchProtocol] | None = None, exploration_noise: bool = False)
Collector enables the policy to interact with different types of envs for an exact number of steps or episodes.
- Parameters:
policy – an instance of the BasePolicy class.
env – a gym.Env environment or an instance of the BaseVectorEnv class.
buffer – an instance of the ReplayBuffer class. If set to None, it will not store the data. Default to None.
preprocess_fn (function) – a function called before the data has been added to the buffer, see issue #42 and Handle Batched Data Stream in Collector. Default to None.
exploration_noise – whether the action should be modified with the corresponding policy’s exploration noise. If so, “policy.exploration_noise(act, batch)” will be called automatically to add the exploration noise to the action. Default to False.
The “preprocess_fn” is a function called, in batch format, before the data has been added to the buffer. It will receive only “obs” and “env_id” when the collector resets the environment, and will receive the keys “obs_next”, “rew”, “terminated”, “truncated”, “info”, “policy” and “env_id” in a normal env step. Alternatively, it may also accept the keys “obs_next”, “rew”, “done”, “info”, “policy” and “env_id”. It returns either a dict or a Batch with the modified keys and values. Examples are in “test/base/test_collector.py”.
Note
Please make sure the given environment has a time limit when using the n_episode collect option.
Note
In past versions of Tianshou, the replay buffer that was passed to __init__ was automatically reset. This is not done in the current implementation.
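The preprocess_fn contract described above (keyword arguments in, a dict of modified keys out) can be sketched as follows. This is an illustrative example under stated assumptions, not code from Tianshou: clip_reward_preprocess_fn is a hypothetical name, reward clipping is chosen only as a simple transformation, and returning an empty dict on reset assumes the collector accepts a result with no modified keys. It uses NumPy because rewards are assumed to arrive as arrays, one entry per environment.

```python
import numpy as np


def clip_reward_preprocess_fn(**kwargs):
    """Hypothetical preprocess_fn: clip rewards to [-1, 1], leave other keys alone."""
    if "rew" not in kwargs:
        # Reset call: only "obs" and "env_id" are passed, so nothing is modified.
        return {}
    # Normal env step: return only the keys that should be overwritten.
    return {"rew": np.clip(kwargs["rew"], -1.0, 1.0)}


# Simulated step call with made-up rewards from three environments.
step_out = clip_reward_preprocess_fn(
    obs_next=np.zeros((3, 4)),
    rew=np.array([-3.0, 0.5, 2.0]),
    terminated=np.array([False, False, True]),
    truncated=np.array([False, False, False]),
    info={},
    policy={},
    env_id=np.arange(3),
)
print(step_out["rew"])
```

Branching on the presence of “rew” is one simple way to tell a reset call from a normal step, since the reset call carries only “obs” and “env_id”.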
- collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) → dict[str, Any]
Collect a specified number of steps or episodes.
To ensure an unbiased sampling result with the n_episode option, this function will first collect n_episode - env_num episodes; the last env_num episodes are then collected evenly from each env.
- Parameters:
n_step – how many steps you want to collect.
n_episode – how many episodes you want to collect.
random – whether to use random policy for collecting data. Default to False.
render – the sleep time between rendering consecutive frames. Default to None (no rendering).
no_grad – whether to disable gradient tracking in policy.forward(). Default to True (gradients are not retained).
gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments).
Note
One and only one collection number specification is permitted, either n_step or n_episode.
- Returns:
A dict including the following keys:
n/ep – collected number of episodes.
n/st – collected number of steps.
rews – array of episode reward over collected episodes.
lens – array of episode length over collected episodes.
idxs – array of episode start index in buffer over collected episodes.
rew – mean of episodic rewards.
len – mean of episodic lengths.
rew_std – standard error of episodic rewards.
len_std – standard error of episodic lengths.
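As a rough illustration of how the returned dict might be consumed, the sketch below formats its scalar keys into a one-line log message. format_collect_result is a hypothetical helper, and the example dict contains made-up values standing in for a real collect() result; only the key names come from the list above.

```python
def format_collect_result(result):
    """Hypothetical helper: one-line summary of the scalar keys of a collect() result."""
    return (
        f"episodes={result['n/ep']} steps={result['n/st']} "
        f"rew={result['rew']:.2f}+/-{result['rew_std']:.2f} "
        f"len={result['len']:.1f}+/-{result['len_std']:.1f}"
    )


# Made-up values for illustration only; a real dict would come from collect().
example_result = {
    "n/ep": 2, "n/st": 30,
    "rew": 1.5, "rew_std": 0.5,
    "len": 15.0, "len_std": 5.0,
}
summary = format_collect_result(example_result)
print(summary)
```

The array-valued keys (rews, lens, idxs) are left out of the summary here; they are the ones to use when per-episode values are needed.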
- reset(reset_buffer: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) → None
Reset the environment, statistics, current data and possibly replay memory.
- Parameters:
reset_buffer – if true, reset the replay buffer that is attached to the collector.
gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments).