mapolicy
Source code: tianshou/policy/multiagent/mapolicy.py
- class MultiAgentPolicyManager(*, policies: list[BasePolicy], env: PettingZooEnv, action_scaling: bool = False, action_bound_method: Optional[Literal['clip', 'tanh']] = 'clip', lr_scheduler: torch.optim.lr_scheduler.LRScheduler | MultipleLRSchedulers | None = None)
Multi-agent policy manager for MARL.
This multi-agent policy manager accepts a list of BasePolicy instances. When “forward” is called, it dispatches the batch data to each of these policies. “process_fn” and “learn” work the same way: the manager splits the data and feeds each part to the corresponding policy. A figure in Multi-Agent Reinforcement Learning can help you better understand this procedure.
- Parameters:
policies – a list of policies.
env – a PettingZooEnv.
action_scaling – if True, scale the action from [-1, 1] to the range of action_space. Only used if the action_space is continuous.
action_bound_method – method to bound action to range [-1, 1]. Only used if the action_space is continuous.
lr_scheduler – if not None, will be called in policy.update().
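The splitting step described above can be illustrated with a minimal, library-free sketch. It is not the tianshou implementation: `split_by_agent` and the plain lists it works on are hypothetical stand-ins for the manager and its Batch data, and it only assumes that every row carries the id of the agent it belongs to.

```python
# Hypothetical helper for illustration only; not part of tianshou.
def split_by_agent(agent_ids, observations):
    """Group row indices and observations by the agent that owns each row."""
    per_agent = {}
    for index, (agent, obs) in enumerate(zip(agent_ids, observations)):
        entry = per_agent.setdefault(agent, {"indices": [], "obs": []})
        entry["indices"].append(index)  # remember where the row came from
        entry["obs"].append(obs)        # data that this agent's policy would see
    return per_agent


# Toy data: rows alternate between two agents.
agent_ids = ["agent_1", "agent_2", "agent_1", "agent_2"]
observations = [10, 20, 30, 40]
groups = split_by_agent(agent_ids, observations)
```

Keeping the original row indices next to each agent's data is what allows per-agent results to be written back into the combined batch later.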
- exploration_noise(act: numpy.ndarray | BatchProtocol, batch: RolloutBatchProtocol) → numpy.ndarray | BatchProtocol
Add exploration noise from sub-policy onto act.
- forward(batch: Batch, state: dict | Batch | None = None, **kwargs: Any) → Batch
Dispatch batch data from obs.agent_id to every policy’s forward.
- Parameters:
state – if None, it means all agents have no state. If not None, it should contain keys of “agent_1”, “agent_2”, …
- Returns:
a Batch with the following contents:
{
    "act": actions corresponding to the input
    "state": {
        "agent_1": output state of agent_1's policy for the state
        "agent_2": xxx
        ...
        "agent_n": xxx}
    "out": {
        "agent_1": output of agent_1's policy for the input
        "agent_2": xxx
        ...
        "agent_n": xxx}
}
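As a rough illustration of how a result with this shape could be assembled, the sketch below uses plain dicts and lists in place of Batch objects. Everything in it is an assumption made for the example: `toy_forward`, `constant_policy`, and `doubling_policy` are hypothetical names, and each toy policy is simply a function that returns a dict with "act", "state", and "out".

```python
# Hypothetical stand-ins for illustration only; not the tianshou implementation.
def constant_policy(observations):
    """Toy policy: always picks action 0 and echoes its observations."""
    return {"act": [0] * len(observations), "state": None, "out": list(observations)}


def doubling_policy(observations):
    """Toy policy: always picks action 1 and doubles its observations."""
    return {"act": [1] * len(observations), "state": None,
            "out": [obs * 2 for obs in observations]}


def toy_forward(agent_ids, observations, policies):
    """Send each agent's rows to its policy and merge the results."""
    acts = [None] * len(agent_ids)
    states, outs = {}, {}
    for agent, policy in policies.items():
        rows = [i for i, owner in enumerate(agent_ids) if owner == agent]
        if not rows:
            continue  # this agent has no data in the batch
        result = policy([observations[i] for i in rows])
        for row, act in zip(rows, result["act"]):
            acts[row] = act            # actions go back to their original rows
        states[agent] = result["state"]  # per-agent entries keyed by agent name
        outs[agent] = result["out"]
    return {"act": acts, "state": states, "out": outs}


policies = {"agent_1": constant_policy, "agent_2": doubling_policy}
result = toy_forward(["agent_1", "agent_2", "agent_1", "agent_2"], [1, 2, 3, 4], policies)
```

In this sketch "act" is a single sequence aligned with the input rows, while "state" and "out" stay nested per agent, mirroring the layout listed above.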
- learn(batch: RolloutBatchProtocol, *args: Any, **kwargs: Any) → dict[str, float | list[float]]
Dispatch the data to all policies for learning.
- Returns:
a dict with the following contents:
{
    "agent_1/item1": item 1 of agent_1's policy.learn output
    "agent_1/item2": item 2 of agent_1's policy.learn output
    "agent_2/xxx": xxx
    ...
    "agent_n/xxx": xxx
}
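The key naming can be reproduced with a few lines of plain Python. This is only a sketch of the "agent/item" convention shown above: `merge_learn_results` is a hypothetical helper, and the per-agent result dicts and their values are made up for the example.

```python
# Hypothetical helper for illustration only; not part of tianshou.
def merge_learn_results(per_agent_results):
    """Flatten {agent: {item: value}} into {"agent/item": value}."""
    merged = {}
    for agent, result in per_agent_results.items():
        for item, value in result.items():
            merged[f"{agent}/{item}"] = value
    return merged


# Made-up learn outputs for two agents.
per_agent_results = {
    "agent_1": {"loss": 0.5, "loss/actor": 0.25},
    "agent_2": {"loss": 0.125},
}
merged = merge_learn_results(per_agent_results)
```

Prefixing every item with the agent name keeps the statistics of different agents apart, even when their policies report items with the same name.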
- process_fn(batch: RolloutBatchProtocol, buffer: ReplayBuffer, indice: ndarray) → BatchProtocol
Dispatch batch data from obs.agent_id to every policy’s process_fn.
Save original multi-dimensional rew in “save_rew”, set rew to the reward of each agent during their “process_fn”, and restore the original reward afterwards.
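The save-and-restore pattern for the reward can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual method: the batch is a plain dict, each reward row is assumed to hold one value per agent in the order given by `agent_names`, and `process_with_per_agent_reward` and `total_reward` are hypothetical names. Removing the temporary "save_rew" key at the end is a choice made for this sketch.

```python
# Hypothetical helpers for illustration only; not the tianshou implementation.
def total_reward(batch):
    """Toy per-agent processing step: sums the rewards it is shown."""
    return sum(batch["rew"])


def process_with_per_agent_reward(batch, agent_names, process_fns):
    """Show each agent only its own reward column, then restore the original."""
    batch["save_rew"] = batch["rew"]  # keep the multi-dimensional reward
    results = {}
    try:
        for column, agent in enumerate(agent_names):
            # Replace rew with this agent's column while its function runs.
            batch["rew"] = [row[column] for row in batch["save_rew"]]
            results[agent] = process_fns[agent](batch)
    finally:
        # Restore the original reward even if a processing step fails.
        batch["rew"] = batch.pop("save_rew")
    return results


batch = {"rew": [[1, -1], [2, -2]]}
agent_names = ["agent_1", "agent_2"]
process_fns = {"agent_1": total_reward, "agent_2": total_reward}
results = process_with_per_agent_reward(batch, agent_names, process_fns)
```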
- replace_policy(policy: BasePolicy, agent_id: int) → None
Replace the “agent_id”th policy in this manager.