utils


gather_info(start_time: float, train_collector: Collector | None, test_collector: Collector | None, best_reward: float, best_reward_std: float) -> dict[str, float | str]

A simple wrapper for gathering information from collectors.

Returns:

A dictionary with the following keys:

  • train_step: the total number of steps collected by the training collector;

  • train_episode: the total number of episodes collected by the training collector;

  • train_time/collector: the time spent collecting transitions in the training collector;

  • train_time/model: the time spent training models;

  • train_speed: the speed of training (env_step per second);

  • test_step: the total number of steps collected by the test collector;

  • test_episode: the total number of episodes collected by the test collector;

  • test_time: the time spent testing;

  • test_speed: the speed of testing (env_step per second);

  • best_reward: the best reward over the test results;

  • duration: the total elapsed time.
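
As an illustration of how the keys above relate to one another, the following is a minimal sketch and not the library's implementation. `CollectorStats`, `summarize_run`, and the `now` parameter are hypothetical names introduced here. The sketch assumes that time not spent collecting is attributed to model updates and that training speed is measured against non-test time. It also returns plain numbers, whereas the signature above allows string values.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class CollectorStats:
    """Hypothetical stand-in for a collector's counters; not the library class."""

    collect_step: int  # total environment steps collected
    collect_episode: int  # total episodes collected
    collect_time: float  # seconds spent collecting


def summarize_run(
    start_time: float,
    train: CollectorStats | None,
    test: CollectorStats | None,
    best_reward: float,
    now: float | None = None,
) -> dict[str, float]:
    """Build a dictionary with the documented keys (illustrative only)."""
    now = time.time() if now is None else now
    duration = max(0.0, now - start_time)
    test_time = test.collect_time if test is not None else 0.0
    # Assumption: whatever time is not spent collecting goes to model updates.
    model_time = max(0.0, duration - test_time)
    info: dict[str, float] = {"duration": duration}

    if test is not None:
        info.update({
            "test_step": test.collect_step,
            "test_episode": test.collect_episode,
            "test_time": test_time,
            "test_speed": test.collect_step / test_time if test_time > 0 else 0.0,
            "best_reward": best_reward,
        })

    if train is not None:
        model_time = max(0.0, model_time - train.collect_time)
        # Assumption: training speed is env steps per second of non-test time.
        non_test_time = duration - test_time
        info.update({
            "train_step": train.collect_step,
            "train_episode": train.collect_episode,
            "train_time/collector": train.collect_time,
            "train_speed": train.collect_step / non_test_time if non_test_time > 0 else 0.0,
        })

    info["train_time/model"] = model_time
    return info


# Usage: a 100-second run with 30 s of training collection and 20 s of testing.
info = summarize_run(
    start_time=0.0,
    train=CollectorStats(collect_step=1000, collect_episode=10, collect_time=30.0),
    test=CollectorStats(collect_step=200, collect_episode=4, collect_time=20.0),
    best_reward=1.5,
    now=100.0,
)
```

Passing `now` explicitly keeps the example deterministic; a real call would rely on the wall clock.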

test_episode(policy: BasePolicy, collector: Collector, test_fn: collections.abc.Callable[[int, int | None], None] | None, epoch: int, n_episode: int, logger: BaseLogger | None = None, global_step: int | None = None, reward_metric: collections.abc.Callable[[numpy.ndarray], numpy.ndarray] | None = None) -> dict[str, Any]

A simple wrapper for testing a policy with a collector.
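
The two callback parameters in the signature can be hard to picture from their types alone, so here is a hedged sketch of functions that would satisfy them. The names `record_test_call` and `mean_over_agents` are hypothetical. The sketch assumes that `test_fn` receives the epoch and the global step, and that `reward_metric` reduces a per-agent reward array of shape (n_episode, n_agent) to one value per episode. Neither assumption is stated in the text above.

```python
from __future__ import annotations

import numpy as np

# Records every (epoch, global_step) pair the hook was called with.
calls: list[tuple[int, int | None]] = []


def record_test_call(epoch: int, global_step: int | None) -> None:
    """Hypothetical test_fn matching Callable[[int, int | None], None]."""
    calls.append((epoch, global_step))


def mean_over_agents(rewards: np.ndarray) -> np.ndarray:
    """Hypothetical reward_metric matching Callable[[ndarray], ndarray].

    Assumes rewards has shape (n_episode, n_agent) and averages over agents.
    """
    return rewards.mean(axis=-1)


# Usage: exercise both callbacks directly, outside of any trainer.
record_test_call(3, 1200)
per_episode = mean_over_agents(np.array([[1.0, 3.0], [2.0, 4.0]]))
```

Both functions could then be passed as the `test_fn` and `reward_metric` arguments, assuming the call site invokes them as described.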