Experiments

Training and evaluation

chainerrl.experiments.train_agent_async(outdir, processes, make_env, profile=False, steps=80000000, eval_interval=1000000, eval_n_runs=10, max_episode_len=None, step_offset=0, successful_score=None, eval_explorer=None, agent=None, make_agent=None, global_step_hooks=[], logger=None)[source]

Train agent asynchronously using multiprocessing.

Either agent or make_agent must be specified.

Parameters:
  • outdir (str) – Path to the directory where training output is written.
  • processes (int) – Number of processes.
  • make_env (callable) – (process_idx, test) -> Environment.
  • profile (bool) – Profile if set True.
  • steps (int) – Number of global time steps for training.
  • eval_interval (int) – Interval of evaluation. If set to None, the agent will not be evaluated at all.
  • eval_n_runs (int) – Number of runs per evaluation.
  • max_episode_len (int) – Maximum episode length.
  • step_offset (int) – Time step from which training starts.
  • successful_score (float) – If not None, training finishes when the mean evaluation score is greater than or equal to this value.
  • eval_explorer – Explorer used for evaluation.
  • agent (Agent) – Agent to train.
  • make_agent (callable) – (process_idx) -> Agent
  • global_step_hooks (list) – List of callable objects that accept (env, agent, step) as arguments. They are called at every global step. See chainerrl.experiments.hooks.
  • logger (logging.Logger) – Logger used in this function.
Returns:

Trained agent.
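As a minimal sketch, asynchronous training over four processes might be launched as below. The environment choice is illustrative, and the make_agent stub must be filled in with an agent that supports asynchronous training (e.g. A3C); its construction depends on your model and optimizer.

import gym
import chainerrl

def make_env(process_idx, test):
    # Each process constructs its own environment instance;
    # `test` is True when the environment is used for evaluation.
    return gym.make('CartPole-v0')

def make_agent(process_idx):
    # Stub: build and return an agent that supports asynchronous
    # training here (details depend on your algorithm).
    raise NotImplementedError

agent = chainerrl.experiments.train_agent_async(
    outdir='results',
    processes=4,
    make_env=make_env,
    make_agent=make_agent,
    steps=10 ** 6,
    eval_interval=10 ** 5,
    eval_n_runs=10)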

chainerrl.experiments.train_agent_with_evaluation(agent, env, steps, eval_n_runs, eval_interval, outdir, max_episode_len=None, step_offset=0, eval_explorer=None, eval_max_episode_len=None, eval_env=None, successful_score=None, step_hooks=[], logger=None)[source]

Train an agent while regularly evaluating it.

Parameters:
  • agent – Agent to train.
  • env – Environment to train the agent against.
  • steps (int) – Number of total time steps for training.
  • eval_n_runs (int) – Number of runs per evaluation.
  • eval_interval (int) – Interval of evaluation.
  • outdir (str) – Path to the directory where training output is written.
  • max_episode_len (int) – Maximum episode length.
  • step_offset (int) – Time step from which training starts.
  • eval_explorer – Explorer used for evaluation.
  • eval_env – Environment used for evaluation.
  • successful_score (float) – If not None, training finishes when the mean evaluation score is greater than or equal to this value.
  • step_hooks (list) – List of callable objects that accept (env, agent, step) as arguments. They are called at every step. See chainerrl.experiments.hooks.
  • logger (logging.Logger) – Logger used in this function.
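A minimal end-to-end sketch follows, using a DQN agent on CartPole as in the ChainerRL quickstart; the agent construction and all hyperparameters are illustrative, not prescribed by this API.

import chainer
import chainerrl
import gym
import numpy as np

env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)
explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
replay_buffer = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6)

agent = chainerrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.99, explorer=explorer,
    replay_start_size=500, update_interval=1,
    target_update_interval=100,
    # CartPole observations are float64; cast for Chainer.
    phi=lambda x: x.astype(np.float32, copy=False))

chainerrl.experiments.train_agent_with_evaluation(
    agent=agent,
    env=env,
    steps=10 ** 5,
    eval_n_runs=10,
    eval_interval=10 ** 4,
    outdir='results')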

Training hooks

class chainerrl.experiments.StepHook[source]

Hook function that will be called during training.

This class is for clarifying the interface required for Hook functions. You don’t need to inherit this class to define your own hooks. Any callable that accepts (env, agent, step) as arguments can be used as a hook.
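For example, a plain function such as the following (hypothetical) logging hook has the required interface and can be passed via step_hooks or global_step_hooks:

def log_step_hook(env, agent, step):
    # Called with the current environment, agent, and step count.
    if step % 10000 == 0:
        print('reached step', step)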

class chainerrl.experiments.LinearInterpolationHook(total_steps, start_value, stop_value, setter)[source]

Hook that will set a linearly interpolated value.

You can use this hook to decay the learning rate by using a setter function as follows:

from chainerrl.experiments import LinearInterpolationHook

def lr_setter(env, agent, value):
    # Write the interpolated value into the optimizer's learning rate.
    agent.optimizer.lr = value

# Linearly decay the learning rate from 1e-3 to 0 over 10^6 steps.
hook = LinearInterpolationHook(10 ** 6, 1e-3, 0, lr_setter)
Parameters:
  • total_steps (int) – Number of total steps.
  • start_value (float) – Start value.
  • stop_value (float) – Stop value.
  • setter (callable) – (env, agent, value) -> None
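The resulting hook is then passed to a training function through its step_hooks argument (or global_step_hooks for train_agent_async). In this sketch, agent and env are assumed to be constructed as above, and the step counts are placeholders:

chainerrl.experiments.train_agent_with_evaluation(
    agent, env,
    steps=10 ** 6,
    eval_n_runs=10,
    eval_interval=10 ** 5,
    outdir='results',
    step_hooks=[hook])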