Explorers

Explorer interfaces

class chainerrl.explorer.Explorer

Abstract explorer.

select_action(t, greedy_action_func, action_value=None)

Select an action.

Parameters:

t – current time step
greedy_action_func – function with no argument that returns an action
action_value (ActionValue) – ActionValue object
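As an illustration of the select_action signature described above, here is a minimal sketch of an explorer-like class. RandomThenGreedy, warmup_steps, and the stand-in action functions are hypothetical names introduced for this example; the class does not subclass chainerrl.explorer.Explorer, so that the snippet runs without ChainerRL installed.

```python
import random


class RandomThenGreedy:
    """Hypothetical explorer: random actions during warm-up, greedy afterwards."""

    def __init__(self, warmup_steps, random_action_func):
        self.warmup_steps = warmup_steps
        self.random_action_func = random_action_func

    def select_action(self, t, greedy_action_func, action_value=None):
        # t is the current time step; action_value is accepted but unused here.
        if t < self.warmup_steps:
            return self.random_action_func()
        return greedy_action_func()


# Random actions are drawn from {0, 1, 2, 3}; the stand-in greedy action is 7.
explorer = RandomThenGreedy(
    warmup_steps=3, random_action_func=lambda: random.randrange(4)
)
actions = [explorer.select_action(t, greedy_action_func=lambda: 7) for t in range(6)]
```

In this sketch the first three entries of actions come from the random function and the remaining three are the greedy action.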

Explorer implementations¶

class chainerrl.explorers.AdditiveGaussian(scale, low=None, high=None)

Additive Gaussian noise to actions.

Each action must be a numpy.ndarray.

Parameters:

scale (float or array_like of floats) – Scale parameter.
low (float, array_like of floats, or None) – Lower bound of the action space, used to clip an action after noise is added. If set to None, no clipping is performed on the lower edge.
high (float, array_like of floats, or None) – Upper bound of the action space, used to clip an action after noise is added. If set to None, no clipping is performed on the upper edge.
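The noise-then-clip behaviour described by these parameters can be sketched as follows. This is a simplified illustration, not the library's implementation: add_gaussian_noise is a hypothetical helper, plain Python lists stand in for numpy.ndarray, and only scalar scale, low, and high are handled.

```python
import random


def add_gaussian_noise(action, scale, low=None, high=None, rng=random):
    """Add zero-mean Gaussian noise to each component, then clip if bounds are given."""
    noisy = [a + rng.gauss(0.0, scale) for a in action]
    if low is not None:
        # Clip on the lower edge only when a lower bound is supplied.
        noisy = [max(x, low) for x in noisy]
    if high is not None:
        # Clip on the upper edge only when an upper bound is supplied.
        noisy = [min(x, high) for x in noisy]
    return noisy


rng = random.Random(0)
noisy_action = add_gaussian_noise([0.9, -0.9], scale=0.5, low=-1.0, high=1.0, rng=rng)
```

Passing low=None or high=None skips the corresponding clipping step, mirroring the parameter descriptions above.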

class chainerrl.explorers.AdditiveOU(mu=0.0, theta=0.15, sigma=0.3, start_with_mu=False, logger=<Logger chainerrl.explorers.additive_ou (WARNING)>)

Additive Ornstein-Uhlenbeck process.

Used in https://arxiv.org/abs/1509.02971 for exploration.

Parameters:

mu (float) – Mean of the OU process
theta (float) – Friction to pull towards the mean
sigma (float or ndarray) – Scale of noise
start_with_mu (bool) – Start the process without noise
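To show how mu, theta, and sigma interact, here is a sketch of a common discrete-time Ornstein-Uhlenbeck update. It is an assumption that this matches the library's exact update rule; OUNoise, sample, and seed are hypothetical names, the state is a scalar, and the process always starts at mu (the case described for start_with_mu=True).

```python
import random


class OUNoise:
    """Hypothetical scalar Ornstein-Uhlenbeck noise generator."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.3, seed=None):
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.rng = random.Random(seed)
        self.state = mu  # start at the mean, i.e. without initial noise

    def sample(self):
        # Pull the state towards mu with strength theta,
        # then perturb it with Gaussian noise of scale sigma.
        drift = self.theta * (self.mu - self.state)
        self.state += drift + self.sigma * self.rng.gauss(0.0, 1.0)
        return self.state


noise = OUNoise(mu=0.0, theta=0.15, sigma=0.3, seed=0)
samples = [noise.sample() for _ in range(5)]
# An explorer would add each sample to the greedy action.
```

Because each sample depends on the previous state, consecutive noise values are correlated, unlike independent Gaussian noise.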

class chainerrl.explorers.Boltzmann(T=1.0)

Boltzmann exploration.

Parameters:	T (float) – Temperature of Boltzmann distribution.
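The role of the temperature T can be illustrated with a small sketch of Boltzmann (softmax) action sampling over a list of Q-values. boltzmann_probs and sample_boltzmann are hypothetical helpers written for this example and work on plain Python lists rather than an ActionValue object.

```python
import math
import random


def boltzmann_probs(q_values, T=1.0):
    """Return softmax(q / T) as a list of probabilities."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / T) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_boltzmann(q_values, T=1.0, rng=random):
    """Sample an action index with probability proportional to exp(q / T)."""
    probs = boltzmann_probs(q_values, T)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q = [1.0, 2.0, 0.5]
cold = boltzmann_probs(q, T=0.1)   # concentrates on the highest Q-value
hot = boltzmann_probs(q, T=10.0)   # closer to uniform
action = sample_boltzmann(q, T=1.0, rng=random.Random(0))
```

A low temperature makes the choice nearly greedy, while a high temperature makes it nearly uniform.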

class chainerrl.explorers.ConstantEpsilonGreedy(epsilon, random_action_func, logger=<Logger chainerrl.explorers.epsilon_greedy (WARNING)>)

Epsilon-greedy with constant epsilon.

Parameters:

epsilon – probability of selecting a random action instead of the greedy one
random_action_func – function with no argument that returns an action
logger – logger used
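The epsilon-greedy rule itself fits in a few lines. The sketch below is a standalone illustration, not the class's code: epsilon_greedy is a hypothetical function, and the string return values are placeholders for real actions.

```python
import random


def epsilon_greedy(epsilon, random_action_func, greedy_action_func, rng=random):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return random_action_func()
    return greedy_action_func()


rng = random.Random(0)
choices = [
    epsilon_greedy(0.1, lambda: "random", lambda: "greedy", rng=rng)
    for _ in range(1000)
]
random_fraction = choices.count("random") / len(choices)
```

With epsilon set to 0.1, roughly one action in ten should come from random_action_func.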

class chainerrl.explorers.LinearDecayEpsilonGreedy(start_epsilon, end_epsilon, decay_steps, random_action_func, logger=<Logger chainerrl.explorers.epsilon_greedy (WARNING)>)

Epsilon-greedy with linearly decayed epsilon.

Parameters:

start_epsilon – max value of epsilon
end_epsilon – min value of epsilon
decay_steps – number of steps over which epsilon decays from start_epsilon to end_epsilon
random_action_func – function with no argument that returns an action
logger – logger used
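A linear decay schedule consistent with these parameters can be sketched as below. linear_decay_epsilon is a hypothetical helper, and it is an assumption that the library interpolates in exactly this way; the sketch holds epsilon at end_epsilon once decay_steps is reached.

```python
def linear_decay_epsilon(t, start_epsilon, end_epsilon, decay_steps):
    """Interpolate linearly from start_epsilon to end_epsilon over decay_steps."""
    if t >= decay_steps:
        # After the decay period, epsilon stays at its minimum value.
        return end_epsilon
    fraction = t / decay_steps
    return start_epsilon + (end_epsilon - start_epsilon) * fraction


# Epsilon at a few time steps for a decay from 1.0 to 0.1 over 100 steps.
schedule = {t: linear_decay_epsilon(t, 1.0, 0.1, 100) for t in (0, 50, 100, 1000)}
```

The resulting epsilon would then be used at each step in the same way as the constant epsilon of ConstantEpsilonGreedy.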

class chainerrl.explorers.Greedy

No exploration.