Explorers

Explorer interfaces

class chainerrl.explorer.Explorer

Abstract explorer.

select_action(t, greedy_action_func, action_value=None)

Select an action.

Parameters:
  • t – current time step
  • greedy_action_func – function with no argument that returns an action
  • action_value (ActionValue) – ActionValue object
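
The interface above can be illustrated with a small stand-alone sketch. It does not import chainerrl: the class name WarmupThenGreedy and its warmup_steps and random_action_func arguments are hypothetical, and the code only shows how the documented select_action signature might be implemented.

```python
import random


class WarmupThenGreedy:
    """Hypothetical explorer: random actions during warm-up, greedy afterwards."""

    def __init__(self, warmup_steps, random_action_func):
        self.warmup_steps = warmup_steps
        self.random_action_func = random_action_func

    def select_action(self, t, greedy_action_func, action_value=None):
        # t is the current time step; action_value is accepted but unused here.
        if t < self.warmup_steps:
            return self.random_action_func()
        # greedy_action_func takes no arguments and returns an action.
        return greedy_action_func()


explorer = WarmupThenGreedy(
    warmup_steps=5,
    random_action_func=lambda: random.randrange(3),
)
early_action = explorer.select_action(t=0, greedy_action_func=lambda: 1)
late_action = explorer.select_action(t=10, greedy_action_func=lambda: 1)
```

Passing the greedy action as a zero-argument function means the explorer only pays for computing it when it actually decides to act greedily.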

Explorer implementations

class chainerrl.explorers.AdditiveGaussian(scale, low=None, high=None)

Additive Gaussian noise to actions.

Each action must be a numpy.ndarray.

Parameters:
  • scale (float or array_like of floats) – Scale parameter.
  • low (float, array_like of floats, or None) – Lower bound of the action space, used to clip an action after noise is added. If None, no clipping is performed at the lower bound.
  • high (float, array_like of floats, or None) – Upper bound of the action space, used to clip an action after noise is added. If None, no clipping is performed at the upper bound.
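
The add-noise-then-clip behaviour described by these parameters can be sketched without the library. The function below is a hypothetical illustration, not ChainerRL's implementation: it works on plain Python lists rather than numpy.ndarray, and it assumes scalar scale, low and high to stay within the standard library.

```python
import random


def additive_gaussian(action, scale, low=None, high=None, rng=random):
    """Add zero-mean Gaussian noise to each component, then clip if bounds are given."""
    noisy = [a + rng.gauss(0.0, scale) for a in action]
    if low is not None:
        # Clip at the lower bound only when one is supplied.
        noisy = [max(x, low) for x in noisy]
    if high is not None:
        # Clip at the upper bound only when one is supplied.
        noisy = [min(x, high) for x in noisy]
    return noisy


# With scale=0.0 no noise is added, so only the clipping is visible.
clipped = additive_gaussian([0.5, -2.0], scale=0.0, low=-1.0, high=1.0)
# With a non-zero scale the result is random but stays inside the bounds.
noisy = additive_gaussian([0.5, -2.0], scale=0.3, low=-1.0, high=1.0)
```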

class chainerrl.explorers.AdditiveOU(mu=0.0, theta=0.15, sigma=0.3, start_with_mu=False, logger=<Logger chainerrl.explorers.additive_ou (WARNING)>)

Additive Ornstein-Uhlenbeck process.

Used in https://arxiv.org/abs/1509.02971 for exploration.

Parameters:
  • mu (float) – Mean of the OU process
  • theta (float) – Friction to pull towards the mean
  • sigma (float or ndarray) – Scale of noise
  • start_with_mu (bool) – Start the process without noise
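
To make the roles of mu, theta and sigma concrete, here is a minimal sketch of a common discrete-time Ornstein-Uhlenbeck update. The class name OUNoiseSketch and its step method are hypothetical, the state is a single scalar, and the process is assumed to start at mu. The library's exact update rule is not stated on this page, so treat this as an illustration only.

```python
import random


class OUNoiseSketch:
    """Hypothetical scalar OU process: x <- x + theta * (mu - x) + N(0, sigma)."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.3, rng=None):
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.rng = rng or random.Random()
        # Assumption: start at the mean, i.e. without initial noise.
        self.state = mu

    def step(self):
        # theta pulls the state back towards mu; sigma scales the fresh noise.
        pull = self.theta * (self.mu - self.state)
        self.state += pull + self.rng.gauss(0.0, self.sigma)
        return self.state


# With sigma=0.0 the process is deterministic and decays towards mu.
ou = OUNoiseSketch(mu=0.0, theta=0.5, sigma=0.0)
ou.state = 1.0
first = ou.step()
second = ou.step()
```

Because each value depends on the previous one, successive noise samples are correlated, unlike the independent samples of additive Gaussian noise. An explorer would add the current state to the greedy action at every step.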

class chainerrl.explorers.Boltzmann(T=1.0)

Boltzmann exploration.

Parameters:
  • T (float) – Temperature of Boltzmann distribution.
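
The effect of the temperature can be sketched as a softmax over action values. The helper names boltzmann_probs and boltzmann_action are hypothetical and use only the standard library. They illustrate the general technique and do not reproduce the library's code.

```python
import math
import random


def boltzmann_probs(q_values, T=1.0):
    """Return softmax(q / T) as a list of probabilities."""
    # Subtracting the maximum keeps exp() numerically stable.
    m = max(q_values)
    exps = [math.exp((q - m) / T) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, T=1.0, rng=random):
    """Sample an action index with probability proportional to exp(q / T)."""
    probs = boltzmann_probs(q_values, T)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


uniform = boltzmann_probs([1.0, 1.0])
warm = boltzmann_probs([1.0, 2.0], T=1.0)
cold = boltzmann_probs([1.0, 2.0], T=0.1)
action = boltzmann_action([1.0, 2.0, 3.0], T=0.5)
```

A lower T concentrates probability on the highest-valued action, while a higher T moves the distribution towards uniform.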

class chainerrl.explorers.ConstantEpsilonGreedy(epsilon, random_action_func, logger=<Logger chainerrl.explorers.epsilon_greedy (WARNING)>)

Epsilon-greedy with constant epsilon.

Parameters:
  • epsilon – probability of selecting a random action instead of the greedy one
  • random_action_func – function with no argument that returns an action
  • logger – logger used
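
The selection rule behind these parameters can be sketched in a few lines. The function epsilon_greedy below is a hypothetical stand-alone helper, not the library's class. It assumes that both callbacks take no arguments, as documented for random_action_func.

```python
import random


def epsilon_greedy(epsilon, random_action_func, greedy_action_func, rng=random):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return random_action_func()
    return greedy_action_func()


# epsilon=0.0 never explores; epsilon=1.0 always explores.
always_greedy = epsilon_greedy(0.0, lambda: "random", lambda: "greedy")
always_random = epsilon_greedy(1.0, lambda: "random", lambda: "greedy")
```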

class chainerrl.explorers.LinearDecayEpsilonGreedy(start_epsilon, end_epsilon, decay_steps, random_action_func, logger=<Logger chainerrl.explorers.epsilon_greedy (WARNING)>)

Epsilon-greedy with linearly decayed epsilon.

Parameters:
  • start_epsilon – max value of epsilon, used at the start of the decay
  • end_epsilon – min value of epsilon, used once the decay has finished
  • decay_steps – number of steps over which epsilon decays from start_epsilon to end_epsilon
  • random_action_func – function with no argument that returns an action
  • logger – logger used
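
A linear schedule consistent with these parameters can be written as a single function. The name linear_decay_epsilon is hypothetical, and the behaviour after decay_steps (holding epsilon at end_epsilon) is an assumption based on the parameter descriptions rather than on the library source.

```python
def linear_decay_epsilon(t, start_epsilon, end_epsilon, decay_steps):
    """Interpolate linearly from start_epsilon to end_epsilon over decay_steps steps."""
    if t >= decay_steps:
        # Assumption: epsilon stays at its minimum once the decay has finished.
        return end_epsilon
    fraction = t / decay_steps
    return start_epsilon + (end_epsilon - start_epsilon) * fraction


eps_start = linear_decay_epsilon(0, 1.0, 0.1, 100)
eps_mid = linear_decay_epsilon(50, 1.0, 0.1, 100)
eps_end = linear_decay_epsilon(100, 1.0, 0.1, 100)
eps_late = linear_decay_epsilon(10_000, 1.0, 0.1, 100)
```

The resulting value would then be used as the exploration probability in an epsilon-greedy selection at step t.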

class chainerrl.explorers.Greedy

No exploration: the greedy action is always selected.