Policies

Policy interfaces

class chainerrl.policy.Policy[source]

Abstract policy.

__call__(state)[source]

Evaluate a policy.

Returns: Distribution of actions
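The contract is simply that calling a policy on a state returns a distribution over actions. A minimal stand-in sketch (using a hypothetical NumPy-backed distribution rather than ChainerRL's own distribution classes):

```python
import numpy as np

class UniformActionDistribution:
    """Stand-in for a ChainerRL action distribution (illustrative only)."""
    def __init__(self, n_actions):
        self.probs = np.full(n_actions, 1.0 / n_actions)

    def sample(self):
        # Draw one discrete action according to the probabilities.
        return np.random.choice(len(self.probs), p=self.probs)

class RandomPolicy:
    """Minimal policy following the Policy contract:
    __call__(state) returns a distribution of actions."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def __call__(self, state):
        return UniformActionDistribution(self.n_actions)
```

Concrete implementations below return Gaussian, softmax, or deterministic distributions instead of this uniform placeholder.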

Policy implementations

class chainerrl.policies.ContinuousDeterministicPolicy(model, model_call=None, action_filter=None)[source]

Continuous deterministic policy.

Parameters:
  • model (chainer.Link) – Link that is callable and outputs action values.
  • model_call (callable or None) – Callable used instead of model.__call__ if not None
  • action_filter (callable or None) – Callable applied to the outputs of the model if not None
class chainerrl.policies.FCDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)[source]

Fully-connected deterministic policy.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • action_size (int) – Size of actions.
  • min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
  • max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
  • bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
  • last_wscale (float) – Scale of weight initialization of the last layer.
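The tanh bounding applied when bound_action=True can be sketched in NumPy (a reconstruction of the idea, not ChainerRL's exact implementation):

```python
import numpy as np

def bound_by_tanh(y, min_action, max_action):
    """Squash unbounded network outputs y into [min_action, max_action]."""
    scale = (max_action - min_action) / 2.0    # half-width of the action range
    center = (max_action + min_action) / 2.0   # midpoint of the action range
    # tanh maps y into (-1, 1), which is then rescaled and shifted.
    return center + np.tanh(y) * scale

min_a, max_a = np.array([-2.0]), np.array([2.0])
print(bound_by_tanh(np.array([100.0]), min_a, max_a))  # close to 2.0
print(bound_by_tanh(np.array([0.0]), min_a, max_a))    # 0.0 (the midpoint)
```

Because tanh saturates smoothly, actions approach but never exceed the bounds.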
class chainerrl.policies.FCBNDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)[source]

Fully-connected deterministic policy with Batch Normalization.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • action_size (int) – Size of actions.
  • min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
  • max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
  • bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
  • normalize_input (bool) – If set to True, Batch Normalization is applied to inputs as well as hidden activations.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
  • last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.policies.FCLSTMDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)[source]

Fully-connected deterministic policy with LSTM.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • action_size (int) – Size of actions.
  • min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
  • max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
  • bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
  • last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.policies.FCGaussianPolicy(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_wscale=1, var_bias=0, min_var=0)[source]

Gaussian policy that consists of fully-connected layers.

This model has two output layers: the mean layer and the variance layer. The mean of the Gaussian is computed as follows:

Let y be the output of the mean layer.

If bound_mean=False:
  mean = y
If bound_mean=True:
  mean = min_action + (tanh(y) + 1) * (max_action - min_action) / 2

The variance of the Gaussian is computed as follows:

Let y be the output of the variance layer.

  variance = softplus(y) + min_var
Parameters:
  • n_input_channels (int) – Number of input channels.
  • action_size (int) – Number of dimensions of the action space.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • min_action (ndarray) – Minimum action. Used only when bound_mean=True.
  • max_action (ndarray) – Maximum action. Used only when bound_mean=True.
  • var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
  • nonlinearity (callable) – Nonlinearity placed between layers.
  • mean_wscale (float) – Scale of weight initialization of the mean layer.
  • var_wscale (float) – Scale of weight initialization of the variance layer.
  • var_bias (float) – The initial value of the bias parameter for the variance layer.
  • min_var (float) – Minimum value of the variance.
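The mean and variance computations above can be sketched numerically (hypothetical helper names; the real model computes these with Chainer links and functions):

```python
import numpy as np

def softplus(y):
    """softplus(y) = log(1 + exp(y)); always positive."""
    return np.log1p(np.exp(y))

def gaussian_policy_outputs(y_mean, y_var, min_action, max_action,
                            bound_mean=True, min_var=0.0):
    """Turn raw layer outputs into the Gaussian's mean and variance."""
    if bound_mean:
        # tanh squashes y_mean into (-1, 1); the affine map then
        # places the mean inside [min_action, max_action].
        mean = min_action + (np.tanh(y_mean) + 1) * (max_action - min_action) / 2
    else:
        mean = y_mean
    # Softplus keeps the variance positive; min_var adds a floor.
    var = softplus(y_var) + min_var
    return mean, var
```

With var_type='spherical' the variance layer outputs a single value shared by all action dimensions; with 'diagonal' it outputs one value per dimension.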
class chainerrl.policies.FCGaussianPolicyWithStateIndependentCovariance(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_func=<function softplus>, var_param_init=0)[source]

Gaussian policy that consists of FC layers with parametrized covariance.

This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as FCGaussianPolicy.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • action_size (int) – Number of dimensions of the action space.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • min_action (ndarray) – Minimum action. Used only when bound_mean=True.
  • max_action (ndarray) – Maximum action. Used only when bound_mean=True.
  • var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
  • nonlinearity (callable) – Nonlinearity placed between layers.
  • mean_wscale (float) – Scale of weight initialization of the mean layer.
  • var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
  • var_param_init (float) – Initial value of the var parameter.
class chainerrl.policies.FCGaussianPolicyWithFixedCovariance(n_input_channels, action_size, var, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, nonlinearity=<function relu>, mean_wscale=1)[source]

Gaussian policy that consists of FC layers with fixed covariance.

This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as FCGaussianPolicy. The variance of the Gaussian must be specified as an argument.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • action_size (int) – Number of dimensions of the action space.
  • var (float or ndarray) – Variance of the Gaussian distribution.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • min_action (ndarray) – Minimum action. Used only when bound_mean=True.
  • max_action (ndarray) – Maximum action. Used only when bound_mean=True.
  • nonlinearity (callable) – Nonlinearity placed between layers.
  • mean_wscale (float) – Scale of weight initialization of the mean layer.
class chainerrl.policies.GaussianHeadWithStateIndependentCovariance(action_size, var_type='spherical', var_func=<function softplus>, var_param_init=0)[source]

Gaussian head with state-independent learned covariance.

This link is intended to be attached to a neural network that outputs the mean of a Gaussian policy. The only learnable parameter this link has determines the variance in a state-independent way.

State-independent parameterization of the variance of a Gaussian policy is often used with PPO and TRPO, e.g., in https://arxiv.org/abs/1709.06560.

Parameters:
  • action_size (int) – Number of dimensions of the action space.
  • var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
  • var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
  • var_param_init (float) – Initial value of the var parameter.
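The two parameterizations can be sketched in NumPy (illustrative only; the real head stores the var parameter as a learnable Chainer parameter):

```python
import numpy as np

def softplus(p):
    return np.log1p(np.exp(p))

def state_independent_variance(var_param, action_size, var_type="spherical"):
    """Expand the learned var parameter into a per-dimension variance vector.
    The result does not depend on the state, only on the parameter."""
    if var_type == "spherical":
        # A single shared parameter, broadcast to every action dimension.
        assert np.size(var_param) == 1
        return np.full(action_size, float(softplus(var_param)))
    elif var_type == "diagonal":
        # One independent parameter per action dimension.
        assert np.size(var_param) == action_size
        return softplus(np.asarray(var_param, dtype=float))
    raise ValueError("var_type must be 'spherical' or 'diagonal'")
```

With var_param_init=0 and the default softplus var_func, the initial variance is log(2) in every dimension.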
class chainerrl.policies.MellowmaxPolicy(model, omega=1.0)[source]

Mellowmax policy.

See: http://arxiv.org/abs/1612.05628

Parameters:
  • model (chainer.Link) – Link that is callable and outputs action values.
  • omega (float) – Parameter of the mellowmax function.
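The mellowmax operator from the cited paper, mm_omega(x) = log(mean(exp(omega * x))) / omega, can be sketched in NumPy (illustrative, not ChainerRL's implementation):

```python
import numpy as np

def mellowmax(values, omega=1.0):
    """Mellowmax: a smooth, quasi-max operator over action values.
    Unlike the Boltzmann softmax, it is a non-expansion for any omega,
    which makes it better behaved as a value-function backup operator."""
    values = np.asarray(values, dtype=float)
    # Shift by the max for numerical stability (log-sum-exp trick).
    m = values.max()
    return m + np.log(np.mean(np.exp(omega * (values - m)))) / omega
```

As omega grows, mellowmax approaches max; as omega approaches 0, it approaches the arithmetic mean.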
class chainerrl.policies.SoftmaxPolicy(model, beta=1.0, min_prob=0.0)[source]

Softmax policy that uses Boltzmann distributions.

Parameters:
  • model (chainer.Link) – Link that is callable and outputs action values.
  • beta (float) – Parameter of Boltzmann distributions.
  • min_prob (float) – Minimum probability across the actions.
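The Boltzmann distribution over action values can be sketched in NumPy (one plausible way to apply beta and the min_prob floor; the exact mixing in ChainerRL may differ):

```python
import numpy as np

def boltzmann_probs(action_values, beta=1.0, min_prob=0.0):
    """Boltzmann (softmax) action probabilities with an optional floor.
    beta scales the action values (higher beta = greedier); min_prob
    mixes in a uniform floor so every action keeps some probability."""
    z = beta * np.asarray(action_values, dtype=float)
    z -= z.max()                       # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    if min_prob > 0:
        n = len(probs)
        # Shrink the softmax mass and add min_prob to each action;
        # the result still sums to 1.
        probs = probs * (1 - min_prob * n) + min_prob
    return probs
```

With beta=0 the distribution is uniform regardless of the action values, which is why beta acts as an inverse-temperature knob.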
class chainerrl.policies.FCSoftmaxPolicy(n_input_channels, n_actions, n_hidden_layers=0, n_hidden_channels=None, beta=1.0, nonlinearity=<function relu>, last_wscale=1.0, min_prob=0.0)[source]

Softmax policy that consists of FC layers and rectifiers.