Policies
Policy interfaces
Policy implementations

class chainerrl.policies.ContinuousDeterministicPolicy(model, model_call=None, action_filter=None)
Continuous deterministic policy.
Parameters:

class chainerrl.policies.FCDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected deterministic policy.
Parameters:  n_input_channels (int) – Number of input channels.
 n_hidden_layers (int) – Number of hidden layers.
 n_hidden_channels (int) – Number of hidden channels.
 action_size (int) – Size of actions.
 min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
 max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
 bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
 last_wscale (float) – Scale of weight initialization of the last layer.
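
The bound_action option squashes the network output into [min_action, max_action] with tanh. A minimal sketch of one common way to do this is shown below in plain Python. The helper name bound_by_tanh, the midpoint-and-half-width rescaling, and the example bounds are assumptions for illustration; this page does not spell out the exact computation the library uses.

```python
import math


def bound_by_tanh(x, low, high):
    """Squash an unbounded value x into the interval [low, high] using tanh."""
    scale = (high - low) / 2.0   # half-width of the interval
    center = (high + low) / 2.0  # midpoint of the interval
    return math.tanh(x) * scale + center


# Hypothetical per-dimension bounds for a 2-D action space.
min_action = [-1.0, 0.0]
max_action = [1.0, 10.0]
raw_output = [0.0, 3.0]  # stand-in for the last layer's unbounded output

action = [bound_by_tanh(x, lo, hi)
          for x, lo, hi in zip(raw_output, min_action, max_action)]
```

Because tanh maps every real number into [-1, 1], each component of action stays inside its own bounds no matter how large the raw output grows.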

class chainerrl.policies.FCBNDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected deterministic policy with Batch Normalization.
Parameters:  n_input_channels (int) – Number of input channels.
 n_hidden_layers (int) – Number of hidden layers.
 n_hidden_channels (int) – Number of hidden channels.
 action_size (int) – Size of actions.
 min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
 max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
 bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
 normalize_input (bool) – If set to True, Batch Normalization is applied to inputs as well as hidden activations.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
 last_wscale (float) – Scale of weight initialization of the last layer.

class chainerrl.policies.FCLSTMDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected deterministic policy with LSTM.
Parameters:  n_input_channels (int) – Number of input channels.
 n_hidden_layers (int) – Number of hidden layers.
 n_hidden_channels (int) – Number of hidden channels.
 action_size (int) – Size of actions.
 min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
 max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
 bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
 last_wscale (float) – Scale of weight initialization of the last layer.

class chainerrl.policies.FCGaussianPolicy(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_wscale=1, var_bias=0, min_var=0)
Gaussian policy that consists of fully-connected layers.
This model has two output layers: the mean layer and the variance layer.
The mean of the Gaussian is computed as follows. Let y be the output of the mean layer.
 If bound_mean=False: mean = y
 If bound_mean=True: mean = min_action + tanh(y) * (max_action - min_action) / 2
The variance of the Gaussian is computed as follows. Let y be the output of the variance layer.
 variance = softplus(y) + min_var
Parameters:  n_input_channels (int) – Number of input channels.
 action_size (int) – Number of dimensions of the action space.
 n_hidden_layers (int) – Number of hidden layers.
 n_hidden_channels (int) – Number of hidden channels.
 min_action (ndarray) – Minimum action. Used only when bound_mean=True.
 max_action (ndarray) – Maximum action. Used only when bound_mean=True.
 var_type (str) – Type of parameterization of variance. It must be 'spherical' or 'diagonal'.
 nonlinearity (callable) – Nonlinearity placed between layers.
 mean_wscale (float) – Scale of weight initialization of the mean layer.
 var_wscale (float) – Scale of weight initialization of the variance layer.
 var_bias (float) – The initial value of the bias parameter for the variance layer.
 min_var (float) – Minimum value of the variance.
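
The mean and variance formulas in the description can be transcribed directly into scalar Python, as in the sketch below. It does not call ChainerRL; the function names gaussian_mean and gaussian_variance are hypothetical, and the code follows the formulas as written on this page, which may differ from what the library computes internally.

```python
import math


def softplus(y):
    """Numerically stable softplus: log(1 + exp(y))."""
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))


def gaussian_mean(y, bound_mean=False, min_action=None, max_action=None):
    """Mean of the Gaussian, following the formula stated in the description."""
    if not bound_mean:
        return y
    # max_action - min_action is the width of the action range.
    return min_action + math.tanh(y) * (max_action - min_action) / 2


def gaussian_variance(y, min_var=0.0):
    """Variance of the Gaussian: softplus keeps it positive, min_var adds a floor."""
    return softplus(y) + min_var


mean = gaussian_mean(0.5)                   # unbounded mean is the raw output
var = gaussian_variance(0.0, min_var=0.01)  # softplus(0) + 0.01
```

Since softplus is strictly positive, the variance never falls below min_var, which is why min_var can be used to keep the policy from collapsing to a near-deterministic one.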

class chainerrl.policies.FCGaussianPolicyWithStateIndependentCovariance(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_func=<function softplus>, var_param_init=0)
Gaussian policy that consists of FC layers with parametrized covariance.
This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as FCGaussianPolicy.
Parameters:  n_input_channels (int) – Number of input channels.
 action_size (int) – Number of dimensions of the action space.
 n_hidden_layers (int) – Number of hidden layers.
 n_hidden_channels (int) – Number of hidden channels.
 min_action (ndarray) – Minimum action. Used only when bound_mean=True.
 max_action (ndarray) – Maximum action. Used only when bound_mean=True.
 var_type (str) – Type of parameterization of variance. It must be 'spherical' or 'diagonal'.
 nonlinearity (callable) – Nonlinearity placed between layers.
 mean_wscale (float) – Scale of weight initialization of the mean layer.
 var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
 var_param_init (float) – Initial value of the var parameter.

class chainerrl.policies.FCGaussianPolicyWithFixedCovariance(n_input_channels, action_size, var, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, nonlinearity=<function relu>, mean_wscale=1)
Gaussian policy that consists of FC layers with fixed covariance.
This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as FCGaussianPolicy. The variance of the Gaussian must be specified as an argument.
Parameters:  n_input_channels (int) – Number of input channels.
 action_size (int) – Number of dimensions of the action space.
 var (float or ndarray) – Variance of the Gaussian distribution.
 n_hidden_layers (int) – Number of hidden layers.
 n_hidden_channels (int) – Number of hidden channels.
 min_action (ndarray) – Minimum action. Used only when bound_mean=True.
 max_action (ndarray) – Maximum action. Used only when bound_mean=True.
 nonlinearity (callable) – Nonlinearity placed between layers.
 mean_wscale (float) – Scale of weight initialization of the mean layer.
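
To illustrate what a fixed variance means when actions are drawn, the sketch below samples from a Gaussian whose variance is given up front, either as a single float or as one value per dimension. The helper sample_action and the way a scalar var is expanded across dimensions are assumptions for illustration, not the library's implementation.

```python
import math
import random


def sample_action(mean, var, rng):
    """Draw one action from a Gaussian with a diagonal, fixed covariance.

    mean: list of floats, one per action dimension.
    var: a float shared by all dimensions, or a list with one variance
         per dimension.
    """
    if isinstance(var, (int, float)):
        var = [float(var)] * len(mean)  # expand the scalar to every dimension
    # random.gauss takes a standard deviation, so take the square root.
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]


rng = random.Random(0)  # seeded for reproducibility
mean = [0.0, 1.0]       # stand-in for the mean layer's output
a_shared = sample_action(mean, 0.25, rng)          # same variance everywhere
a_per_dim = sample_action(mean, [0.25, 0.0], rng)  # zero variance in dim 1
```

With a variance of zero in a dimension, the sampled value in that dimension is exactly the mean, which makes the role of var easy to check.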

class chainerrl.policies.GaussianHeadWithStateIndependentCovariance(action_size, var_type='spherical', var_func=<function softplus>, var_param_init=0)
Gaussian head with state-independent learned covariance.
This link is intended to be attached to a neural network that outputs the mean of a Gaussian policy. The only learnable parameter this link has determines the variance in a state-independent way.
State-independent parameterization of the variance of a Gaussian policy is often used with PPO and TRPO, e.g., in https://arxiv.org/abs/1709.06560.
Parameters:  action_size (int) – Number of dimensions of the action space.
 var_type (str) – Type of parameterization of variance. It must be 'spherical' or 'diagonal'.
 var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
 var_param_init (float) – Initial value of the var parameter.
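
A state-independent variance can be pictured as a small parameter vector passed through var_func, with no dependence on the observation. The sketch below assumes that 'spherical' stores one shared value and 'diagonal' stores one value per action dimension; these shapes and the helper names make_var_param and variance are illustrative assumptions rather than details confirmed by this page.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def make_var_param(action_size, var_type="spherical", var_param_init=0.0):
    """Create the learnable var parameter (assumed shapes, see the text above)."""
    if var_type == "spherical":
        return [var_param_init]                # one value shared by all dimensions
    if var_type == "diagonal":
        return [var_param_init] * action_size  # one value per dimension
    raise ValueError("var_type must be 'spherical' or 'diagonal'")


def variance(var_param, action_size, var_func=softplus):
    """Variance for every action dimension; the state is never consulted."""
    var = [var_func(p) for p in var_param]
    if len(var) == 1:
        var = var * action_size  # broadcast the single spherical value
    return var


var_param = make_var_param(3, "spherical")
var = variance(var_param, 3)  # three equal entries, each softplus(0)
```

Because var_func is applied to a free parameter, it has to map any real number to a positive value, which is why softplus is a natural default.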

class chainerrl.policies.MellowmaxPolicy(model, omega=1.0)
Mellowmax policy.
See: http://arxiv.org/abs/1612.05628
Parameters:  model (chainer.Link) – Link that is callable and outputs action values.
 omega (float) – Parameter of the mellowmax function.
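
For intuition about omega, the mellowmax operator from the paper linked above is the log of the mean of exp(omega * x), divided by omega. The sketch below computes only that operator for a list of action values and assumes omega > 0; it does not show how MellowmaxPolicy turns action values into action probabilities, and the function name mellowmax is chosen here for illustration.

```python
import math


def mellowmax(values, omega=1.0):
    """mm_omega(x) = log(mean(exp(omega * x))) / omega, computed stably."""
    m = max(values)
    # Subtract the maximum before exponentiating to avoid overflow (omega > 0).
    s = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(s / len(values)) / omega


q_values = [1.0, 2.0, 3.0]  # stand-in for the model's action values
mm = mellowmax(q_values, omega=1.0)
mm_sharp = mellowmax(q_values, omega=100.0)
```

For positive omega the result lies between the mean and the maximum of the values, and it moves toward the maximum as omega grows.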