Policies
Policy interfaces
Policy implementations
class chainerrl.policies.ContinuousDeterministicPolicy(model, model_call=None, action_filter=None)
Continuous deterministic policy.
Parameters:
class chainerrl.policies.FCDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected deterministic policy.
Parameters:
- n_input_channels (int) – Number of input channels.
- n_hidden_layers (int) – Number of hidden layers.
- n_hidden_channels (int) – Number of hidden channels.
- action_size (int) – Size of actions.
- min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
- max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
- bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
- last_wscale (float) – Scale of weight initialization of the last layer.
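The effect of bound_action can be illustrated without the library. The sketch below assumes the usual affine rescaling of tanh onto [min_action, max_action] (centre plus half-range). The helper name squash_to_bounds is hypothetical and is not part of chainerrl, and plain Python lists stand in for ndarrays.

```python
import math


def squash_to_bounds(y, min_action, max_action):
    """Map unbounded outputs y into [min_action, max_action] elementwise.

    Assumed form (hypothetical helper): centre + tanh(y) * half_range.
    """
    bounded = []
    for y_i, low, high in zip(y, min_action, max_action):
        centre = (high + low) / 2.0
        half_range = (high - low) / 2.0
        bounded.append(centre + math.tanh(y_i) * half_range)
    return bounded


# Raw outputs of any magnitude end up inside the per-dimension bounds.
actions = squash_to_bounds([-50.0, 0.0, 50.0], [-1.0, 0.0, 2.0], [1.0, 4.0, 3.0])
print(actions)
```

Because tanh saturates at -1 and 1, very large raw outputs land on the bounds themselves, while an output of zero maps to the midpoint of the interval.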
class chainerrl.policies.FCBNDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected deterministic policy with Batch Normalization.
Parameters:
- n_input_channels (int) – Number of input channels.
- n_hidden_layers (int) – Number of hidden layers.
- n_hidden_channels (int) – Number of hidden channels.
- action_size (int) – Size of actions.
- min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
- max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
- bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
- normalize_input (bool) – If set to True, Batch Normalization is applied to inputs as well as hidden activations.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
- last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.policies.FCLSTMDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected deterministic policy with LSTM.
Parameters:
- n_input_channels (int) – Number of input channels.
- n_hidden_layers (int) – Number of hidden layers.
- n_hidden_channels (int) – Number of hidden channels.
- action_size (int) – Size of actions.
- min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
- max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
- bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
- last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.policies.FCGaussianPolicy(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_wscale=1, var_bias=0, min_var=0)
Gaussian policy that consists of fully-connected layers.
This model has two output layers: the mean layer and the variance layer. The mean of the Gaussian is computed as follows:
Let y be the output of the mean layer.
- If bound_mean=False: mean = y
- If bound_mean=True: mean = (max_action + min_action) / 2 + tanh(y) * (max_action - min_action) / 2, so that the mean stays within [min_action, max_action]
The variance of the Gaussian is computed as follows. Let y be the output of the variance layer.
- variance = softplus(y) + min_var
Parameters:
- n_input_channels (int) – Number of input channels.
- action_size (int) – Number of dimensions of the action space.
- n_hidden_layers (int) – Number of hidden layers.
- n_hidden_channels (int) – Number of hidden channels.
- min_action (ndarray) – Minimum action. Used only when bound_mean=True.
- max_action (ndarray) – Maximum action. Used only when bound_mean=True.
- var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
- nonlinearity (callable) – Nonlinearity placed between layers.
- mean_wscale (float) – Scale of weight initialization of the mean layer.
- var_wscale (float) – Scale of weight initialization of the variance layer.
- var_bias (float) – The initial value of the bias parameter for the variance layer.
- min_var (float) – Minimum value of the variance.
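The variance rule stated for this class, variance = softplus(y) + min_var, can be checked numerically with a few lines of plain Python. This is a minimal sketch, not the library's implementation: the function names are hypothetical, and a scalar y stands in for the output of the variance layer.

```python
import math


def softplus(y):
    """Numerically stable softplus: log(1 + exp(y))."""
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))


def gaussian_variance(y, min_var=0.0):
    """variance = softplus(y) + min_var (hypothetical helper name)."""
    return softplus(y) + min_var


# However negative the raw output is, the variance never drops below min_var.
for y in (-20.0, 0.0, 20.0):
    print(y, gaussian_variance(y, min_var=1e-3))
```

Setting min_var above zero keeps the distribution from collapsing to a point when the variance layer outputs large negative values.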
class chainerrl.policies.FCGaussianPolicyWithStateIndependentCovariance(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_func=<function softplus>, var_param_init=0)
Gaussian policy that consists of FC layers with parametrized covariance.
This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as in FCGaussianPolicy.
Parameters:
- n_input_channels (int) – Number of input channels.
- action_size (int) – Number of dimensions of the action space.
- n_hidden_layers (int) – Number of hidden layers.
- n_hidden_channels (int) – Number of hidden channels.
- min_action (ndarray) – Minimum action. Used only when bound_mean=True.
- max_action (ndarray) – Maximum action. Used only when bound_mean=True.
- var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
- nonlinearity (callable) – Nonlinearity placed between layers.
- mean_wscale (float) – Scale of weight initialization of the mean layer.
- var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
- var_param_init (float) – Initial value of the var parameter.
class chainerrl.policies.FCGaussianPolicyWithFixedCovariance(n_input_channels, action_size, var, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, nonlinearity=<function relu>, mean_wscale=1)
Gaussian policy that consists of FC layers with fixed covariance.
This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as in FCGaussianPolicy. The variance of the Gaussian must be specified as an argument.
Parameters:
- n_input_channels (int) – Number of input channels.
- action_size (int) – Number of dimensions of the action space.
- var (float or ndarray) – Variance of the Gaussian distribution.
- n_hidden_layers (int) – Number of hidden layers.
- n_hidden_channels (int) – Number of hidden channels.
- min_action (ndarray) – Minimum action. Used only when bound_mean=True.
- max_action (ndarray) – Maximum action. Used only when bound_mean=True.
- nonlinearity (callable) – Nonlinearity placed between layers.
- mean_wscale (float) – Scale of weight initialization of the mean layer.
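To make the role of a fixed variance concrete, the following sketch draws an action from a Gaussian whose mean would come from the network and whose variance is supplied by the caller. It assumes that a scalar var is shared by all action dimensions and that a sequence gives one variance per dimension. The function sample_action is hypothetical and uses only the standard library.

```python
import math
import random


def sample_action(mean, var, rng=random):
    """Draw one action from a Gaussian with the given mean and fixed variance.

    var may be a single number (assumed shared by all dimensions) or a
    sequence with one variance per dimension.
    """
    if isinstance(var, (int, float)):
        var = [float(var)] * len(mean)
    # random.gauss takes a standard deviation, hence the square root.
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]


rng = random.Random(0)  # seeded only to make the sketch reproducible
print(sample_action([0.0, 1.0], 0.25, rng))
```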
class chainerrl.policies.GaussianHeadWithStateIndependentCovariance(action_size, var_type='spherical', var_func=<function softplus>, var_param_init=0)
Gaussian head with state-independent learned covariance.
This link is intended to be attached to a neural network that outputs the mean of a Gaussian policy. The only learnable parameter this link has determines the variance in a state-independent way.
State-independent parameterization of the variance of a Gaussian policy is often used with PPO and TRPO, e.g., in https://arxiv.org/abs/1709.06560.
Parameters:
- action_size (int) – Number of dimensions of the action space.
- var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
- var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
- var_param_init (float) – Initial value of the var parameter.
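A state-independent variance can be pictured as a small parameter vector passed through var_func, with no dependence on the input state. The sketch below is an illustration under stated assumptions: 'spherical' is taken to mean one parameter shared by every action dimension and 'diagonal' one parameter per dimension. The function state_independent_variance is hypothetical.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def state_independent_variance(var_param, action_size,
                               var_type="spherical", var_func=softplus):
    """Turn a var parameter into one variance per action dimension.

    Assumed sizes: 1 entry for 'spherical', action_size entries for 'diagonal'.
    """
    expected = {"spherical": 1, "diagonal": action_size}[var_type]
    if len(var_param) != expected:
        raise ValueError("var_param has the wrong length for var_type")
    variances = [var_func(p) for p in var_param]
    if var_type == "spherical":
        variances = variances * action_size  # share the single variance
    return variances


print(state_independent_variance([0.0], 3))
print(state_independent_variance([0.0, 1.0], 2, var_type="diagonal"))
```

With var_param_init=0 and softplus as var_func, every variance starts at log(2), whatever the state.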
class chainerrl.policies.MellowmaxPolicy(model, omega=1.0)
Mellowmax policy.
See: http://arxiv.org/abs/1612.05628
Parameters:
- model (chainer.Link) – Link that is callable and outputs action values.
- omega (float) – Parameter of the mellowmax function.
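For orientation, the mellowmax operator from the cited paper is log(mean(exp(omega * x))) / omega. The sketch below only evaluates that operator on a list of action values, using a log-sum-exp shift for numerical stability. It does not reproduce how MellowmaxPolicy turns action values into an action distribution, and the function name is hypothetical.

```python
import math


def mellowmax(values, omega=1.0):
    """Mellowmax operator: log(mean(exp(omega * v))) / omega, omega != 0."""
    n = len(values)
    # Shift by the largest exponent so that exp() cannot overflow.
    shift = max(omega * v for v in values)
    log_sum = shift + math.log(sum(math.exp(omega * v - shift) for v in values))
    return (log_sum - math.log(n)) / omega


q_values = [1.0, 2.0, 3.0]
for omega in (0.1, 1.0, 10.0, 100.0):
    print(omega, mellowmax(q_values, omega))
```

For positive omega the result lies between the mean and the maximum of the values, and it moves toward the maximum as omega grows.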