Policies

Policy interfaces

class chainerrl.policy.Policy[source]

Abstract policy.

__call__(state)[source]

Evaluate a policy.

Returns: Distribution of actions
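The contract is simply that calling a policy on a state returns a distribution over actions. A minimal stand-in sketch (using a hypothetical NumPy-backed distribution rather than ChainerRL's own distribution classes):

```python
import numpy as np

class UniformActionDistribution:
    """Stand-in for a ChainerRL action distribution (illustrative only)."""
    def __init__(self, n_actions):
        self.probs = np.full(n_actions, 1.0 / n_actions)

    def sample(self):
        # Draw one discrete action according to the probabilities.
        return np.random.choice(len(self.probs), p=self.probs)

class RandomPolicy:
    """Minimal policy following the Policy contract:
    __call__(state) returns a distribution of actions."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def __call__(self, state):
        return UniformActionDistribution(self.n_actions)
```

Concrete implementations below return Gaussian, softmax, or deterministic distributions instead of this uniform placeholder.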

Policy implementations

class chainerrl.policies.ContinuousDeterministicPolicy(model, model_call=None, action_filter=None)[source]

Continuous deterministic policy.

Parameters:
  • model (chainer.Link) – Link that is callable and outputs action values.
  • model_call (callable or None) – Callable used instead of model.__call__ if not None
  • action_filter (callable or None) – Callable applied to the outputs of the model if not None
class chainerrl.policies.FCDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)[source]

Fully-connected deterministic policy.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • action_size (int) – Size of actions.
  • min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
  • max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
  • bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
  • last_wscale (float) – Scale of weight initialization of the last layer.
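The tanh bounding applied when bound_action=True can be sketched in NumPy (a reconstruction of the idea, not ChainerRL's exact implementation):

```python
import numpy as np

def bound_by_tanh(y, min_action, max_action):
    """Squash unbounded network outputs y into [min_action, max_action]."""
    scale = (max_action - min_action) / 2.0    # half-width of the action range
    center = (max_action + min_action) / 2.0   # midpoint of the action range
    # tanh maps y into (-1, 1), which is then rescaled and shifted.
    return center + np.tanh(y) * scale

min_a, max_a = np.array([-2.0]), np.array([2.0])
print(bound_by_tanh(np.array([100.0]), min_a, max_a))  # close to 2.0
print(bound_by_tanh(np.array([0.0]), min_a, max_a))    # 0.0 (the midpoint)
```

Because tanh saturates smoothly, actions approach but never exceed the bounds.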
class chainerrl.policies.FCBNDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)[source]

Fully-connected deterministic policy with Batch Normalization.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • action_size (int) – Size of actions.
  • min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
  • max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
  • bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
  • normalize_input (bool) – If set to True, Batch Normalization is applied to inputs as well as hidden activations.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
  • last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.policies.FCLSTMDeterministicPolicy(n_input_channels, n_hidden_layers, n_hidden_channels, action_size, min_action=None, max_action=None, bound_action=True, nonlinearity=<function relu>, last_wscale=1.0)[source]

Fully-connected deterministic policy with LSTM.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • action_size (int) – Size of actions.
  • min_action (ndarray or None) – Minimum action. Used only if bound_action is set to True.
  • max_action (ndarray or None) – Maximum action. Used only if bound_action is set to True.
  • bound_action (bool) – If set to True, actions are bounded to [min_action, max_action] by tanh.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
  • last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.policies.FCGaussianPolicy(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_wscale=1, var_bias=0, min_var=0)[source]

Gaussian policy that consists of fully-connected layers.

This model has two output layers: the mean layer and the variance layer. The mean of the Gaussian is computed as follows:

Let y be the output of the mean layer.

If bound_mean=False:
  mean = y
If bound_mean=True:
  mean = min_action + (tanh(y) + 1) * (max_action - min_action) / 2

The variance of the Gaussian is computed as follows:

Let y be the output of the variance layer.

  variance = softplus(y) + min_var
Parameters:
  • n_input_channels (int) – Number of input channels.
  • action_size (int) – Number of dimensions of the action space.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • min_action (ndarray) – Minimum action. Used only when bound_mean=True.
  • max_action (ndarray) – Maximum action. Used only when bound_mean=True.
  • var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
  • nonlinearity (callable) – Nonlinearity placed between layers.
  • mean_wscale (float) – Scale of weight initialization of the mean layer.
  • var_wscale (float) – Scale of weight initialization of the variance layer.
  • var_bias (float) – The initial value of the bias parameter for the variance layer.
  • min_var (float) – Minimum value of the variance.
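The mean and variance computations above can be sketched numerically (hypothetical helper names; the real model computes these with Chainer links and functions):

```python
import numpy as np

def softplus(y):
    """softplus(y) = log(1 + exp(y)); always positive."""
    return np.log1p(np.exp(y))

def gaussian_policy_outputs(y_mean, y_var, min_action, max_action,
                            bound_mean=True, min_var=0.0):
    """Turn raw layer outputs into the Gaussian's mean and variance."""
    if bound_mean:
        # tanh squashes y_mean into (-1, 1); the affine map then
        # places the mean inside [min_action, max_action].
        mean = min_action + (np.tanh(y_mean) + 1) * (max_action - min_action) / 2
    else:
        mean = y_mean
    # Softplus keeps the variance positive; min_var adds a floor.
    var = softplus(y_var) + min_var
    return mean, var
```

With var_type='spherical' the variance layer outputs a single value shared by all action dimensions; with 'diagonal' it outputs one value per dimension.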
class chainerrl.policies.FCGaussianPolicyWithStateIndependentCovariance(n_input_channels, action_size, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, var_type='spherical', nonlinearity=<function relu>, mean_wscale=1, var_func=<function softplus>, var_param_init=0)[source]

Gaussian policy that consists of FC layers with parametrized covariance.

This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as FCGaussianPolicy.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • action_size (int) – Number of dimensions of the action space.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • min_action (ndarray) – Minimum action. Used only when bound_mean=True.
  • max_action (ndarray) – Maximum action. Used only when bound_mean=True.
  • var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
  • nonlinearity (callable) – Nonlinearity placed between layers.
  • mean_wscale (float) – Scale of weight initialization of the mean layer.
  • var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
  • var_param_init (float) – Initial value of the var parameter.
class chainerrl.policies.FCGaussianPolicyWithFixedCovariance(n_input_channels, action_size, var, n_hidden_layers=0, n_hidden_channels=None, min_action=None, max_action=None, bound_mean=False, nonlinearity=<function relu>, mean_wscale=1)[source]

Gaussian policy that consists of FC layers with fixed covariance.

This model has one output layer: the mean layer. The mean of the Gaussian is computed in the same way as FCGaussianPolicy. The variance of the Gaussian must be specified as an argument.

Parameters:
  • n_input_channels (int) – Number of input channels.
  • action_size (int) – Number of dimensions of the action space.
  • var (float or ndarray) – Variance of the Gaussian distribution.
  • n_hidden_layers (int) – Number of hidden layers.
  • n_hidden_channels (int) – Number of hidden channels.
  • min_action (ndarray) – Minimum action. Used only when bound_mean=True.
  • max_action (ndarray) – Maximum action. Used only when bound_mean=True.
  • nonlinearity (callable) – Nonlinearity placed between layers.
  • mean_wscale (float) – Scale of weight initialization of the mean layer.
class chainerrl.policies.GaussianHeadWithStateIndependentCovariance(action_size, var_type='spherical', var_func=<function softplus>, var_param_init=0)[source]

Gaussian head with state-independent learned covariance.

This link is intended to be attached to a neural network that outputs the mean of a Gaussian policy. The only learnable parameter this link has determines the variance in a state-independent way.

State-independent parameterization of the variance of a Gaussian policy is often used with PPO and TRPO, e.g., in https://arxiv.org/abs/1709.06560.

Parameters:
  • action_size (int) – Number of dimensions of the action space.
  • var_type (str) – Type of parameterization of variance. It must be ‘spherical’ or ‘diagonal’.
  • var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
  • var_param_init (float) – Initial value of the var parameter.
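The two parameterizations can be sketched in NumPy (illustrative only; the real head stores the var parameter as a learnable Chainer parameter):

```python
import numpy as np

def softplus(p):
    return np.log1p(np.exp(p))

def state_independent_variance(var_param, action_size, var_type="spherical"):
    """Expand the learned var parameter into a per-dimension variance vector.
    The result does not depend on the state, only on the parameter."""
    if var_type == "spherical":
        # A single shared parameter, broadcast to every action dimension.
        assert np.size(var_param) == 1
        return np.full(action_size, float(softplus(var_param)))
    elif var_type == "diagonal":
        # One independent parameter per action dimension.
        assert np.size(var_param) == action_size
        return softplus(np.asarray(var_param, dtype=float))
    raise ValueError("var_type must be 'spherical' or 'diagonal'")
```

With var_param_init=0 and the default softplus var_func, the initial variance is log(2) in every dimension.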
class chainerrl.policies.MellowmaxPolicy(model, omega=1.0)[source]

Mellowmax policy.

See: http://arxiv.org/abs/1612.05628

Parameters:
  • model (chainer.Link) – Link that is callable and outputs action values.
  • omega (float) – Parameter of the mellowmax function.
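The mellowmax operator from the cited paper, mm_omega(x) = log(mean(exp(omega * x))) / omega, can be sketched in NumPy (illustrative, not ChainerRL's implementation):

```python
import numpy as np

def mellowmax(values, omega=1.0):
    """Mellowmax: a smooth, quasi-max operator over action values.
    Unlike the Boltzmann softmax, it is a non-expansion for any omega,
    which makes it better behaved as a value-function backup operator."""
    values = np.asarray(values, dtype=float)
    # Shift by the max for numerical stability (log-sum-exp trick).
    m = values.max()
    return m + np.log(np.mean(np.exp(omega * (values - m)))) / omega
```

As omega grows, mellowmax approaches max; as omega approaches 0, it approaches the arithmetic mean.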
class chainerrl.policies.SoftmaxPolicy(model, beta=1.0, min_prob=0.0)[source]

Softmax policy that uses Boltzmann distributions.

Parameters:
  • model (chainer.Link) – Link that is callable and outputs action values.
  • beta (float) – Parameter of Boltzmann distributions.
  • min_prob (float) – Minimum probability across the actions.
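The Boltzmann distribution over action values can be sketched in NumPy (one plausible way to apply beta and the min_prob floor; the exact mixing in ChainerRL may differ):

```python
import numpy as np

def boltzmann_probs(action_values, beta=1.0, min_prob=0.0):
    """Boltzmann (softmax) action probabilities with an optional floor.
    beta scales the action values (higher beta = greedier); min_prob
    mixes in a uniform floor so every action keeps some probability."""
    z = beta * np.asarray(action_values, dtype=float)
    z -= z.max()                       # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    if min_prob > 0:
        n = len(probs)
        # Shrink the softmax mass and add min_prob to each action;
        # the result still sums to 1.
        probs = probs * (1 - min_prob * n) + min_prob
    return probs
```

With beta=0 the distribution is uniform regardless of the action values, which is why beta acts as an inverse-temperature knob.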
class chainerrl.policies.FCSoftmaxPolicy(n_input_channels, n_actions, n_hidden_layers=0, n_hidden_channels=None, beta=1.0, nonlinearity=<function relu>, last_wscale=1.0, min_prob=0.0)[source]

Softmax policy that consists of FC layers and rectifiers.