Q-functions

Q-function interfaces

class chainerrl.q_function.StateQFunction

Abstract Q-function with state input.

__call__(x)

Evaluates the Q-function.

Parameters:	x (ndarray) – state input
Returns:	An ActionValue instance from which the Q-values for state x and every possible action can be computed
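To illustrate the contract described above, here is a minimal, dependency-free sketch of a state-input Q-function whose call returns an action-value object. `ToyActionValue` and `ToyStateQFunction` are hypothetical stand-ins invented for this example, not ChainerRL classes, and they handle a single state rather than a batch.

```python
class ToyActionValue:
    """Hypothetical stand-in for an ActionValue: Q-values for one state."""

    def __init__(self, q_values):
        self.q_values = list(q_values)

    @property
    def greedy_action(self):
        # Index of the action with the largest Q-value.
        return max(range(len(self.q_values)), key=self.q_values.__getitem__)

    @property
    def max(self):
        # Largest Q-value over all actions.
        return max(self.q_values)

    def evaluate_action(self, action):
        # Q-value of one specific action.
        return self.q_values[action]


class ToyStateQFunction:
    """Hypothetical linear Q-function: one weight vector per action."""

    def __init__(self, weights):
        self.weights = weights  # n_actions rows, each of length n_dim_obs

    def __call__(self, x):
        # One Q-value per action: the dot product of that action's weights with x.
        q = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.weights]
        return ToyActionValue(q)


q_func = ToyStateQFunction([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
action_value = q_func([0.5, 1.0])
```

The caller never asks for the Q-value of a single action up front; it receives one object and then queries the greedy action, the maximum, or a specific action's value as needed.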

class chainerrl.q_function.StateActionQFunction

Abstract Q-function with state and action input.

__call__(x, a)

Evaluates the Q-function.

Parameters:

x (ndarray) – state input
a (ndarray) – action input

Returns:	Q-value for state x and action a

Q-function implementations

class chainerrl.q_functions.DuelingDQN(n_actions, n_input_channels=4, activation=<function relu>, bias=0.1)

Dueling Q-Network

See: http://arxiv.org/abs/1511.06581
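The dueling architecture in the referenced paper splits the network into a state-value stream V(s) and an advantage stream A(s, a), then recombines them by subtracting the mean advantage. The following plain-Python sketch shows only that aggregation step; the function name is hypothetical and this is not claimed to be the exact ChainerRL code path.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline."""
    mean_advantage = sum(advantages) / len(advantages)
    # Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')
    return [state_value + a - mean_advantage for a in advantages]


q_values = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 3.0])
```

Subtracting the mean makes the decomposition identifiable: the average of the resulting Q-values equals the state value.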

class chainerrl.q_functions.DistributionalDuelingDQN(n_actions, n_atoms, v_min, v_max, n_input_channels=4, activation=<function relu>, bias=0.1)

Distributional dueling fully-connected Q-function with discrete actions.

class chainerrl.q_functions.SingleModelStateQFunctionWithDiscreteAction(model)

Q-function with discrete actions.

Parameters:	model (chainer.Link) – Link that is callable and outputs action values.

class chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected state-input Q-function with discrete actions.

Parameters:

ndim_obs – Number of dimensions of observation space.
n_actions (int) – Number of actions in action space.
n_hidden_channels – Number of hidden channels.
n_hidden_layers – Number of hidden layers.
nonlinearity (callable) – Nonlinearity applied after each hidden layer.
last_wscale (float) – Weight scale of the last layer.

class chainerrl.q_functions.DistributionalSingleModelStateQFunctionWithDiscreteAction(model, z_values)

Distributional Q-function with discrete actions.

Parameters:

model (chainer.Link) – Link that is callable and outputs atoms for each action.
z_values (ndarray) – Returns represented by atoms. Its shape must be (n_atoms,).

class chainerrl.q_functions.DistributionalFCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_atoms, v_min, v_max, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Distributional fully-connected Q-function with discrete actions.

Parameters:

ndim_obs (int) – Number of dimensions of observation space.
n_actions (int) – Number of actions in action space.
n_atoms (int) – Number of atoms of return distribution.
v_min (float) – Minimum value this model can approximate.
v_max (float) – Maximum value this model can approximate.
n_hidden_channels (int) – Number of hidden channels.
n_hidden_layers (int) – Number of hidden layers.
nonlinearity (callable) – Nonlinearity applied after each hidden layer.
last_wscale (float) – Weight scale of the last layer.
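A distributional Q-function represents the return for each action as a probability distribution over a fixed set of atoms. The sketch below, in plain Python, assumes the atoms are evenly spaced between v_min and v_max and shows how a scalar Q-value can be recovered as the expectation over atoms. The helper names and the even-spacing choice are assumptions made for this example.

```python
def make_z_values(v_min, v_max, n_atoms):
    """Return n_atoms evenly spaced return values from v_min to v_max (assumed layout)."""
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


def expected_q_values(atom_probs, z_values):
    """Q(s, a) = sum_i p_i(s, a) * z_i, computed for every action."""
    return [sum(p * z for p, z in zip(probs, z_values)) for probs in atom_probs]


z_values = make_z_values(v_min=-1.0, v_max=1.0, n_atoms=5)

# One row of atom probabilities per action; each row sums to 1.
atom_probs = [
    [0.0, 0.0, 1.0, 0.0, 0.0],  # action 0: all mass on the middle atom
    [0.0, 0.0, 0.0, 0.5, 0.5],  # action 1: mass split over the two largest atoms
]
q_values = expected_q_values(atom_probs, z_values)
```

Because every expectation is a convex combination of the atoms, the resulting Q-values always lie between v_min and v_max, which is why those two parameters bound what the model can approximate.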

class chainerrl.q_functions.FCLSTMStateQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers)

Fully-connected + LSTM state-input discrete Q-function.

Parameters:

n_dim_obs – Number of dimensions of observation space.
n_dim_action – Number of dimensions of action space.
n_hidden_channels – Number of hidden channels before the LSTM.
n_hidden_layers – Number of hidden layers before the LSTM.

class chainerrl.q_functions.FCQuadraticStateQFunction(n_input_channels, n_dim_action, n_hidden_channels, n_hidden_layers, action_space, scale_mu=True)

Fully-connected state-input continuous Q-function.

See: https://arxiv.org/abs/1603.00748

Parameters:

n_input_channels – Number of input channels.
n_dim_action – Number of dimensions of action space.
n_hidden_channels – Number of hidden channels.
n_hidden_layers – Number of hidden layers.
action_space – Action space of the environment.
scale_mu (bool) – Scale mu by applying tanh if True.
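The referenced paper (normalized advantage functions) models the Q-function as a quadratic in the action: Q(x, u) = V(x) - 0.5 (u - mu)^T P (u - mu), with P = L L^T built from a lower-triangular matrix L. The plain-Python sketch below illustrates that formula and one plausible way to squash mu into action bounds with tanh. The function names and the exact scaling formula are assumptions for illustration, not a transcription of the ChainerRL implementation.

```python
import math


def scale_by_tanh(raw_mu, low, high):
    """Squash unbounded values into [low, high] element-wise with tanh (assumed form)."""
    return [
        0.5 * (hi - lo) * math.tanh(m) + 0.5 * (hi + lo)
        for m, lo, hi in zip(raw_mu, low, high)
    ]


def quadratic_q(value, mu, lower_tri, action):
    """Q(x, u) = V(x) - 0.5 * (u - mu)^T P (u - mu), where P = L L^T."""
    diff = [u - m for u, m in zip(action, mu)]
    n = len(diff)
    # (u - mu)^T L L^T (u - mu) equals the squared norm of L^T (u - mu).
    lt_diff = [sum(lower_tri[i][j] * diff[i] for i in range(n)) for j in range(n)]
    return value - 0.5 * sum(v * v for v in lt_diff)


mu = scale_by_tanh([0.0, 0.0], low=[-2.0, -2.0], high=[2.0, 2.0])
L = [[1.0, 0.0],
     [0.5, 2.0]]  # lower-triangular factor of P

q_at_mu = quadratic_q(3.0, mu, L, mu)           # the greedy action recovers V(x)
q_off_mu = quadratic_q(3.0, mu, L, [1.0, 0.0])  # any other action scores lower
```

Since P is positive semi-definite by construction, the quadratic term never increases Q, so mu is the greedy action and V(x) is the maximum Q-value. That is what makes this form usable for continuous actions.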

class chainerrl.q_functions.FCBNQuadraticStateQFunction(n_input_channels, n_dim_action, n_hidden_channels, n_hidden_layers, action_space, scale_mu=True, normalize_input=True)

Fully-connected + BN state-input continuous Q-function.

See: https://arxiv.org/abs/1603.00748

Parameters:

n_input_channels – number of input channels
n_dim_action – number of dimensions of action space
n_hidden_channels – number of hidden channels
n_hidden_layers – number of hidden layers
action_space – action space of the environment
scale_mu (bool) – scale mu by applying tanh if True
normalize_input (bool) – If set to True, Batch Normalization is applied to the observations

class chainerrl.q_functions.SingleModelStateActionQFunction(model)

Q-function with state and action input.

Parameters:	model (chainer.Link) – Link that is callable and outputs action values.

class chainerrl.q_functions.FCSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected (s,a)-input Q-function.

Parameters:

n_dim_obs (int) – Number of dimensions of observation space.
n_dim_action (int) – Number of dimensions of action space.
n_hidden_channels (int) – Number of hidden channels.
n_hidden_layers (int) – Number of hidden layers.
nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
last_wscale (float) – Scale of weight initialization of the last layer.
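As a rough illustration of an (s,a)-input fully-connected Q-function, the following plain-Python sketch assumes the state and action vectors are concatenated before the first layer and that the network ends in a single linear output. All helper names and weights are hypothetical. It also shows why the nonlinearity is unused when there are no hidden layers.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Affine map: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def sa_q_value(state, action, hidden_layers, output_layer, nonlinearity=relu):
    """Q(s, a) from a fully-connected net fed with the concatenation [s; a]."""
    h = list(state) + list(action)
    for weights, bias in hidden_layers:
        # The nonlinearity is applied only after hidden layers.
        h = nonlinearity(linear(weights, bias, h))
    weights, bias = output_layer
    return linear(weights, bias, h)[0]


state, action = [1.0, 2.0], [-1.0]

# One hidden layer with two units, then a scalar output.
hidden = [([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [0.0, 0.0])]
q_one_hidden = sa_q_value(state, action, hidden, ([[1.0, 0.5]], [0.25]))

# Zero hidden layers: the model reduces to a linear map and relu is never called.
q_no_hidden = sa_q_value(state, action, [], ([[1.0, 1.0, 1.0]], [0.0]))
```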

class chainerrl.q_functions.FCLSTMSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected + LSTM (s,a)-input Q-function.

Parameters:

n_dim_obs (int) – Number of dimensions of observation space.
n_dim_action (int) – Number of dimensions of action space.
n_hidden_channels (int) – Number of hidden channels.
n_hidden_layers (int) – Number of hidden layers.
nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
last_wscale (float) – Scale of weight initialization of the last layer.

class chainerrl.q_functions.FCBNSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected + BN (s,a)-input Q-function.

Parameters:

n_dim_obs (int) – Number of dimensions of observation space.
n_dim_action (int) – Number of dimensions of action space.
n_hidden_channels (int) – Number of hidden channels.
n_hidden_layers (int) – Number of hidden layers.
normalize_input (bool) – If set to True, Batch Normalization is applied to both observations and actions.
nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
last_wscale (float) – Scale of weight initialization of the last layer.

class chainerrl.q_functions.FCBNLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected + BN (s,a)-input Q-function with late action input.

Actions are not included until the second hidden layer and not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971

Parameters:

n_dim_obs (int) – Number of dimensions of observation space.
n_dim_action (int) – Number of dimensions of action space.
n_hidden_channels (int) – Number of hidden channels.
n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
normalize_input (bool) – If set to True, Batch Normalization is applied to the observations. Actions are not normalized.
nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
last_wscale (float) – Scale of weight initialization of the last layer.
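The "late action input" idea can be sketched in plain Python as follows: the first hidden layer sees only the observation, and the action is appended to that layer's output before the remaining layers. This is a simplified illustration with hypothetical helper names and weights; batch normalization is omitted.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Affine map: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def late_action_q_value(obs, action, obs_layer, later_layers, output_layer):
    """Q(s, a) where the action joins the network after the first hidden layer."""
    # First hidden layer: observation only.
    weights, bias = obs_layer
    h = relu(linear(weights, bias, obs))
    # The raw (unnormalized) action is concatenated to the hidden features.
    h = h + list(action)
    for weights, bias in later_layers:
        h = relu(linear(weights, bias, h))
    weights, bias = output_layer
    return linear(weights, bias, h)[0]


obs_layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
output_layer = ([[1.0, 1.0, 0.5]], [0.0])

q_a = late_action_q_value([1.0, -1.0], [2.0], obs_layer, [], output_layer)
q_b = late_action_q_value([1.0, -1.0], [0.0], obs_layer, [], output_layer)
```

The observation-only first layer is why n_hidden_layers must be at least 1: without it there would be no hidden output for the action to join.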

class chainerrl.q_functions.FCLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected (s,a)-input Q-function with late action input.

Actions are not included until the second hidden layer and not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971

Parameters:

n_dim_obs (int) – Number of dimensions of observation space.
n_dim_action (int) – Number of dimensions of action space.
n_hidden_channels (int) – Number of hidden channels.
n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
last_wscale (float) – Scale of weight initialization of the last layer.