Q-functions

Q-function interfaces

class chainerrl.q_function.StateQFunction

Abstract Q-function with state input.

__call__(x)

Evaluates the Q-function.

Parameters: x (ndarray) – state input
Returns: An instance of ActionValue from which the Q-values for state x and every possible action can be computed
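The state-input interface can be pictured with a small sketch. This is a pure-Python stand-in, not ChainerRL code: `ToyActionValue`, `TableQFunction`, and their attribute names are hypothetical, and the library's own ActionValue objects may expose a different API.

```python
class ToyActionValue:
    """Hypothetical stand-in for an ActionValue: holds Q(x, a) for every action a."""

    def __init__(self, q_values):
        self.q_values = list(q_values)

    @property
    def greedy_action(self):
        # Index of the action with the largest Q-value.
        return max(range(len(self.q_values)), key=self.q_values.__getitem__)

    @property
    def max_q(self):
        # Largest Q-value over all actions.
        return max(self.q_values)

    def evaluate_action(self, a):
        # Q-value of one specific action.
        return self.q_values[a]


class TableQFunction:
    """Toy state-input Q-function backed by a lookup table (an assumption)."""

    def __init__(self, table):
        self.table = table  # maps a state to a list of Q-values, one per action

    def __call__(self, x):
        # A state goes in; an object describing all actions comes out.
        return ToyActionValue(self.table[x])


q_func = TableQFunction({"s0": [0.1, 0.7, -0.2]})
action_value = q_func("s0")
```

The point of returning an object rather than a bare array is that the caller can ask for the greedy action, the maximum, or a single action's value without knowing how the values are stored.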
class chainerrl.q_function.StateActionQFunction

Abstract Q-function with state and action input.

__call__(x, a)

Evaluates the Q-function.

Parameters:
  • x (ndarray) – state input
  • a (ndarray) – action input
Returns: Q-value for state x and action a

Q-function implementations

class chainerrl.q_functions.DuelingDQN(n_actions, n_input_channels=4, activation=<function relu>, bias=0.1)

Dueling Q-Network

See: http://arxiv.org/abs/1511.06581
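The referenced paper combines a state value V(s) and per-action advantages A(s, a) by subtracting the mean advantage. A minimal sketch of that aggregation step follows; `dueling_q_values` is a hypothetical helper, and whether this class uses exactly this form is an assumption.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


# Shifting every advantage by a constant leaves the Q-values unchanged,
# which is what makes the value/advantage split identifiable.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
q_shifted = dueling_q_values(2.0, [11.0, 10.0, 9.0])
```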

class chainerrl.q_functions.DistributionalDuelingDQN(n_actions, n_atoms, v_min, v_max, n_input_channels=4, activation=<function relu>, bias=0.1)

Distributional dueling Q-network with discrete actions.

class chainerrl.q_functions.SingleModelStateQFunctionWithDiscreteAction(model)

Q-function with discrete actions.

Parameters: model (chainer.Link) – Link that is callable and outputs action values.
class chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected state-input Q-function with discrete actions.

Parameters:
  • ndim_obs (int) – Number of dimensions of observation space.
  • n_actions (int) – Number of actions in action space.
  • n_hidden_channels (int) – Number of hidden channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • nonlinearity (callable) – Nonlinearity applied after each hidden layer.
  • last_wscale (float) – Weight scale of the last layer.
class chainerrl.q_functions.DistributionalSingleModelStateQFunctionWithDiscreteAction(model, z_values)

Distributional Q-function with discrete actions.

Parameters:
  • model (chainer.Link) – Link that is callable and outputs atoms for each action.
  • z_values (ndarray) – Returns represented by atoms. Its shape must be (n_atoms,).
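In a distributional Q-function, each action has a probability for every atom, and the scalar Q-value is the expectation of the atom returns under those probabilities. A minimal sketch, assuming `probs[a][i]` is the probability the model assigns to atom i for action a (`expected_q_values` is a hypothetical helper, not a library function):

```python
def expected_q_values(probs, z_values):
    """Return Q(s, a) = sum_i z_i * p_i(s, a) for every action a."""
    return [sum(p * z for p, z in zip(row, z_values)) for row in probs]


z_values = [-1.0, 0.0, 1.0]      # returns represented by atoms, shape (n_atoms,)
probs = [
    [0.5, 0.5, 0.0],             # action 0: mass on the low atoms
    [0.0, 0.5, 0.5],             # action 1: mass on the high atoms
]
q = expected_q_values(probs, z_values)
```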
class chainerrl.q_functions.DistributionalFCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_atoms, v_min, v_max, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Distributional fully-connected Q-function with discrete actions.

Parameters:
  • ndim_obs (int) – Number of dimensions of observation space.
  • n_actions (int) – Number of actions in action space.
  • n_atoms (int) – Number of atoms of return distribution.
  • v_min (float) – Minimum value this model can approximate.
  • v_max (float) – Maximum value this model can approximate.
  • n_hidden_channels (int) – Number of hidden channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • nonlinearity (callable) – Nonlinearity applied after each hidden layer.
  • last_wscale (float) – Weight scale of the last layer.
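The parameters v_min, v_max, and n_atoms together define the support of the return distribution. A common choice, assumed here rather than confirmed for this class, is to space the atoms evenly between the two bounds; `make_z_values` is a hypothetical helper.

```python
def make_z_values(v_min, v_max, n_atoms):
    """Return n_atoms evenly spaced return values from v_min to v_max inclusive."""
    if n_atoms < 2:
        raise ValueError("n_atoms must be at least 2")
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


z = make_z_values(-10.0, 10.0, 5)
```

Returns outside [v_min, v_max] cannot be represented by any atom, so the bounds should cover the range of discounted returns expected in the task.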
class chainerrl.q_functions.FCLSTMStateQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers)

Fully-connected + LSTM state-input discrete Q-function.

Parameters:
  • n_dim_obs – Number of dimensions of observation space.
  • n_dim_action – Number of dimensions of action space.
  • n_hidden_channels – Number of hidden channels before LSTM.
  • n_hidden_layers – Number of hidden layers before LSTM.
class chainerrl.q_functions.FCQuadraticStateQFunction(n_input_channels, n_dim_action, n_hidden_channels, n_hidden_layers, action_space, scale_mu=True)

Fully-connected state-input continuous Q-function.

See: https://arxiv.org/abs/1603.00748

Parameters:
  • n_input_channels – Number of input channels.
  • n_dim_action – Number of dimensions of action space.
  • n_hidden_channels – Number of hidden channels.
  • n_hidden_layers – Number of hidden layers.
  • action_space – Action space of the environment.
  • scale_mu (bool) – If True, mu is scaled by applying tanh.
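The referenced paper (normalized advantage functions) writes the Q-value as a state value minus a quadratic penalty around a mean action mu: Q(s, a) = V(s) - 0.5 (a - mu)^T P (a - mu), with P = L L^T. The sketch below illustrates that formula and a tanh-based squashing of mu into the action bounds. Both helpers are hypothetical, and the exact scaling used by this class is an assumption.

```python
import math


def squash_to_bounds(x, low, high):
    """Map each component of x into [low, high] using tanh."""
    return [
        lo + (hi - lo) * (math.tanh(v) + 1.0) / 2.0
        for v, lo, hi in zip(x, low, high)
    ]


def quadratic_q(v, mu, L, a):
    """Q = v - 0.5 * (a - mu)^T (L L^T) (a - mu), L given as a list of rows."""
    n = len(a)
    diff = [a[i] - mu[i] for i in range(n)]
    # y = L^T diff, so diff^T (L L^T) diff equals the squared norm of y.
    y = [sum(L[i][j] * diff[i] for i in range(n)) for j in range(n)]
    return v - 0.5 * sum(t * t for t in y)


mu = squash_to_bounds([0.0, 0.0], low=[-2.0, -2.0], high=[2.0, 2.0])
L = [[1.0, 0.0],
     [0.5, 2.0]]                  # lower-triangular factor of P
q_at_mu = quadratic_q(1.0, mu, L, mu)
q_off_mu = quadratic_q(1.0, mu, L, [1.0, 1.0])
```

Because P is positive semi-definite by construction, the Q-value is maximized at a = mu, so the greedy continuous action can be read off directly without a search.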
class chainerrl.q_functions.FCBNQuadraticStateQFunction(n_input_channels, n_dim_action, n_hidden_channels, n_hidden_layers, action_space, scale_mu=True, normalize_input=True)

Fully-connected + BN state-input continuous Q-function.

See: https://arxiv.org/abs/1603.00748

Parameters:
  • n_input_channels – Number of input channels.
  • n_dim_action – Number of dimensions of action space.
  • n_hidden_channels – Number of hidden channels.
  • n_hidden_layers – Number of hidden layers.
  • action_space – Action space of the environment.
  • scale_mu (bool) – If True, mu is scaled by applying tanh.
  • normalize_input (bool) – If set to True, Batch Normalization is applied to the observations.
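To illustrate what normalizing the input means, the sketch below standardizes each observation feature across a batch. It is a simplified, training-mode view of batch normalization, assuming no learnable scale or shift and no running statistics; `batch_normalize` is a hypothetical helper, not the layer the class uses.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Standardize each feature to zero mean and unit variance over the batch."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    variances = [
        sum((row[d] - means[d]) ** 2 for row in batch) / n for d in range(dims)
    ]
    return [
        [(row[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for row in batch
    ]


# Two observations whose features live on very different scales.
observations = [[0.0, 100.0], [2.0, 300.0]]
normalized = batch_normalize(observations)
```

After normalization both features lie close to -1 and +1, so no single feature dominates the first layer because of its units.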
class chainerrl.q_functions.SingleModelStateActionQFunction(model)

Q-function with state and action input.

Parameters: model (chainer.Link) – Link that is callable with a state and an action and outputs the Q-value.
class chainerrl.q_functions.FCSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected (s,a)-input Q-function.

Parameters:
  • n_dim_obs (int) – Number of dimensions of observation space.
  • n_dim_action (int) – Number of dimensions of action space.
  • n_hidden_channels (int) – Number of hidden channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
  • last_wscale (float) – Scale of weight initialization of the last layer.
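A fully-connected (s,a)-input Q-function can be sketched as a small network that takes the observation and action together and returns one scalar. The code below assumes the two inputs are concatenated before the first layer and uses one hidden layer with hand-picked weights; all function names and weights are illustrative, not taken from the library.

```python
def linear(x, weights, bias):
    """Affine layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sa_q_value(obs, action, hidden_w, hidden_b, out_w, out_b):
    """Q(s, a) from a one-hidden-layer network over the concatenated input."""
    h = relu(linear(obs + action, hidden_w, hidden_b))  # list concatenation
    return linear(h, out_w, out_b)[0]                   # single scalar output


obs = [1.0, 2.0]      # n_dim_obs = 2
action = [0.5]        # n_dim_action = 1
hidden_w = [[1.0, 0.0, 2.0],
            [0.0, -1.0, 0.0]]   # n_hidden_channels = 2
hidden_b = [0.0, 0.0]
out_w = [[0.5, 1.0]]
out_b = [0.1]
q = sa_q_value(obs, action, hidden_w, hidden_b, out_w, out_b)
```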
class chainerrl.q_functions.FCLSTMSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected + LSTM (s,a)-input Q-function.

Parameters:
  • n_dim_obs (int) – Number of dimensions of observation space.
  • n_dim_action (int) – Number of dimensions of action space.
  • n_hidden_channels (int) – Number of hidden channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
  • last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.q_functions.FCBNSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected + BN (s,a)-input Q-function.

Parameters:
  • n_dim_obs (int) – Number of dimensions of observation space.
  • n_dim_action (int) – Number of dimensions of action space.
  • n_hidden_channels (int) – Number of hidden channels.
  • n_hidden_layers (int) – Number of hidden layers.
  • normalize_input (bool) – If set to True, Batch Normalization is applied to both observations and actions.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
  • last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.q_functions.FCBNLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected + BN (s,a)-input Q-function with late action input.

Actions are not included until the second hidden layer and not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971

Parameters:
  • n_dim_obs (int) – Number of dimensions of observation space.
  • n_dim_action (int) – Number of dimensions of action space.
  • n_hidden_channels (int) – Number of hidden channels.
  • n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
  • normalize_input (bool) – If set to True, Batch Normalization is applied to observations.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
  • last_wscale (float) – Scale of weight initialization of the last layer.
class chainerrl.q_functions.FCLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)

Fully-connected (s,a)-input Q-function with late action input.

Actions are not included until the second hidden layer and not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971

Parameters:
  • n_dim_obs (int) – Number of dimensions of observation space.
  • n_dim_action (int) – Number of dimensions of action space.
  • n_hidden_channels (int) – Number of hidden channels.
  • n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
  • nonlinearity (callable) – Nonlinearity between layers. It must accept a Variable as an argument and return a Variable with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
  • last_wscale (float) – Scale of weight initialization of the last layer.
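The late-action architecture can be sketched as follows: the first layer sees only the observation, and the action is appended to that layer's output before the next layer. This is a plain-Python illustration with two hidden layers and hand-picked weights; the helper names and the exact layer layout are assumptions, not the library's implementation.

```python
def linear(x, weights, bias):
    """Affine layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def late_action_q_value(obs, action, w1, b1, w2, b2, w_out, b_out):
    h1 = relu(linear(obs, w1, b1))           # first hidden layer: observation only
    h2 = relu(linear(h1 + action, w2, b2))   # action joins at the second layer
    return linear(h2, w_out, b_out)[0]       # scalar Q(s, a)


obs = [1.0, -1.0]
action = [2.0]
w1 = [[1.0, 0.0],
      [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 1.0, 1.0]]   # input size = hidden size (2) + action size (1)
b2 = [0.0]
w_out = [[2.0]]
b_out = [0.5]
q = late_action_q_value(obs, action, w1, b1, w2, b2, w_out, b_out)
```

Since the action must enter after at least one observation-only layer, this layout needs at least one hidden layer, which matches the constraint on n_hidden_layers above.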