Action values

Action value interfaces

class chainerrl.action_value.ActionValue[source]

Struct that holds a state-fixed Q-function and its subproducts.

Every operation it supports is done in a batch manner.


evaluate_actions(actions)

Evaluate Q(s,a) with a = given actions.

greedy_actions

Get argmax_a Q(s,a).

max

Evaluate max Q(s,a).

Action value implementations

class chainerrl.action_value.DiscreteActionValue(q_values, q_values_formatter=<function <lambda>>)[source]

Q-function output for discrete action space.

Parameters: q_values (ndarray or chainer.Variable) – Array of Q values whose shape is (batchsize, n_actions)
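The batch semantics of the three interface operations can be sketched with plain NumPy; the q_values array below is illustrative, not from the library, and the variable names mirror the interface members:

```python
import numpy as np

# Hypothetical batch of Q-values for 3 states and 4 discrete actions,
# mimicking the (batchsize, n_actions) array DiscreteActionValue wraps.
q_values = np.array([[1.0, 3.0, 2.0, 0.0],
                     [0.5, 0.1, 0.2, 0.9],
                     [2.0, 2.0, 1.0, 1.5]])

# greedy_actions: argmax_a Q(s,a), one action index per state.
greedy_actions = q_values.argmax(axis=1)   # -> [1, 3, 0]

# max: max_a Q(s,a), one value per state.
q_max = q_values.max(axis=1)               # -> [3.0, 0.9, 2.0]

# evaluate_actions(a): Q(s,a) for the given actions, one per state.
actions = np.array([2, 0, 3])
q_sa = q_values[np.arange(len(actions)), actions]  # -> [2.0, 0.5, 1.5]
```

Every operation is applied across the batch dimension at once, as the interface description above states.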
class chainerrl.action_value.QuadraticActionValue(mu, mat, v, min_action=None, max_action=None)[source]

Q-function output for continuous action space.


Define a Q(s,a) with A(s,a) in a quadratic form.

Q(s,a) = V(s) + A(s,a)

A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s))

Parameters:
  • mu (chainer.Variable) – mu(s), actions that maximize A(s,a)
  • mat (chainer.Variable) – P(s), coefficient matrices of A(s,a). It must be positive definite.
  • v (chainer.Variable) – V(s), values of s
  • min_action (ndarray) – minimum action, not batched
  • max_action (ndarray) – maximum action, not batched
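The quadratic form can be worked through for a single state with NumPy; the values of mu, mat, and v here are made up for illustration, with names matching the constructor parameters:

```python
import numpy as np

mu = np.array([0.5, -0.2])       # mu(s): the action that maximizes A(s,a)
mat = np.array([[2.0, 0.0],
                [0.0, 1.0]])     # P(s): must be positive definite
v = 3.0                          # V(s): value of the state

def q_value(a):
    # A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s))
    d = a - mu
    advantage = -0.5 * d @ mat @ d
    # Q(s,a) = V(s) + A(s,a)
    return v + advantage

# At a = mu(s) the advantage is zero, so Q attains its maximum V(s) = 3.0.
print(q_value(mu))                     # 3.0
print(q_value(np.array([1.5, -0.2])))  # 3.0 - 0.5 * 2.0 * 1.0 = 2.0
```

Because P(s) is positive definite, A(s,a) <= 0 everywhere, which is what guarantees mu(s) is the greedy action.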
class chainerrl.action_value.SingleActionValue(evaluator, maximizer=None)[source]

ActionValue that can evaluate only a single action.
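A minimal sketch of the two callables this class composes, assuming a toy quadratic evaluator (not the library's own code): evaluator maps a batch of actions to Q(s,a), and maximizer returns the greedy actions, both closing over some fixed state.

```python
import numpy as np

# Hypothetical greedy action for a batch of one state.
mu = np.array([[0.0, 1.0]])

def evaluator(actions):
    # Toy Q(s,a) = -||a - mu(s)||^2, one action evaluated per state.
    return -np.sum((actions - mu) ** 2, axis=1)

def maximizer():
    # Greedy actions, here known in closed form.
    return mu

# SingleActionValue(evaluator, maximizer) would then expose:
#   evaluate_actions(a) -> evaluator(a)
#   greedy_actions      -> maximizer()
print(evaluator(mu))                      # zero: Q is maximal at a = mu(s)
print(evaluator(np.array([[1.0, 1.0]])))  # negative elsewhere
```

This shape is useful when Q(s,a) is defined only through a function of the action (e.g. an external model), so only one action can be evaluated at a time rather than all actions at once.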