Action values

Action value interfaces

class chainerrl.action_value.ActionValue[source]

Struct that holds a state-fixed Q-function and its subproducts.

Every operation it supports is done in a batch manner.

evaluate_actions(actions)[source]

Evaluate Q(s,a) for the given actions a.

greedy_actions

Get argmax_a Q(s,a).

max

Evaluate max Q(s,a).

params

Learnable parameters of this action value.

Returns: tuple of chainer.Variable

Action value implementations

class chainerrl.action_value.DiscreteActionValue(q_values, q_values_formatter=<function DiscreteActionValue.<lambda>>)[source]

Q-function output for discrete action space.

Parameters: q_values (ndarray or chainer.Variable) – Array of Q values whose shape is (batchsize, n_actions)
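Given the (batchsize, n_actions) layout above, the interface operations — evaluate_actions, greedy_actions, and max — can be sketched in plain NumPy. This is an illustration of the batch semantics only, not the chainerrl API; the variable names are ours:

```python
import numpy as np

# Q values for 2 states over 3 discrete actions: shape (batchsize, n_actions)
q_values = np.array([[1.0, 3.0, 2.0],
                     [0.5, 0.2, 0.9]])

# greedy_actions: argmax_a Q(s,a), computed per state in the batch
greedy_actions = q_values.argmax(axis=1)

# max: max_a Q(s,a), computed per state in the batch
max_q = q_values.max(axis=1)

# evaluate_actions(actions): Q(s,a) with a = given actions, one per state
actions = np.array([0, 2])
evaluated = q_values[np.arange(len(actions)), actions]
```

Every operation acts on the whole batch at once, matching the "batch manner" contract of the interface.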
class chainerrl.action_value.QuadraticActionValue(mu, mat, v, min_action=None, max_action=None)[source]

Q-function output for continuous action space.

See: http://arxiv.org/abs/1603.00748

Defines Q(s,a) with an advantage A(s,a) in quadratic form:

Q(s,a) = V(s) + A(s,a)
A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s))

Parameters:
  • mu (chainer.Variable) – mu(s), actions that maximize A(s,a)
  • mat (chainer.Variable) – P(s), coefficient matrices of A(s,a). It must be positive definite.
  • v (chainer.Variable) – V(s), values of s
  • min_action (ndarray) – minimum action, not batched
  • max_action (ndarray) – maximum action, not batched
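The quadratic form above can be sketched in NumPy (shapes and names here are illustrative assumptions, not the chainerrl API). Because P(s) is positive definite, A(s, mu(s)) = 0 is the maximum of the advantage, so Q is maximized at a = mu(s) with value V(s):

```python
import numpy as np

batchsize, action_dim = 2, 2
mu = np.zeros((batchsize, action_dim))            # mu(s): maximizing actions
mat = np.stack([np.eye(action_dim)] * batchsize)  # P(s): positive definite matrices
v = np.array([1.0, -0.5])                         # V(s): state values
a = np.array([[1.0, 0.0],
              [0.0, 2.0]])                        # actions to evaluate

diff = a - mu
# A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)), batched via einsum
adv = -0.5 * np.einsum('bi,bij,bj->b', diff, mat, diff)
q = v + adv                                       # Q(s,a) = V(s) + A(s,a)
```

With P(s) = I this reduces to Q(s,a) = V(s) - 1/2 ||a - mu(s)||^2.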
class chainerrl.action_value.SingleActionValue(evaluator, maximizer=None)[source]

ActionValue that can evaluate only a single action.
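The evaluator/maximizer contract can be sketched with plain Python callables over a toy Q-function (hypothetical example; the function and names are ours, not the chainerrl API): the evaluator maps a batch of actions to their Q-values, and the maximizer returns the greedy actions.

```python
import numpy as np

# Toy Q-function Q(s,a) = -(a - 1)^2, maximized at a = 1 (an assumption
# for illustration; any differentiable Q-function would do).
best_action = np.array([1.0])

def evaluator(actions):
    # Q(s,a) for the given batch of actions
    return -(actions - best_action) ** 2

def maximizer():
    # argmax_a Q(s,a)
    return best_action

q_at_greedy = evaluator(maximizer())
```

The evaluator alone suffices to evaluate single actions; the optional maximizer supplies greedy action selection when one is available.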