Action values

Action value interfaces

Action value implementations
- class chainerrl.action_value.DiscreteActionValue(q_values, q_values_formatter=<function <lambda>>)

  Q-function output for discrete action space.

  Parameters:
  - q_values (ndarray or chainer.Variable) – Array of Q-values whose shape is (batchsize, n_actions)
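The main things a discrete action value exposes are the greedy action and the corresponding maximum Q-value per batch element. A minimal NumPy-only sketch of that behavior (illustrative only, not the library's actual implementation, which operates on chainer.Variable):

```python
import numpy as np

# A (batchsize, n_actions) array of Q-values, as DiscreteActionValue expects.
q_values = np.array([[0.1, 0.5, -0.2],
                     [1.0, 0.0,  2.0]], dtype=np.float32)

# The greedy action is the argmax over the action axis;
# the max Q-value is the value at that action.
greedy_actions = q_values.argmax(axis=1)  # shape (batchsize,)
max_q = q_values.max(axis=1)              # shape (batchsize,)
```

Here `greedy_actions` is `[1, 2]` and `max_q` is `[0.5, 2.0]` for the two batch elements.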
- class chainerrl.action_value.QuadraticActionValue(mu, mat, v, min_action=None, max_action=None)

  Q-function output for continuous action space.

  See: http://arxiv.org/abs/1603.00748

  Defines Q(s,a) with the advantage A(s,a) in a quadratic form:

  Q(s,a) = V(s) + A(s,a)
  A(s,a) = -1/2 (u - mu(s))^T P(s) (u - mu(s))

  where u denotes the action a.

  Parameters:
  - mu (chainer.Variable) – mu(s), actions that maximize A(s,a)
  - mat (chainer.Variable) – P(s), coefficient matrices of A(s,a); must be positive definite
  - v (chainer.Variable) – V(s), values of s
  - min_action (ndarray) – minimum action, not batched
  - max_action (ndarray) – maximum action, not batched
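The quadratic form above can be checked numerically. A minimal NumPy sketch, assuming a batched quadratic advantage (the function and variable names here are illustrative, not the library API):

```python
import numpy as np

# Hypothetical sketch of the NAF-style quadratic action value:
#   A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)),  Q(s,a) = V(s) + A(s,a)
def quadratic_q(a, mu, mat, v):
    diff = a - mu                                      # (batchsize, action_dim)
    # Batched quadratic form diff^T P diff for each sample.
    adv = -0.5 * np.einsum('bi,bij,bj->b', diff, mat, diff)
    return v + adv

mu = np.array([[0.0, 0.0]])        # mu(s): the advantage-maximizing action
mat = np.array([[[2.0, 0.0],
                 [0.0, 2.0]]])     # P(s): positive definite
v = np.array([1.0])                # V(s)

# At a == mu the advantage is zero, so Q(s,a) == V(s).
q_at_mu = quadratic_q(mu, mu, mat, v)                    # [1.0]
# Away from mu, Q drops below V(s) because P(s) is positive definite:
# diff = [1, 0], diff^T P diff = 2, A = -1, so Q = 1 - 1 = 0.
q_off = quadratic_q(np.array([[1.0, 0.0]]), mu, mat, v)  # [0.0]
```

Since P(s) is positive definite, A(s,a) ≤ 0 everywhere with equality only at a = mu(s), which is why mu(s) is the greedy action.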