Distributions

Distribution interfaces

class chainerrl.distribution.Distribution[source]

Batch of distributions of data.

copy(x)[source]

Copy a distribution, unchained from the computation graph.

Returns: Distribution
entropy

Entropy of distributions.

Returns: chainer.Variable
kl

Compute the KL divergence D_KL(P||Q), where P is this distribution.

Parameters: distrib (Distribution) – Distribution Q.
Returns: chainer.Variable
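
For intuition, the quantity computed per batch element can be illustrated in plain Python for two discrete (softmax-style) distributions; `kl_divergence` here is a hypothetical helper, not the library API, which operates on batches of chainer.Variable:

```python
import math

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) between two discrete distributions
    given as lists of probabilities over the same support."""
    # Terms with p_i == 0 contribute 0 by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(round(kl_divergence([0.5, 0.5], [0.9, 0.1]), 4))  # ≈ 0.5108
```

Note that KL divergence is asymmetric: D_KL(P||Q) generally differs from D_KL(Q||P).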
log_prob(x)[source]

Compute log p(x).

Returns: chainer.Variable
most_probable

Most probable data points.

Returns: chainer.Variable
params

Learnable parameters of this distribution.

Returns: tuple of chainer.Variable
prob(x)[source]

Compute p(x).

Returns: chainer.Variable
sample()[source]

Sample from distributions.

Returns: chainer.Variable
sample_with_log_prob()[source]

Sample and compute the log probability in a single call.

This can be more efficient than calling sample and log_prob separately.

Returns: Samples and the log probability of the samples.
Return type: tuple of chainer.Variable
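
The interface above can be illustrated with a plain-Python categorical distribution. This is a hypothetical stand-in for exposition only; the library's classes operate on batches of data wrapped in chainer.Variable:

```python
import math
import random

class CategoricalDistribution:
    """Single categorical distribution implementing the same
    method/property names as the Distribution interface above."""

    def __init__(self, probs):
        self.probs = probs  # probabilities over labels, summing to 1

    def prob(self, x):
        return self.probs[x]

    def log_prob(self, x):
        return math.log(self.probs[x])

    @property
    def entropy(self):
        # Shannon entropy; terms with p == 0 contribute 0 by convention.
        return -sum(p * math.log(p) for p in self.probs if p > 0)

    @property
    def most_probable(self):
        return max(range(len(self.probs)), key=lambda i: self.probs[i])

    def sample(self):
        return random.choices(range(len(self.probs)), weights=self.probs)[0]

    def sample_with_log_prob(self):
        # One call returning both the sample and its log probability.
        x = self.sample()
        return x, self.log_prob(x)

d = CategoricalDistribution([0.1, 0.7, 0.2])
print(d.most_probable)  # → 1
```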

Distribution implementations

class chainerrl.distribution.GaussianDistribution(mean, var)[source]

Gaussian distribution.
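
GaussianDistribution(mean, var) takes a mean and a variance per dimension, i.e. an independent (diagonal-covariance) Gaussian per batch element. The log density that log_prob evaluates can be sketched in plain Python; `gaussian_log_prob` is a hypothetical helper for illustration, not the library API:

```python
import math

def gaussian_log_prob(x, mean, var):
    """Log density of a diagonal Gaussian:
    log p(x) = -1/2 * sum_i [ log(2*pi*var_i) + (x_i - mean_i)^2 / var_i ]."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )
```

The density is maximized at x = mean, where it equals -1/2 * sum_i log(2*pi*var_i).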

class chainerrl.distribution.SoftmaxDistribution(logits, beta=1.0, min_prob=0.0)[source]

Softmax distribution.

Parameters:
  • logits (ndarray or chainer.Variable) – Logits for the softmax distribution.
  • beta (float) – Inverse of the temperature parameter of the softmax distribution.
  • min_prob (float) – Minimum probability across all labels.
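
The roles of beta and min_prob can be sketched in plain Python. This is an illustrative sketch, not the library implementation: the floor is applied here by mixing the softmax with a uniform floor (an assumption; the library's exact min_prob arithmetic may differ):

```python
import math

def softmax_probs(logits, beta=1.0, min_prob=0.0):
    """Temperature-scaled softmax with an optional probability floor."""
    scaled = [beta * l for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    n = len(probs)
    # Mix toward the floor so every label gets probability >= min_prob,
    # while the result still sums to 1.
    return [p * (1 - min_prob * n) + min_prob for p in probs]
```

A larger beta sharpens the distribution toward the argmax; min_prob keeps exploration alive by preventing any label's probability from collapsing to zero.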
class chainerrl.distribution.MellowmaxDistribution(values, omega=8.0)[source]

Maximum entropy mellowmax distribution.

See: http://arxiv.org/abs/1612.05628

Parameters: values (ndarray or chainer.Variable) – Values to which mellowmax is applied.
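
The underlying mellowmax operator from the paper linked above can be sketched in plain Python (the distribution class itself additionally solves for the maximum-entropy distribution consistent with this value, which is omitted here):

```python
import math

def mellowmax(values, omega=8.0):
    """Mellowmax operator: mw(x) = log(mean(exp(omega * x))) / omega.
    It interpolates between the mean (omega -> 0) and the max (omega -> inf)."""
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(
        sum(math.exp(omega * (v - m)) for v in values) / len(values)
    ) / omega
```

Unlike the hard max, mellowmax is differentiable everywhere, which is what makes it attractive for defining a smooth action-selection distribution.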
class chainerrl.distribution.ContinuousDeterministicDistribution(x)[source]

Continuous deterministic distribution.

This distribution is intended for use with continuous deterministic policies.