Distributions

Distribution interfaces

class chainerrl.distribution.Distribution[source]

Batch of distributions of data.

copy(x)[source]

Copy a distribution unchained from the computation graph.

Returns:Distribution
entropy

Entropy of distributions.

Returns:chainer.Variable
kl

Compute the KL divergence D_KL(P || Q), where P is this distribution.

Parameters:distrib (Distribution) – Distribution Q.
Returns:chainer.Variable
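
As a rough illustration of the quantity described above, the following sketch computes D_KL(P || Q) for two categorical distributions in plain Python. It is not ChainerRL code: the helper name `categorical_kl` and the list-of-probabilities representation are assumptions made only for this example.

```python
import math


def categorical_kl(p, q):
    """D_KL(P || Q) for categorical distributions given as probability lists.

    Hypothetical helper for illustration; not part of chainerrl.
    """
    # Terms with p_i == 0 contribute zero by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = categorical_kl(p, q)  # divergence of P from Q
kl_qp = categorical_kl(q, p)  # generally different: KL is not symmetric
```

The divergence is zero when both arguments are the same distribution and is not symmetric in its arguments, which is why the documentation distinguishes P from Q.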
log_prob(x)[source]

Compute log p(x).

Returns:chainer.Variable
most_probable

Most probable data points.

Returns:chainer.Variable
params

Learnable parameters of this distribution.

Returns:tuple of chainer.Variable
prob(x)[source]

Compute p(x).

Returns:chainer.Variable
sample()[source]

Sample from distributions.

Returns:chainer.Variable

Distribution implementations

class chainerrl.distribution.GaussianDistribution(mean, var)[source]

Gaussian distribution.
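
The sketch below shows what log p(x) and the entropy look like for a Gaussian, assuming `mean` and `var` hold a per-dimension mean and variance (a diagonal covariance). That reading of the arguments is an assumption, and the helper names are hypothetical; this is plain Python, not the library's implementation.

```python
import math


def gaussian_log_prob(x, mean, var):
    """Log-density of x under independent Gaussians (hypothetical helper)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
        for xi, m, v in zip(x, mean, var)
    )


def gaussian_entropy(var):
    """Differential entropy of independent Gaussians with the given variances."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * v) for v in var)


log_p = gaussian_log_prob([0.0, 1.0], mean=[0.0, 1.0], var=[1.0, 4.0])
h = gaussian_entropy([1.0, 4.0])
```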

class chainerrl.distribution.SoftmaxDistribution(logits, beta=1.0, min_prob=0.0)[source]

Softmax distribution.

Parameters:logits (ndarray or chainer.Variable) – Logits for softmax distribution.
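
One plausible reading of the constructor arguments, sketched in plain Python: `beta` is assumed to scale the logits before the softmax (an inverse temperature), and `min_prob` is assumed to act as a floor on every action probability. Both interpretations are assumptions for illustration, and `softmax_probs` is a hypothetical helper, not a ChainerRL function.

```python
import math


def softmax_probs(logits, beta=1.0, min_prob=0.0):
    """Softmax over beta * logits with an assumed probability floor."""
    n = len(logits)
    # The floor is only feasible if n * min_prob does not exceed 1.
    assert min_prob * n <= 1.0
    scaled = [beta * l for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Shrink the softmax mass and add the floor so the result still sums to 1.
    return [p * (1.0 - min_prob * n) + min_prob for p in probs]


greedy_like = softmax_probs([1.0, 2.0, 3.0], beta=10.0)
floored = softmax_probs([1.0, 2.0, 3.0], beta=10.0, min_prob=0.05)
```

Under these assumptions, a larger `beta` concentrates probability on the largest logit, while `min_prob` keeps every outcome possible.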
class chainerrl.distribution.MellowmaxDistribution(values, omega=8.0)[source]

Maximum entropy mellowmax distribution.

See: http://arxiv.org/abs/1612.05628

Parameters:values (ndarray or chainer.Variable) – Values to which mellowmax is applied.
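
For orientation, the mellowmax operator from the paper linked above is log(mean(exp(omega * x))) / omega. The sketch below computes only that operator in plain Python; the maximum entropy policy built on top of it is not shown, and `mellowmax` here is a hypothetical helper rather than the library's code.

```python
import math


def mellowmax(values, omega=8.0):
    """Mellowmax of a list of values: log(mean(exp(omega * v))) / omega."""
    n = len(values)
    m = max(values)  # factor out the max for numerical stability
    return m + math.log(sum(math.exp(omega * (v - m)) for v in values) / n) / omega


values = [0.0, 1.0, 2.0]
soft = mellowmax(values, omega=8.0)     # close to, but below, max(values)
softer = mellowmax(values, omega=0.01)  # close to the arithmetic mean
```

As `omega` grows the result approaches the maximum, and as `omega` approaches zero it approaches the mean.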
class chainerrl.distribution.ContinuousDeterministicDistribution(x)[source]

Continuous deterministic distribution.

This distribution is intended for use in continuous deterministic policies.