Distributions

Distribution interfaces

class chainerrl.distribution.Distribution[source]

Batch of distributions of data.

copy(x)[source]

Copy a distribution, unchained from the computation graph.

Returns: Distribution
entropy

Entropy of distributions.

Returns: chainer.Variable
kl

Compute the KL divergence D_KL(P||Q), where P is this distribution.

Parameters: distrib (Distribution) – Distribution Q.
Returns: chainer.Variable
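
For intuition, the quantity computed per batch element can be illustrated in plain Python for two discrete (softmax-style) distributions; `kl_divergence` here is a hypothetical helper, not the library API, which operates on batches of chainer.Variable:

```python
import math

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) between two discrete distributions
    given as lists of probabilities over the same support."""
    # Terms with p_i == 0 contribute 0 by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(round(kl_divergence([0.5, 0.5], [0.9, 0.1]), 4))  # ≈ 0.5108
```

Note that KL divergence is asymmetric: D_KL(P||Q) generally differs from D_KL(Q||P).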
log_prob(x)[source]

Compute log p(x).

Returns: chainer.Variable
most_probable

Most probable data points.

Returns: chainer.Variable
params

Learnable parameters of this distribution.

Returns: tuple of chainer.Variable
prob(x)[source]

Compute p(x).

Returns: chainer.Variable
sample()[source]

Sample from distributions.

Returns: chainer.Variable
sample_with_log_prob()[source]

Sample and compute the log probability in a single call.

This can be more efficient than calling sample and log_prob separately.

Returns: Samples and the log probability of the samples.
Return type: tuple of chainer.Variable
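
The interface above can be illustrated with a plain-Python categorical distribution. This is a hypothetical stand-in for exposition only; the library's classes operate on batches of data wrapped in chainer.Variable:

```python
import math
import random

class CategoricalDistribution:
    """Single categorical distribution implementing the same
    method/property names as the Distribution interface above."""

    def __init__(self, probs):
        self.probs = probs  # probabilities over labels, summing to 1

    def prob(self, x):
        return self.probs[x]

    def log_prob(self, x):
        return math.log(self.probs[x])

    @property
    def entropy(self):
        # Shannon entropy; terms with p == 0 contribute 0 by convention.
        return -sum(p * math.log(p) for p in self.probs if p > 0)

    @property
    def most_probable(self):
        return max(range(len(self.probs)), key=lambda i: self.probs[i])

    def sample(self):
        return random.choices(range(len(self.probs)), weights=self.probs)[0]

    def sample_with_log_prob(self):
        # One call returning both the sample and its log probability.
        x = self.sample()
        return x, self.log_prob(x)

d = CategoricalDistribution([0.1, 0.7, 0.2])
print(d.most_probable)  # → 1
```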

Distribution implementations

class chainerrl.distribution.GaussianDistribution(mean, var)[source]

Gaussian distribution.
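
GaussianDistribution(mean, var) takes a mean and a variance per dimension, i.e. an independent (diagonal-covariance) Gaussian per batch element. The log density that log_prob evaluates can be sketched in plain Python; `gaussian_log_prob` is a hypothetical helper for illustration, not the library API:

```python
import math

def gaussian_log_prob(x, mean, var):
    """Log density of a diagonal Gaussian:
    log p(x) = -1/2 * sum_i [ log(2*pi*var_i) + (x_i - mean_i)^2 / var_i ]."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )
```

The density is maximized at x = mean, where it equals -1/2 * sum_i log(2*pi*var_i).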

class chainerrl.distribution.SoftmaxDistribution(logits, beta=1.0, min_prob=0.0)[source]

Softmax distribution.

Parameters:
  • logits (ndarray or chainer.Variable) – Logits for the softmax distribution.
  • beta (float) – Inverse of the temperature parameter of the softmax distribution.
  • min_prob (float) – Minimum probability across all labels.
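
The roles of beta and min_prob can be sketched in plain Python. This is an illustrative sketch, not the library implementation: the floor is applied here by mixing the softmax with a uniform floor (an assumption; the library's exact min_prob arithmetic may differ):

```python
import math

def softmax_probs(logits, beta=1.0, min_prob=0.0):
    """Temperature-scaled softmax with an optional probability floor."""
    scaled = [beta * l for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    n = len(probs)
    # Mix toward the floor so every label gets probability >= min_prob,
    # while the result still sums to 1.
    return [p * (1 - min_prob * n) + min_prob for p in probs]
```

A larger beta sharpens the distribution toward the argmax; min_prob keeps exploration alive by preventing any label's probability from collapsing to zero.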
class chainerrl.distribution.MellowmaxDistribution(values, omega=8.0)[source]

Maximum entropy mellowmax distribution.

See: http://arxiv.org/abs/1612.05628

Parameters: values (ndarray or chainer.Variable) – Values to which mellowmax is applied.
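
The underlying mellowmax operator from the paper linked above can be sketched in plain Python (the distribution class itself additionally solves for the maximum-entropy distribution consistent with this value, which is omitted here):

```python
import math

def mellowmax(values, omega=8.0):
    """Mellowmax operator: mw(x) = log(mean(exp(omega * x))) / omega.
    It interpolates between the mean (omega -> 0) and the max (omega -> inf)."""
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(
        sum(math.exp(omega * (v - m)) for v in values) / len(values)
    ) / omega
```

Unlike the hard max, mellowmax is differentiable everywhere, which is what makes it attractive for defining a smooth action-selection distribution.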
class chainerrl.distribution.ContinuousDeterministicDistribution(x)[source]

Continuous deterministic distribution.

This distribution is intended for use with continuous deterministic policies.