Replay Buffers
ReplayBuffer interfaces
class chainerrl.replay_buffer.ReplayBuffer(capacity=None, num_steps=1)
Experience Replay Buffer, as described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.
Parameters:
- capacity (int) – capacity in terms of number of transitions
- num_steps (int) – number of timesteps per stored transition
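The core idea is a fixed-capacity buffer that evicts the oldest transition once full and samples uniformly at random. A minimal sketch in plain Python (illustrative only, not the ChainerRL implementation; `UniformReplayBuffer` is a hypothetical name):

```python
import random
from collections import deque


class UniformReplayBuffer:
    """Illustrative sketch of a uniform experience replay buffer."""

    def __init__(self, capacity=None):
        # deque with maxlen evicts the oldest transition once full
        self.memory = deque(maxlen=capacity)

    def append(self, transition):
        self.memory.append(transition)

    def sample(self, n):
        # uniform sampling without replacement
        return random.sample(list(self.memory), n)


buf = UniformReplayBuffer(capacity=3)
for t in range(5):
    buf.append({"state": t, "action": 0, "reward": 1.0})
# only the 3 most recent transitions survive
assert [tr["state"] for tr in buf.memory] == [2, 3, 4]
```

Bounding capacity this way keeps memory use constant while still letting old experience be replayed until it is pushed out.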
append(state, action, reward, next_state=None, next_action=None, is_state_terminal=False, env_id=0, **kwargs)
Append a transition to this replay buffer.
Parameters:
- state – s_t
- action – a_t
- reward – r_t
- next_state – s_{t+1} (can be None if terminal)
- next_action – a_{t+1} (can be None for off-policy algorithms)
- is_state_terminal (bool) – True if s_t is a terminal state
- env_id (object) – Object that is unique to each env. It indicates which env a given transition came from in multi-env training.
- **kwargs – Any other information to store.
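To illustrate the `append` signature, here is a sketch of a buffer that stores each transition as a dict with these fields, tagging each one with its `env_id` so transitions from several envs can share one buffer (`MultiEnvBuffer` is a hypothetical name, not part of ChainerRL):

```python
class MultiEnvBuffer:
    """Illustrative sketch: one buffer fed by multiple envs."""

    def __init__(self):
        self.memory = []

    def append(self, state, action, reward, next_state=None,
               next_action=None, is_state_terminal=False, env_id=0,
               **kwargs):
        # env_id records which env produced this transition,
        # so per-env bookkeeping (e.g. N-step assembly) stays correct
        transition = dict(state=state, action=action, reward=reward,
                          next_state=next_state, next_action=next_action,
                          is_state_terminal=is_state_terminal,
                          env_id=env_id, **kwargs)
        self.memory.append(transition)


buf = MultiEnvBuffer()
buf.append(state=0, action=1, reward=0.5, next_state=1, env_id="env-A")
buf.append(state=7, action=0, reward=-1.0, is_state_terminal=True,
           env_id="env-B")
assert buf.memory[0]["env_id"] == "env-A"
assert buf.memory[1]["is_state_terminal"] is True
```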
load(filename)
Load the content of the buffer from a file.
Parameters: filename (str) – Path to a file.
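Persisting a buffer amounts to serializing its transitions to a file and reading them back. A minimal sketch, assuming simple `pickle` serialization (the actual ChainerRL on-disk format may differ; `PersistentBuffer` is a hypothetical name):

```python
import os
import pickle
import tempfile


class PersistentBuffer:
    """Illustrative sketch of saving/loading buffer contents."""

    def __init__(self):
        self.memory = []

    def save(self, filename):
        # serialize all stored transitions to the given path
        with open(filename, "wb") as f:
            pickle.dump(self.memory, f)

    def load(self, filename):
        # replace current contents with those read from the file
        with open(filename, "rb") as f:
            self.memory = pickle.load(f)


buf = PersistentBuffer()
buf.memory = [{"state": 0, "reward": 1.0}]
fd, path = tempfile.mkstemp()
os.close(fd)
buf.save(path)
restored = PersistentBuffer()
restored.load(path)
os.remove(path)
assert restored.memory == buf.memory
```

Saving and reloading lets training resume without discarding collected experience.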
ReplayBuffer implementations
class chainerrl.replay_buffer.ReplayBuffer(capacity=None, num_steps=1)
Experience Replay Buffer, as described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.
Parameters:
- capacity (int) – capacity in terms of number of transitions
- num_steps (int) – number of timesteps per stored transition
class chainerrl.replay_buffer.PrioritizedReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=0.01, normalize_by_max=True, error_min=0, error_max=1, num_steps=1)
Stochastic prioritization, as described in https://arxiv.org/pdf/1511.05952.pdf, Section 3.3 (proportional prioritization).
Parameters: - capacity (int) – capacity in terms of number of transitions
- alpha (float) – Exponent of errors to compute probabilities to sample
- beta0 (float) – Initial value of beta
- betasteps (int) – Steps to anneal beta to 1
- eps (float) – Small constant added to errors so that transitions whose error becomes near zero can still be revisited
- normalize_by_max (bool or str) – Method to normalize weights:
  - 'batch' or True (default): divide by the maximum weight in the sampled batch
  - 'memory': divide by the maximum weight in the memory
  - False: do not normalize
class chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=1e-08, normalize_by_max=True, default_priority_func=None, uniform_ratio=0, wait_priority_after_sampling=True, return_sample_weights=True, error_min=None, error_max=None)
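As the class name and parameters suggest, this buffer applies prioritization at the level of whole episodes rather than individual transitions, with `uniform_ratio` mixing in a fraction of uniform samples. A rough sketch of that sampling scheme under those assumptions (illustrative only; `sample_episode` is a hypothetical helper, not the ChainerRL API):

```python
import random


def sample_episode(episodes, episode_priorities, uniform_ratio=0.0):
    """Pick one whole episode, prioritized with a uniform fallback."""
    # with probability uniform_ratio, fall back to uniform sampling
    if random.random() < uniform_ratio:
        return random.choice(episodes)
    # otherwise sample proportionally to episode priority
    total = sum(episode_priorities)
    r = random.uniform(0, total)
    acc = 0.0
    for ep, p in zip(episodes, episode_priorities):
        acc += p
        if r <= acc:
            return ep
    return episodes[-1]


episodes = [["t0", "t1"], ["t2"], ["t3", "t4", "t5"]]
pri = [0.1, 0.0, 5.0]
ep = sample_episode(episodes, pri, uniform_ratio=0.1)
assert ep in episodes
```

Sampling whole episodes preserves within-episode ordering, which matters for algorithms that train on sequences (e.g. recurrent policies).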