Replay Buffers

ReplayBuffer interfaces

class chainerrl.replay_buffer.ReplayBuffer(capacity=None, num_steps=1)[source]

Experience Replay Buffer

As described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.

Parameters:
  • capacity (int) – capacity in terms of number of transitions
  • num_steps (int) – Number of timesteps per stored transition (for N-step updates)
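
The capacity semantics can be sketched in a few lines of plain Python (illustrative only; this is not chainerrl's implementation, which also handles num_steps and multi-env bookkeeping):

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Minimal uniform replay buffer (a sketch, not chainerrl's code)."""

    def __init__(self, capacity=None):
        # A deque with maxlen drops the oldest transition once capacity is reached
        self.memory = deque(maxlen=capacity)

    def append(self, transition):
        self.memory.append(transition)

    def sample(self, n):
        # n unique transitions, drawn uniformly without replacement
        return random.sample(list(self.memory), n)

    def __len__(self):
        return len(self.memory)


buf = SimpleReplayBuffer(capacity=3)
for t in range(5):
    buf.append({"state": t, "reward": float(t)})
print(len(buf))  # → 3: the capacity caps the buffer, oldest transitions dropped
```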
append(state, action, reward, next_state=None, next_action=None, is_state_terminal=False, env_id=0, **kwargs)[source]

Append a transition to this replay buffer.

Parameters:
  • state – s_t
  • action – a_t
  • reward – r_t
  • next_state – s_{t+1} (can be None if terminal)
  • next_action – a_{t+1} (can be None for off-policy algorithms)
  • is_state_terminal (bool) – Whether s_{t+1} is a terminal state.
  • env_id (object) – Object that is unique to each env. It indicates which env a given transition came from in multi-env training.
  • **kwargs – Any other information to store.
load(filename)[source]

Load the content of the buffer from a file.

Parameters:filename (str) – Path to a file.
sample(num_experiences)[source]

Sample num_experiences unique transitions from this replay buffer.

Parameters:num_experiences (int) – Number of transitions to sample.
Returns:Sequence of num_experiences sampled transitions.
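
"Unique" here means sampling without replacement: no stored transition is returned twice in one call. With Python's standard library this corresponds to `random.sample` (a sketch of the semantics, not chainerrl's internals):

```python
import random

memory = list(range(10))          # stand-in for 10 stored transitions
batch = random.sample(memory, 4)  # 4 transitions, drawn without replacement
assert len(set(batch)) == 4       # all four are distinct
```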
save(filename)[source]

Save the content of the buffer to a file.

Parameters:filename (str) – Path to a file.
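
The save/load round trip can be sketched with pickle (an assumption for illustration; the on-disk format chainerrl actually uses is an implementation detail):

```python
import pickle
import tempfile
from collections import deque

# A stand-in for the buffer's stored transitions
memory = deque([{"state": 0, "reward": 1.0}], maxlen=100)

# Save the buffer contents to a file...
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(memory, f)
    path = f.name

# ...then restore them from that file
with open(path, "rb") as f:
    restored = pickle.load(f)

assert list(restored) == list(memory)
```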

ReplayBuffer implementations

class chainerrl.replay_buffer.EpisodicReplayBuffer(capacity=None)[source]
class chainerrl.replay_buffer.ReplayBuffer(capacity=None, num_steps=1)[source]

class chainerrl.replay_buffer.PrioritizedReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=0.01, normalize_by_max=True, error_min=0, error_max=1, num_steps=1)[source]

Stochastic Prioritization

See Section 3.3 (proportional prioritization) of https://arxiv.org/pdf/1511.05952.pdf.

Parameters:
  • capacity (int) – capacity in terms of number of transitions
  • alpha (float) – Exponent of errors to compute probabilities to sample
  • beta0 (float) – Initial value of beta
  • betasteps (int) – Steps to anneal beta to 1
  • eps (float) – Small constant added to errors so that a transition can still be revisited after its error becomes near zero
  • normalize_by_max (bool) – Method to normalize weights. 'batch' or True (default): divide by the maximum weight in the sampled batch. 'memory': divide by the maximum weight in the memory. False: do not normalize
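
Proportional prioritization can be sketched as follows (illustrative only, under the assumption of a flat list of errors; chainerrl's implementation uses a sum-tree for efficient sampling, and the helper name `prioritized_sample` is hypothetical):

```python
import random


def prioritized_sample(errors, n, alpha=0.6, beta=0.4, eps=0.01):
    """Sketch of proportional prioritization (Schaul et al., Section 3.3).

    Not chainerrl's code: samples with replacement from a plain list.
    """
    # Priority of each transition: (|TD error| + eps) ** alpha;
    # eps keeps near-zero-error transitions revisitable
    priorities = [(abs(e) + eps) ** alpha for e in errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]

    # Draw n indices with probability proportional to priority
    indices = random.choices(range(len(errors)), weights=probs, k=n)

    # Importance-sampling weights (N * P(i)) ** (-beta),
    # normalized by the maximum weight in the sampled batch
    num = len(errors)
    weights = [(num * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


idx, w = prioritized_sample([0.9, 0.1, 0.5, 0.0], n=2)
```

Annealing beta from beta0 toward 1 over betasteps steps gradually removes the bias that prioritized sampling introduces.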
class chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=1e-08, normalize_by_max=True, default_priority_func=None, uniform_ratio=0, wait_priority_after_sampling=True, return_sample_weights=True, error_min=None, error_max=None)[source]