Replay Buffers
ReplayBuffer interfaces
class chainerrl.replay_buffer.ReplayBuffer(capacity=None, num_steps=1)
Experience Replay Buffer, as described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.
Parameters:
- capacity (int) – capacity in terms of number of transitions
- num_steps (int) – number of timesteps per stored transition
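The core idea is a fixed-capacity buffer that evicts the oldest transition once full and samples uniformly at random. A minimal sketch in plain Python (illustrative only, not the ChainerRL implementation; `UniformReplayBuffer` is a hypothetical name):

```python
import random
from collections import deque


class UniformReplayBuffer:
    """Illustrative sketch of a uniform experience replay buffer."""

    def __init__(self, capacity=None):
        # deque with maxlen evicts the oldest transition once full
        self.memory = deque(maxlen=capacity)

    def append(self, transition):
        self.memory.append(transition)

    def sample(self, n):
        # uniform sampling without replacement
        return random.sample(list(self.memory), n)


buf = UniformReplayBuffer(capacity=3)
for t in range(5):
    buf.append({"state": t, "action": 0, "reward": 1.0})
# only the 3 most recent transitions survive
assert [tr["state"] for tr in buf.memory] == [2, 3, 4]
```

Bounding capacity this way keeps memory use constant while still letting old experience be replayed until it is pushed out.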
append(state, action, reward, next_state=None, next_action=None, is_state_terminal=False, env_id=0, **kwargs)
Append a transition to this replay buffer.
Parameters:
- state – s_t
- action – a_t
- reward – r_t
- next_state – s_{t+1} (can be None if terminal)
- next_action – a_{t+1} (can be None for off-policy algorithms)
- is_state_terminal (bool) – True if s_t is a terminal state
- env_id (object) – Object that is unique to each env. It indicates which env a given transition came from in multi-env training.
- **kwargs – Any other information to store.
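To illustrate the `append` signature, here is a sketch of a buffer that stores each transition as a dict with these fields, tagging each one with its `env_id` so transitions from several envs can share one buffer (`MultiEnvBuffer` is a hypothetical name, not part of ChainerRL):

```python
class MultiEnvBuffer:
    """Illustrative sketch: one buffer fed by multiple envs."""

    def __init__(self):
        self.memory = []

    def append(self, state, action, reward, next_state=None,
               next_action=None, is_state_terminal=False, env_id=0,
               **kwargs):
        # env_id records which env produced this transition,
        # so per-env bookkeeping (e.g. N-step assembly) stays correct
        transition = dict(state=state, action=action, reward=reward,
                          next_state=next_state, next_action=next_action,
                          is_state_terminal=is_state_terminal,
                          env_id=env_id, **kwargs)
        self.memory.append(transition)


buf = MultiEnvBuffer()
buf.append(state=0, action=1, reward=0.5, next_state=1, env_id="env-A")
buf.append(state=7, action=0, reward=-1.0, is_state_terminal=True,
           env_id="env-B")
assert buf.memory[0]["env_id"] == "env-A"
assert buf.memory[1]["is_state_terminal"] is True
```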
load(filename)
Load the content of the buffer from a file.
Parameters: filename (str) – Path to a file.
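Persisting a buffer amounts to serializing its transitions to a file and reading them back. A minimal sketch, assuming simple `pickle` serialization (the actual ChainerRL on-disk format may differ; `PersistentBuffer` is a hypothetical name):

```python
import os
import pickle
import tempfile


class PersistentBuffer:
    """Illustrative sketch of saving/loading buffer contents."""

    def __init__(self):
        self.memory = []

    def save(self, filename):
        # serialize all stored transitions to the given path
        with open(filename, "wb") as f:
            pickle.dump(self.memory, f)

    def load(self, filename):
        # replace current contents with those read from the file
        with open(filename, "rb") as f:
            self.memory = pickle.load(f)


buf = PersistentBuffer()
buf.memory = [{"state": 0, "reward": 1.0}]
fd, path = tempfile.mkstemp()
os.close(fd)
buf.save(path)
restored = PersistentBuffer()
restored.load(path)
os.remove(path)
assert restored.memory == buf.memory
```

Saving and reloading lets training resume without discarding collected experience.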
ReplayBuffer implementations
class chainerrl.replay_buffer.ReplayBuffer(capacity=None, num_steps=1)
Experience Replay Buffer, as described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.
Parameters:
- capacity (int) – capacity in terms of number of transitions
- num_steps (int) – number of timesteps per stored transition
class chainerrl.replay_buffer.PrioritizedReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=0.01, normalize_by_max=True, error_min=0, error_max=1, num_steps=1)
Stochastic prioritization, as described in https://arxiv.org/pdf/1511.05952.pdf, Section 3.3 (proportional prioritization).
Parameters: - capacity (int) – capacity in terms of number of transitions
- alpha (float) – Exponent of errors to compute probabilities to sample
- beta0 (float) – Initial value of beta
- betasteps (int) – Steps to anneal beta to 1
- eps (float) – Small constant added to errors so that transitions whose error becomes near zero can still be revisited
- normalize_by_max (bool or str) – Method to normalize weights:
  - 'batch' or True (default): divide by the maximum weight in the sampled batch
  - 'memory': divide by the maximum weight in the memory
  - False: do not normalize
class chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=1e-08, normalize_by_max=True, default_priority_func=None, uniform_ratio=0, wait_priority_after_sampling=True, return_sample_weights=True, error_min=None, error_max=None)
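As the class name and parameters suggest, this buffer applies prioritization at the level of whole episodes rather than individual transitions, with `uniform_ratio` mixing in a fraction of uniform samples. A rough sketch of that sampling scheme under those assumptions (illustrative only; `sample_episode` is a hypothetical helper, not the ChainerRL API):

```python
import random


def sample_episode(episodes, episode_priorities, uniform_ratio=0.0):
    """Pick one whole episode, prioritized with a uniform fallback."""
    # with probability uniform_ratio, fall back to uniform sampling
    if random.random() < uniform_ratio:
        return random.choice(episodes)
    # otherwise sample proportionally to episode priority
    total = sum(episode_priorities)
    r = random.uniform(0, total)
    acc = 0.0
    for ep, p in zip(episodes, episode_priorities):
        acc += p
        if r <= acc:
            return ep
    return episodes[-1]


episodes = [["t0", "t1"], ["t2"], ["t3", "t4", "t5"]]
pri = [0.1, 0.0, 5.0]
ep = sample_episode(episodes, pri, uniform_ratio=0.1)
assert ep in episodes
```

Sampling whole episodes preserves within-episode ordering, which matters for algorithms that train on sequences (e.g. recurrent policies).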