
MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

Citation Author(s):
Julius Ott, Lorenzo Servadei, Jose Arjona-Medina, Enrico Rinaldi, Gianfranco Mauro, Daniela Sanchez Lopera, Michael Stephan, Thomas Stadelmayer, Avik Santra, Robert Wille
Submitted by:
Julius Ott
Last updated:
30 May 2023 - 3:08am
Document Type:
Poster
Document Year:
2023
Presenters:
Julius Ott
Paper Code:
https://github.com/juliusott/uncertainty-buffer
 

Data selection is essential for any data-driven optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-value estimation and therefore cannot adapt the sampling strategy, i.e., the balance between exploring and exploiting transitions, to the complexity of the task.
To address this limitation, the paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. It is enabled by an uncertainty estimate of the Q-value function, which guides the sampling toward more significant transitions and thus a more efficient policy. Experiments on classical control environments demonstrate stable results across environments and show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards w.r.t. convergence and peak performance by 26% on average.
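
The abstract gives no implementation details (those are in the linked repository), but the core idea can be sketched. Below is a minimal, hypothetical sketch, not the authors' implementation: a replay buffer whose sampling distribution mixes an exploration term, proportional to a per-transition uncertainty score such as the spread of an ensemble of Q-value estimates, with a uniform exploitation term. The class name, the explore_weight parameter, and the use of the ensemble standard deviation as the uncertainty score are all assumptions made for illustration.

import numpy as np


class UncertaintyReplayBuffer:
    """Sketch of a replay buffer that trades off exploration and
    exploitation when sampling, guided by per-transition Q-value
    uncertainty (hypothetical, not the authors' implementation)."""

    def __init__(self, capacity, explore_weight=0.5):
        self.capacity = capacity
        # Assumed trade-off parameter: 1.0 = pure uncertainty-driven
        # (exploration) sampling, 0.0 = pure uniform (exploitation) sampling.
        self.explore_weight = explore_weight
        self.transitions = []
        self.uncertainty = []

    def add(self, transition, q_std):
        # q_std: uncertainty score for this transition, e.g. the standard
        # deviation of Q-value estimates across an ensemble of critics.
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.uncertainty.pop(0)
        self.transitions.append(transition)
        self.uncertainty.append(q_std)

    def sample(self, batch_size):
        u = np.asarray(self.uncertainty) + 1e-8  # avoid division by zero
        explore = u / u.sum()                    # favour uncertain transitions
        exploit = np.full(len(u), 1.0 / len(u))  # uniform over the buffer
        probs = (self.explore_weight * explore
                 + (1.0 - self.explore_weight) * exploit)
        idx = np.random.choice(len(u), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]

Under this reading, annealing explore_weight over training would shift sampling from uncertain, little-explored transitions toward uniform replay; refer to the repository linked above for the method as actually proposed.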
