Making Efficient Use of Demonstrations to Solve Hard Exploration Problems


Sep 05 2019 DeepMind

R2D3 uses demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. It can solve tasks where SOTA methods fail to see a single reward.
Sep 05 2019 Nando de Freitas

Advancing the frontier of what deep RL and imitation methods can achieve. Congratulations @caglarml @TomLePaine and the many other brilliant researchers and engineers at @DeepMindAI who worked on this ambitious project for many months.
Sep 05 2019 Misha Denil

Great work on learning from demonstrations by @caglarml and @TomLePaine
Sep 05 2019 Serkan Cabi

One of my favorite RL agents of the year. Surpassed my expectations on some really hard tasks.
Sep 05 2019

They also introduce a suite of 8 tasks that combine these three properties, and show that #R2D3 can solve several of the tasks where other state-of-the-art methods fail to see even a single successful trajectory after tens of billions of steps of exploration.
