DeepMind: Q-learning is difficult to apply when the number of available actions is large. We show that a simple extension based on amortized stochastic search allows Q-learning to scale to high-dimensional discrete, continuous or hybrid action spaces: https://arxiv.org/abs/2001.08116
6 replies, 916 likes
David Warde-Farley 🇪🇺: Very glad to share this on arXiv today: one weird trick for getting Q-learning to work when the action space is big and complicated.
Work led by Tom Van de Wiele, with Andriy Mnih and @VladMnih.
2 replies, 192 likes
Ian Osband: "One weird trick" for DQN in large (continuous) action spaces:
- Initialize a uniform action-sampling distribution.
- Sample actions; choose the sampled action with highest Q.
- Train the sampler to produce the "best action" + also some entropy.
- ... Works surprisingly well!
Great stuff @dwf, @VladMnih !
1 reply, 91 likes
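Ian's bullet points can be sketched in a few lines of NumPy. This is a hedged toy illustration, not the paper's implementation: the fixed Q-function, the 100-action space, the learning rate, the entropy weight, and the REINFORCE-style proposal update are all illustrative assumptions standing in for the learned networks and training setup in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 100   # stand-in for an "enormous" discrete action space
N_SAMPLES = 16    # actions sampled per step instead of a full argmax

def q_fn(a):
    # Toy Q-function peaked at action 42; stands in for a learned network.
    return -np.abs(np.asarray(a) - 42).astype(float)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Learnable proposal over actions, initialized uniform (all-zero logits).
logits = np.zeros(N_ACTIONS)
lr, entropy_coef, eps = 0.5, 0.01, 1e-12

for _ in range(200):
    probs = softmax(logits)
    # 1) Sample a small set of candidate actions from the proposal.
    cand = rng.choice(N_ACTIONS, size=N_SAMPLES, p=probs)
    # 2) Approximate argmax_a Q(s, a) by the best sampled candidate.
    best = cand[np.argmax(q_fn(cand))]
    # 3) Push the proposal toward the best sampled action
    #    (gradient of log p(best) w.r.t. the logits)...
    grad = -probs.copy()
    grad[best] += 1.0
    # ...plus an entropy bonus so the sampler keeps exploring.
    log_p = np.log(probs + eps)
    grad += entropy_coef * (-probs * log_p + probs * np.sum(probs * log_p))
    logits += lr * grad

# Act time works the same way: sample a handful of actions from the
# trained proposal and take the one with the highest Q.
final_cand = rng.choice(N_ACTIONS, size=64, p=softmax(logits))
chosen = int(final_cand[np.argmax(q_fn(final_cand))])
print(chosen)  # should land near the true argmax (42)
```

The point of the sampled argmax is that its cost scales with the number of samples, not with the size of the action space, and training the proposal amortizes the search so later maximizations start from a good distribution.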
Arash Tavakoli: If you like RL in large action spaces as much as I do, then you will likely enjoy this work from @DeepMind!
By Van de Wiele, @dwf, @AndriyMnih & @VladMnih.
0 replies, 7 likes
Daisuke Okanohara: Q-learning requires a maximization over actions and so cannot be applied directly to high-dimensional/continuous action spaces. With a proposal distribution trained by amortized inference, Q-learning can be applied to these problems and outperforms other SOTA methods. https://arxiv.org/abs/2001.08116
0 replies, 6 likes
Patrick Muncaster: Q-Learning in Enormous Action Spaces via Amortized Approximate Maximization
https://arxiv.org/pdf/2001.08116.pdf "We treat the search for the best action as another learning problem & replace the exact maximization over all actions with a maximization over a set of actions sampled from a ...
0 replies, 1 likes
Benjamin Singleton: Q-Learning in enormous action spaces via amortized approximate maximization #BigData #Analytics https://arxiv.org/abs/2001.08116
0 replies, 1 likes
Found on Jan 23 2020 at https://arxiv.org/pdf/2001.08116.pdf