Max Jaderberg: Finally, Transformers working for RL! Two simple modifications, moving the layer-norm and adding gating, create GTrXL: an incredibly stable and effective architecture for integrating experience through time in RL.
Great work from Emilio interning at @DeepMindAI https://arxiv.org/abs/1910.06764
10 replies, 794 likes
Sid Jayakumar: Really excited about our latest work showing that large Transformer-XLs can be used in RL agents. We show SoTA performance on DMLab with gated transformers and a few small changes. Led by Emilio as an internship project! @DeepMindAI
0 replies, 86 likes
Russ Salakhutdinov: Using large Transformer-XLs for stable training of RL agents. Nice work from Emilio Parisotto’s internship and colleagues at DeepMind.
0 replies, 77 likes
Xander Steenbrugge: Transformers now also taking over SOTA from LSTMs in the Reinforcement Learning domain! 🤯😁
Very curious to see this applied to long time-horizon environments like StarCraft, Dota 2 and more.
@SchmidhuberAI is not going to like this 🤣
3 replies, 41 likes
Daisuke Okanohara: For memory-augmented RL, the agent with the gated Transformer-XL (GTrXL) achieves better performance than one with an LSTM. The introduced GRU-style gate significantly stabilizes training by enabling a Markovian regime at the start of training. https://arxiv.org/abs/1910.06764
0 replies, 13 likes
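The gating these tweets describe can be sketched as follows. This is a minimal numpy illustration of a GRU-type gating layer of the kind the GTrXL paper proposes, not the authors' code: the weight names, shapes, and the bias value `bg` are illustrative. The idea is that a positive bias on the update gate keeps the gate nearly closed at initialization, so the layer passes its skip-path input `x` through almost unchanged and the agent starts out close to a memoryless (Markovian) policy.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_gate(x, y, params, bg=2.0):
    """GRU-style gating combining the skip-path input x with the
    sublayer output y (hypothetical minimal sketch, not DeepMind's code).
    bg > 0 biases the update gate toward the identity map at init."""
    Wr, Ur, Wz, Uz, Wg, Ug = params
    r = sigmoid(y @ Wr + x @ Ur)        # reset gate
    z = sigmoid(y @ Wz + x @ Uz - bg)   # update gate, biased closed by bg
    h = np.tanh(y @ Wg + (r * x) @ Ug)  # candidate activation
    return (1.0 - z) * x + z * h        # gated output

# With small random weights and a large bg, z is near 0,
# so the gated output stays close to x (the identity/Markov regime).
rng = np.random.default_rng(0)
d = 8
params = [0.01 * rng.standard_normal((d, d)) for _ in range(6)]
x = rng.standard_normal((1, d))
y = rng.standard_normal((1, d))
out = gru_gate(x, y, params, bg=4.0)
```

As training progresses, the learned weights can open the gate and gradually mix in the transformer sublayer's output, which is the stabilization mechanism the tweets above refer to.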
Aran Komatsuzaki: Stabilizing Transformers for Reinforcement Learning
https://arxiv.org/abs/1910.06764 Proposed the Gated Transformer-XL, which surpasses LSTMs and achieves SOTA results on the multi-task DMLab-30 benchmark suite.
0 replies, 9 likes
Caglar Gulcehre: We have shown that it is possible to train very large transformers on RL problems by introducing a gating mechanism to stabilize them. It turned out this architecture is very effective on a number of RL benchmarks. As @maxjaderberg pointed out, mostly due to the work done by Emilio.
0 replies, 8 likes
Colin Raffel: @mariusmosbach This finding was unpublished but included as default in tensor2tensor for a long time because it clearly works better. Recently https://arxiv.org/abs/2002.04745 and https://arxiv.org/abs/1910.06764 discussed it more explicitly.
0 replies, 5 likes
Phillip Wang: @SonAthenos @OpenAI @CShorten30 https://arxiv.org/abs/1910.06764
0 replies, 2 likes
arXiv in review: #ICLR2020 Stabilizing Transformers for Reinforcement Learning. (arXiv:1910.06764v1 [cs.LG]) http://arxiv.org/abs/1910.06764
0 replies, 1 likes
Benjamin Singleton: Stabilizing Transformers for Reinforcement Learning #BigData #DataScience https://arxiv.org/abs/1910.06764
0 replies, 1 likes
Found on Oct 16 2019 at https://arxiv.org/pdf/1910.06764.pdf