Michael (Misha) Laskin: New paper led by @astooke w/ @kimin_le2 & @pabbeel - Decoupling Representation Learning from RL. First time RL trained on unsupervised features matches (or beats) end-to-end RL!
6 replies, 426 likes
hardmaru: I've received so much criticism for not incorporating reward info into world model / representations used by RL agents
But the way I see it, rewards are so overvalued…
See this new paper, “Decoupling Representation Learning from Reinforcement Learning”
14 replies, 325 likes
Shimon Whiteson: I agree with the claim that reward is overvalued but probably not for the reason intended here.
Reward is overvalued because it is often misspecified: we have a bad habit of assuming we know how to define reward, when in fact this is often a tricky value alignment problem.
7 replies, 149 likes
Max Jaderberg: Do we need end-to-end deep RL? This work from @stookemon et al shows that you can train a feature extractor with a purely unsupervised loss, with RL learnt on top, and match end-to-end RL even in 3D envs 👏👏
0 replies, 72 likes
Aravind Srinivas: David's LeCake argument in his World Models talk was the inspiration for me to design a LeCake like experiment in the CURL paper (where encoder is detached from RL gradients). But we missed a trick: doing data augmentation on the latent when performing the RL on top. https://t.co/UvmYlQsbR2
1 reply, 37 likes
Daisuke Okanohara: They propose ATC: unsupervised training of an image encoder for RL via contrastive learning, where augmented images separated by a short time interval are positive pairs; the RL policy is then trained on top of the frozen encoder. Outperforms end-to-end RL. https://arxiv.org/abs/2009.08319
0 replies, 11 likes
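The temporal-contrastive objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the real ATC uses a convolutional encoder, random-shift augmentation, and a momentum target network, whereas here the encoder is a linear map and the augmentation is noise; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Stand-in "encoder": a linear map (ATC uses a CNN; this is illustrative).
    return x @ W

def augment(x, rng):
    # Stand-in augmentation: small random noise (ATC uses random image shifts).
    return x + 0.01 * rng.normal(size=x.shape)

def info_nce(anchors, positives, temperature=0.1):
    # InfoNCE loss: each anchor's positive is a temporally nearby frame;
    # the other items in the batch serve as negatives.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # diagonal = matched pairs

# Toy rollout: observations at time t and t+k form positive pairs.
B, D, Z = 8, 32, 16
obs_t  = rng.normal(size=(B, D))
obs_tk = obs_t + 0.05 * rng.normal(size=(B, D))  # frames a short time later
W = rng.normal(size=(D, Z))

loss = info_nce(encode(augment(obs_t, rng), W),
                encode(augment(obs_tk, rng), W))
print(loss)
```

Minimizing this loss pulls embeddings of temporally adjacent (augmented) frames together and pushes apart unrelated frames; the RL policy is then trained on the frozen encoder's output, with no reward signal reaching the representation.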
AK: Decoupling Representation Learning from Reinforcement Learning
abs: https://arxiv.org/abs/2009.08319 https://t.co/62xxCQFkKe
0 replies, 9 likes
B C: Today's read -
Decoupling Representation Learning from Reinforcement Learning
0 replies, 1 likes
Found on Sep 18 2020 at https://arxiv.org/pdf/2009.08319.pdf