Papers of the day

MOPO: Model-based Offline Policy Optimization

Comments

Chelsea Finn: Offline RL may make it possible to learn behavior from large, diverse datasets (like the rest of ML). We introduce: MOPO: Model-based Offline Policy Optimization https://arxiv.org/abs/2005.13239 w/ Tianhe Yu, Garrett Thomas, Lantao Yu @StefanoErmon @james_y_zou @svlevine @tengyuma

5 replies, 481 likes


Sergey Levine: I recorded an extended version of my offline RL talk, as practice for a live presentation earlier this week: https://www.youtube.com/watch?v=qgZPZREor5I Covers the following: AWAC: https://arxiv.org/abs/2006.09359 MOPO: https://arxiv.org/abs/2005.13239 CQL: https://arxiv.org/abs/2006.04779 D4RL: https://arxiv.org/abs/2004.07219

4 replies, 294 likes


Sergey Levine: Turns out that model-based RL algorithms make very good offline RL methods. If we then take a bit of care to account for model errors, we can attain state-of-the-art results on offline RL benchmark tasks, including several D4RL tasks!

3 replies, 177 likes


Tengyu Ma: The main principle is conservatism in the face of uncertainty --- we penalize the reward by the uncertainty of the learned dynamics. We also have a new version with more ablation studies! (same link as below)

0 replies, 46 likes
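A minimal sketch of the idea in Tengyu Ma's comment, penalizing the reward by an uncertainty estimate of the learned dynamics. This is not the authors' code: it assumes an ensemble of learned dynamics models whose disagreement stands in for the uncertainty u(s, a), and the names `models`, `reward_fn`, and `penalty_coef` are illustrative placeholders.

```python
import numpy as np

def penalized_reward(state, action, models, reward_fn, penalty_coef=1.0):
    """Return r(s, a) - lambda * u(s, a) for a single (state, action) pair.

    Sketch only: uses ensemble disagreement as the uncertainty u(s, a);
    `models` is a list of learned dynamics models with a .predict(state, action)
    method, `reward_fn` a learned or known reward function, and `penalty_coef`
    the penalty weight lambda (all hypothetical placeholders).
    """
    # Each ensemble member predicts the next state; their disagreement
    # serves as the dynamics-uncertainty penalty.
    predictions = np.stack([m.predict(state, action) for m in models])
    uncertainty = np.max(
        np.linalg.norm(predictions - predictions.mean(axis=0), axis=-1)
    )
    return reward_fn(state, action) - penalty_coef * uncertainty
```

The policy is then optimized inside the learned model using this penalized reward, so rollouts that stray into regions where the dynamics model is uncertain are discouraged.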


Shane Gu 顾世翔: Our building block, BREMEN, is a simple model-based offline RL method that works even with 10-20x less data (where model-free methods fail). Also check out 2 concurrent model-based offline RL works https://arxiv.org/abs/2005.13239 @svlevine @chelseabfinn @tengyuma and https://arxiv.org/abs/2005.05951 @aravindr93 3/

1 replies, 14 likes


Isaac Kargar: an extended version of @svlevine's offline RL talk: https://youtube.com/watch?v=qgZPZREor5I Covers the following: AWAC: https://arxiv.org/abs/2006.09359 MOPO: https://arxiv.org/abs/2005.13239 CQL: https://arxiv.org/abs/2006.04779 D4RL: https://arxiv.org/abs/2004.07219

0 replies, 1 likes


Content

Found on May 28, 2020 at https://arxiv.org/pdf/2005.13239.pdf

PDF content of the paper "MOPO: Model-based Offline Policy Optimization"