Reward Tampering Problems and Solutions in Reinforcement Learning


Apr 13 2018 Janelle Shane

When machine learning is astonishing: I collected some highlights from a paper on algorithmic creativity.
Aug 14 2019 DeepMind

In our latest AI safety blog post, we explore principled solutions to the reward tampering problem, in which a reinforcement learning agent actively changes its reward function to maximise reward.
Sep 10 2019 hardmaru

Found this recent paper by Tom Everitt and Marcus Hutter that looks at the topic of RL agents “cheating” from an AI Safety perspective. Worth a look!
Aug 14 2019 Vishal Maini

another step towards developing a set of best practices for designing safe RL agents - in this case, by avoiding incentives for agents to tamper with their own reward function. great work, @tom4everitt and team 🚀 🤖 ✅
Aug 14 2019 Andrey Kurenkov 🤖

Aug 14 2019 Victoria Krakovna

Exciting work on the reward tampering problem in AI safety, where the agent changes its reward function by exploiting how reward is implemented in the environment. The paper proposes design principles for building agents without an incentive to tamper with the reward function.
Aug 14 2019 Kate Parkyn

