
Randomized Automatic Differentiation

Comments

Ryan Adams: Automatic differentiation collapses the linearized computational graph to compute a Jacobian. We don’t need exact gradients for SGD, so let’s use cheap Monte Carlo estimators instead: https://arxiv.org/abs/2007.10412 https://github.com/PrincetonLIPS/RandomizedAutomaticDifferentiation @denizzokt @AlexBeatson @NMcgreivy @jaduol1 https://t.co/JArPO7btng

2 replies, 478 likes
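
The tweet above summarizes the core idea: the gradient is obtained by collapsing (multiplying out) the Jacobians of the linearized computational graph, and that product is a sum over intermediate indices which can be estimated by sampling instead of computed exactly. As a rough illustration only (this is not the paper's implementation; the shapes, function name, and sampling scheme here are made up), a minimal NumPy sketch of an unbiased Monte Carlo estimate of a Jacobian chain product:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linearized graph: the Jacobian of y = f(g(x)) is the product
    # J = Jf @ Jg, i.e. a sum over the intermediate dimension k:
    #   J = sum_k Jf[:, k] * Jg[k, :]
    Jf = rng.normal(size=(3, 50))   # Jacobian of f at g(x)
    Jg = rng.normal(size=(50, 4))   # Jacobian of g at x

    def sampled_jacobian(Jf, Jg, num_samples, rng):
        """Unbiased Monte Carlo estimate of Jf @ Jg.

        Samples intermediate indices k uniformly and reweights by the
        inverse sampling probability, so E[estimate] = Jf @ Jg exactly.
        """
        K = Jf.shape[1]
        idx = rng.integers(0, K, size=num_samples)   # sampled intermediate nodes
        # Each sampled outer product is scaled by K / num_samples so that
        # the expectation over idx recovers the full sum over k.
        return (Jf[:, idx] @ Jg[idx, :]) * (K / num_samples)

    exact = Jf @ Jg
    single = sampled_jacobian(Jf, Jg, num_samples=5, rng=rng)  # cheap, noisy
    avg = np.mean([sampled_jacobian(Jf, Jg, 5, rng) for _ in range(20000)], axis=0)

    print(np.abs(single - exact).max())  # one estimate: noticeably noisy
    print(np.abs(avg - exact).max())     # averaged estimates approach the exact product

A single sample is cheap but noisy; SGD tolerates that noise, which is the point of trading exactness for cost.
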


Deniz Oktay: Why spend computation and memory on exact gradients only to use them for stochastic optimization? Introducing: Randomized Automatic Differentiation (RAD) https://arxiv.org/abs/2007.10412 w/ @NMcgreivy @jaduol1 @AlexBeatson @ryan_p_adams

3 replies, 443 likes


Sam Greydanus: At the price of adding noise to the gradients, we can save a lot of memory. This makes backprop roughly an order of magnitude more memory-efficient (depending on the model) without hurting optimization much. I'd like to see this on larger problems.

1 reply, 49 likes


Alex Beatson: Super excited to share this new paper on Randomized Automatic Differentiation! Minibatch SGD estimates grads by sampling *data* nodes in a computational graph. What if we sampled *all* nodes when doing AD? We show this can reduce memory costs in ML and scientific computing.

0 replies, 32 likes
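
To make the memory argument in the tweet above concrete: for a linear layer y = x @ W, the weight gradient is dL/dW = x.T @ (dL/dy), so the activations x must normally be stored for the backward pass. If a randomly subsampled, inverse-probability-scaled copy of x is stored instead, the resulting gradient is unbiased. The sketch below is an assumption-laden illustration of that unbiasedness argument, not the paper's sampling pattern (the paper uses structured schemes, and a real implementation would store only the surviving entries rather than a dense masked array):

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy linear layer y = x @ W; the weight gradient is dL/dW = x.T @ dy.
    x = rng.normal(size=(128, 256))   # activations for one minibatch
    dy = rng.normal(size=(128, 64))   # upstream gradient dL/dy

    keep_prob = 0.1                   # keep ~10% of the activation entries

    def sparse_activation(x, keep_prob, rng):
        """Keep each entry of x with probability keep_prob, rescaled by
        1/keep_prob so the sampled tensor equals x in expectation.
        Storing only the kept entries would cut backward-pass memory
        roughly by a factor of keep_prob."""
        mask = rng.random(x.shape) < keep_prob
        return np.where(mask, x / keep_prob, 0.0)

    exact_grad = x.T @ dy
    noisy_grad = sparse_activation(x, keep_prob, rng).T @ dy   # unbiased, noisy
    avg_grad = np.mean(
        [sparse_activation(x, keep_prob, rng).T @ dy for _ in range(2000)], axis=0)

    rel = lambda g: np.linalg.norm(g - exact_grad) / np.linalg.norm(exact_grad)
    print(rel(noisy_grad))  # single estimate: noticeable relative error
    print(rel(avg_grad))    # averaged estimates: close to the exact gradient

The variance of the estimate shows up as gradient noise, which is the trade-off Sam Greydanus's comment refers to.
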


Sam Power: Quite cool stuff. A bit reminiscent of some old techniques for solving linear systems with sampling methods, e.g. https://link.springer.com/article/10.1007/BF01578388 (n.b. at the time, and to this author, "sequential Monte Carlo" did not carry the same meaning that it does today).

0 replies, 4 likes


Content

Found on Jul 24 2020 at https://arxiv.org/pdf/2007.10412.pdf

PDF content of a computer science paper: Randomized Automatic Differentiation