Papers of the day

Randomized Automatic Differentiation


Ryan Adams: Automatic differentiation collapses the linearized computational graph to compute a Jacobian. We don’t need exact gradients for SGD, so let’s use cheap Monte Carlo estimators instead: @denizzokt @AlexBeatson @NMcgreivy @jaduol1

2 replies, 478 likes
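The core idea in the thread (an exact forward pass, but a randomly sparsified, unbiased stored activation for the reverse pass) can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed shapes and names, not the paper's implementation: for a linear layer, the gradient is linear in the stored input, so keeping each entry with probability p and rescaling by 1/p gives an unbiased gradient estimate while storing fewer values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: loss = 0.5 * ||W2 @ relu(W1 @ x) - t||^2
x = rng.normal(size=5)
t = rng.normal(size=3)
W1 = rng.normal(size=(4, 5))
W2 = rng.normal(size=(3, 4))

def grad_W1(x_stored):
    """dL/dW1, using a (possibly sparsified) stored copy of the input x.

    The forward pass below always uses the exact values; only the tensor
    saved for the reverse pass (x_stored) is randomized.
    """
    h = W1 @ x                       # exact forward pass
    a = np.maximum(h, 0.0)
    r = W2 @ a - t                   # residual of the squared loss
    dh = (W2.T @ r) * (h > 0)        # gradient at the pre-activation
    return np.outer(dh, x_stored)    # dL/dW1 is *linear* in the stored input

exact = grad_W1(x)

# Randomized AD sketch: keep each entry of x with probability p and
# rescale by 1/p, so E[mask/p * x] = x and the estimator is unbiased.
p = 0.5
n_samples = 20000
est = np.zeros_like(exact)
for _ in range(n_samples):
    mask = rng.random(x.shape) < p
    est += grad_W1(np.where(mask, x / p, 0.0))
est /= n_samples

print(np.max(np.abs(est - exact)))   # small: the estimator averages to the exact gradient
```

In a real implementation the memory saving comes from storing only the kept entries (a fraction p of the activation) between the forward and reverse passes; the averaging loop here is only to verify unbiasedness, since SGD consumes a single noisy sample per step.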

Deniz Oktay: Why spend computation and memory on exact gradients only to use them for stochastic optimization? Introducing: Randomized Automatic Differentiation (RAD) w/ @NMcgreivy @jaduol1 @AlexBeatson @ryan_p_adams

3 replies, 443 likes

Sam Greydanus: At the price of adding noise to gradients, we can save lots of memory. This makes backprop ~1 order of magnitude more memory-efficient (depends on model), without hurting optimization much. I'd like to see this on larger problems.

1 reply, 49 likes

Alex Beatson: Super excited to share this new paper on Randomized Automatic Differentiation! Minibatch SGD estimates grads by sampling *data* nodes in a computational graph. What if we sampled *all* nodes when doing AD? We show this can reduce memory costs in ML and scientific computing.

0 replies, 32 likes

Sam Power: quite cool stuff. a bit reminiscent of some old techniques for solving linear systems with sampling methods (n.b. at the time / to this author, 'sequential monte carlo' did not carry the same meaning which it currently does)

0 replies, 4 likes


Found on Jul 24 2020
