Roger Grosse: In deep learning research, the sky turns out to be blue, but only if you measure it very carefully. Interesting meta-scientific paper on evaluating neural net optimizers, by Choi et al.
2 replies, 203 likes
Sebastian Raschka: "On Empirical Comparisons of Optimizers for Deep Learning" => "As tuning effort grows without bound, more general optimizers should never underperform the ones they can approximate" https://arxiv.org/abs/1910.05446 https://t.co/hUIGMyshkC
2 replies, 183 likes
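The "more general optimizers should never underperform the ones they can approximate" claim can be made concrete: in Adam's update, a very large eps makes the denominator effectively constant, so the step collapses to (rescaled) momentum SGD. A minimal numpy sketch of this limit, assuming the standard bias-corrected Adam update (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def adam_step(m, v, g, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter; returns (step, m, v)."""
    m = beta1 * m + (1 - beta1) * g        # first-moment EMA
    v = beta2 * v + (1 - beta2) * g ** 2   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# With eps huge, sqrt(v_hat) + eps ~ eps, so the step is ~ -(lr / eps) * m_hat:
# bias-corrected momentum SGD with effective learning rate lr / eps.
rng = np.random.default_rng(0)
m = v = buf = 0.0
base_lr, eps = 0.1, 1e8
for t, g in enumerate(rng.normal(size=5), start=1):
    step, m, v = adam_step(m, v, g, t, lr=base_lr * eps, eps=eps)
    buf = 0.9 * buf + 0.1 * g              # momentum buffer (EMA form)
    momentum_step = -base_lr * buf / (1 - 0.9 ** t)
    assert np.isclose(step, momentum_step, rtol=1e-5)
```

This is why the paper's headline result (Adam/NAdam never underperforming SGD and momentum under exhaustive tuning) is an inclusion argument rather than an empirical accident: the search space of the more general optimizer contains the less general one.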
Dmytro Mishkin: Tl;dr: Adam >> SGD, if you tune eps, momentum, and the lr schedule for it
0 replies, 14 likes
Daisuke Okanohara: In NN optimization, metaparameters that are usually ignored turn out to matter. In particular, "eps" is typically left at its default of 1e-8, but the optimal value can range from 1 to 10^4. With a full metaparameter search, Adam and NAdam outperform SGD and momentum. https://arxiv.org/abs/1910.05446
0 replies, 13 likes
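The practical upshot of the eps result is that eps belongs in the hyperparameter search space, on a log scale spanning many orders of magnitude, rather than being pinned at 1e-8. A hypothetical random-search sketch (the helper name and exact ranges are illustrative, not the paper's search space):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_adam_config():
    """Sample one Adam configuration with log-scaled metaparameters."""
    return {
        "lr": 10 ** rng.uniform(-5, 0),               # learning rate, log-uniform
        "one_minus_beta1": 10 ** rng.uniform(-3, 0),  # momentum, log scale near 1
        "eps": 10 ** rng.uniform(-8, 4),              # eps searched from 1e-8 to 1e4
    }

# Draw a batch of trial configurations for random search.
configs = [sample_adam_config() for _ in range(32)]
```

Sampling eps log-uniformly over [1e-8, 1e4] covers both the conventional default and the much larger values the thread reports as optimal for some workloads.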
arxiv: On Empirical Comparisons of Optimizers for Deep Learning. http://arxiv.org/abs/1910.05446 https://t.co/QVxeQCpAOL
0 replies, 10 likes
Delip Rao: 2. "Did you optimize your hyperparameters?"
With compute costs coming down, it is becoming more affordable to run hyperparameter optimization, as long as you stay away from Sesame Street. It would be interesting to condition this on where the authors come from. https://t.co/9g9PgepxFO
1 reply, 7 likes
Found on Oct 15 2019 at https://arxiv.org/pdf/1910.05446.pdf