Jascha: Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible: https://arxiv.org/abs/2008.07545 We examine what information is usable for training neural networks, and how second order methods destroy exactly that information. https://t.co/j1Sc09YuKT
9 replies, 581 likes
Christian Szegedy: Very interesting insights. I always found it puzzling why second-order methods underperform in practice for DNNs. This research might give insight into what really helps in practice and what is unlikely to help.
2 replies, 70 likes
Danilo J. Rezende: Very interesting results and analysis!
0 replies, 45 likes
Daisuke Okanohara: Whitening and second-order optimization harm generalization because they destroy exactly the information that can be used for prediction. One can show (for NNs optimized with GD) that the test prediction then depends on the training data only through its second-moment matrix. https://arxiv.org/abs/2008.07545
0 replies, 5 likes
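A minimal NumPy sketch (not from the paper; toy data and the `whiten` helper are illustrative assumptions) of why this destroys usable information: whitening maps any dataset's second-moment matrix to the identity, so a dataset with exploitable feature correlations and a pure-noise dataset become indistinguishable through that matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "datasets": in the first, the two features are strongly
# correlated; in the second they are independent. That correlation is
# exactly the kind of structure a classifier could exploit.
cov_signal = np.array([[1.0, 0.9], [0.9, 1.0]])
cov_noise = np.eye(2)
X_signal = rng.multivariate_normal([0, 0], cov_signal, size=5000)
X_noise = rng.multivariate_normal([0, 0], cov_noise, size=5000)

def whiten(X):
    """ZCA-whiten X so its empirical second-moment matrix becomes ~identity."""
    C = X.T @ X / len(X)                       # second-moment matrix
    vals, vecs = np.linalg.eigh(C)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T  # C^(-1/2)
    return X @ W

for X in (X_signal, X_noise):
    Xw = whiten(X)
    print(np.round(Xw.T @ Xw / len(Xw), 2))    # both ~identity after whitening
```

After whitening, both datasets carry the same (identity) second-moment matrix, so any training procedure that sees the data only through that matrix — as the thread says happens with second-order optimization — cannot tell signal from noise.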
Sid: absolutely fascinating! #MachineLearning #AI #Optimization
0 replies, 3 likes
Whitening eliminates per-category data correlations and makes it impossible to distinguish noise from signal; second-order optimization can have the same effect as whitening, and both hurt generalization performance. https://t.co/FCKZtco9PJ
0 replies, 1 likes
Found on Aug 19 2020 at https://arxiv.org/pdf/2008.07545.pdf