
Reconciling modern machine learning practice and the bias-variance trade-off

Comments

Dec 05 2019 OpenAI

A surprising deep learning mystery: Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time. https://openai.com/blog/deep-double-descent/ https://t.co/Zdox9dbIBv
80 replies, 2035 likes


Aug 23 2019 Ian Osband

Looking back over the year, the one paper that gave me the best "aha" moment was... Reconciling Modern Machine Learning and the Bias-Variance Tradeoff: https://arxiv.org/abs/1812.11118 The "bias-variance" you knew was just the first piece of the story! https://t.co/J24b0W8LDR
16 replies, 1948 likes


Aug 24 2019 Nando de Freitas

I agree. This was a phenomenal paper. I’m hoping it will inspire researchers to probe further.
1 replies, 478 likes


Aug 27 2019 Oriol Vinyals

The paper "Understanding deep learning requires rethinking generalization" mostly asked questions. Glad to see some answers / new theories since then!
1 replies, 168 likes


Dec 05 2019 Greg Yang

@OpenAI Isn't this the "double descent" phenomenon studied in https://arxiv.org/abs/1812.11118 and subsequent works?
3 replies, 131 likes


Nov 11 2019 Gilles Louppe

Do you know of anyone who reproduced the double-U generalization curve of over-parameterized networks? https://arxiv.org/pdf/1812.11118.pdf Looking for a friend :-) https://t.co/6GgKsyAiSm
8 replies, 127 likes
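For anyone in the same position, a minimal way to reproduce the effect is the Random Fourier Features setup from the paper: random features fit by minimum-norm least squares, with the feature count swept through the interpolation threshold (n_features ≈ n_train). The sketch below is an assumption-laden illustration, not the authors' code: sklearn's digits dataset, the RBF width gamma, and the list of feature counts all stand in for the paper's exact MNIST configuration, so the height and location of the peak will differ.

```python
# Minimal double-descent sketch: Random Fourier Features + minimum-norm least
# squares, loosely following https://arxiv.org/abs/1812.11118.
# Dataset (sklearn digits), gamma, and feature counts are illustrative choices.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
Y = np.eye(10)[y]  # one-hot targets for least-squares classification
X_tr, X_te, Y_tr, Y_te, y_tr, y_te = train_test_split(
    X / 16.0, Y, y, train_size=500, random_state=0)

n_train = X_tr.shape[0]  # interpolation threshold sits near n_features == n_train
for n_features in (50, 200, 400, 500, 600, 1000, 2000, 5000):
    rff = RBFSampler(gamma=0.05, n_components=n_features, random_state=0)
    Z_tr, Z_te = rff.fit_transform(X_tr), rff.transform(X_te)
    # np.linalg.lstsq returns the minimum-norm solution once the system is
    # underdetermined, i.e. the interpolating estimator analyzed past the threshold.
    W, *_ = np.linalg.lstsq(Z_tr, Y_tr, rcond=None)
    err = np.mean(np.argmax(Z_te @ W, axis=1) != y_te)
    print(f"n_features={n_features:5d}  test error={err:.3f}")
```

Plotting test error against n_features should show the classical U-shape up to the threshold and a second descent beyond it.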


Dec 13 2019 Olivia Guest | Ολίβια Γκεστ

I was in a workshop that warned against overfitting without mentioning that, in practice, many deep network models simply are not overfit, so I'm mentioning it here: Preprint: https://arxiv.org/abs/1812.11118 Talk: https://cbmm.mit.edu/video/fit-without-fear-over-fitting-perspective-modern-deep-and-shallow-learning https://t.co/vxWaDXlZ9N
1 replies, 31 likes


Jan 01 2020 halvarflake

@zacharylipton Not sure there's a "single" paper to note, but the entire discussion about double-descent has been the most interesting thing I read this year: 1) https://arxiv.org/abs/1812.11118 - "Reconciling modern machine learning practice and the bias-variance trade-off"
2 replies, 24 likes


Sep 12 2019 halvarflake

To my great surprise, I found a few minutes of downtime today to read https://arxiv.org/abs/1812.11118. If you are into ML or statistics, I greatly recommend the paper; I will read the follow-ups but the empirical results showing double-descent risk curves are really fascinating.
1 replies, 23 likes


Oct 06 2019 Daisuke Okanohara

The bias-variance tradeoff shows that a model with appropriate complexity can generalize. Recent "double descent" indicates that a larger (than the necessary) model can generalize better in some situations. https://arxiv.org/abs/1812.11118 https://arxiv.org/abs/1903.07571 https://arxiv.org/abs/1909.11720
1 replies, 16 likes


Feb 06 2020 Vince Buffalo

I quite like this figure (from this paper: https://arxiv.org/abs/1812.11118), which I think helps explain why both machine learning and parameter-rich Bayesian models do well across a variety of tasks (Murphy also makes this point in Chapter 17 of his book). https://t.co/Jkt4JYHBcG
1 replies, 15 likes


Jan 08 2020 Narges Razavian

Adding the double descent paper to lecture 1 of my introductory ML course.. Who would've thought? Reconciling modern machine learning practice and the bias-variance trade-off Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal https://arxiv.org/abs/1812.11118 (& https://openai.com/blog/deep-double-descent/) https://t.co/1TsrIatsBq
0 replies, 14 likes


Sep 03 2019 halvarflake

Stats/ML followers: This paper https://arxiv.org/pdf/1812.11118.pdf argues that the risk curve when overparametrizing models is "w"-shaped vs u-shaped for many models. They provide some evidence from DNN and RFs. A fascinating claim; will need to mull the paper a bit. Worth a read.
1 replies, 12 likes


Sep 05 2019 Jigar Doshi

From Classical Statistics to Modern Machine Learning. This attempts to explain why we don't overfit when we train for a very long time. Beautiful talk as well. Paper: https://arxiv.org/abs/1812.11118 Talk: https://www.youtube.com/watch?v=OBCciGnOJVs https://t.co/J93wvzuNJd
0 replies, 10 likes


Sep 02 2019 François Fleuret

So is the idea in Belkin's paper simply that when the training error is zero and you increase your model space, you can reduce even more *whatever measure of capacity you defined initially*? https://arxiv.org/abs/1812.11118
2 replies, 8 likes


Sep 26 2019 Kameron Decker Harris

Check out this paper: "Reconciling modern machine-learning practice and the classical bias–variance trade-off" by Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal https://arxiv.org/abs/1812.11118 https://www.pnas.org/content/116/32/15849 https://t.co/zk36mYdMhm
1 replies, 7 likes


Aug 23 2019 𝚄𝚕𝚞𝚐𝚋𝚎𝚔 𝚂. 𝙺𝚊𝚖𝚒𝚕𝚘𝚟

This was the paper you mentioned to me at BASP @mariotelfig?
2 replies, 7 likes


Dec 06 2019 Andreas Mueller

@jeremyphoward @reachtarunhere @OpenAI a theoretical explanation is given by my colleague Daniel Hsu here: https://arxiv.org/abs/1812.11118
1 replies, 6 likes


Dec 05 2019 Nil Adell Mill

It's great to see more work on the double descent phenomenon. It comes as a good reminder for me to re-visit Belkin et al. (https://arxiv.org/abs/1812.11118) https://t.co/sBCINqeKYc
0 replies, 6 likes


Aug 24 2019 Orestis Tsinalis

Very interesting paper with empirical observations of "double descent"/two-regime behaviour in test performance of complex ML models as a function of (L2 norm-based) model complexity. "Reconciling Modern Machine Learning and the Bias-Variance Tradeoff" https://arxiv.org/abs/1812.11118 https://t.co/A3kM6UKNPH
0 replies, 4 likes


Aug 27 2019 Karandeep Singh

The “double-descent” observed in this paper doesn’t make any sense to me intuitively. As model complexity increases (⬇️EPV), out-of-sample performance worsens then improves for neural nets and RFs? Why? https://t.co/0LH1vMcIFc
0 replies, 3 likes


Aug 14 2019 Joshua Loftus

Question about #MachineLearning #DeepLearning #AI What's the "surprise" https://arxiv.org/abs/1903.08560 or thing that needs to be "reconciled" https://arxiv.org/abs/1812.11118 about the "double descent" or "double U shape" test error curves? (1/2)
1 replies, 2 likes


Aug 24 2019 msb.ai

Reconciling modern machine learning and the bias-variance trade-off "...boosting with decision trees and Random Forests also show similar generalization behavior as neural nets, both before and after the interpolation threshold" https://arxiv.org/abs/1812.11118 #ArtificialIntelligence https://t.co/gnxW2IqLdm
0 replies, 2 likes


Oct 06 2019 SHIMOMURA Takuji

https://arxiv.org/pdf/1812.11118.pdf We first consider a popular class of non-linear parametric models called Random Fourier Features (RFF) [30], which can be viewed as a class of two-layer neural networks with fixed weights in the first layer. #nextAI https://t.co/RC6vxJi4pb
0 replies, 1 likes
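As a tiny illustration of the quoted definition, the sketch below shows the RFF map as a two-layer network whose first layer (W, b) is random and frozen, with only the linear readout left to be trained; its inner products approximate an RBF kernel. The kernel width, sample data, and feature count here are arbitrary assumptions for illustration, not the paper's settings.

```python
# Hedged sketch of Random Fourier Features: a "two-layer network" with a random,
# fixed first layer. Approximates the RBF kernel exp(-gamma * ||x - x'||^2).
import numpy as np

def rff_features(X, n_features, gamma, rng):
    """Map X (n_samples, d) to sqrt(2/D) * cos(X W^T + b): the frozen first layer."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, d))  # frozen weights
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)                # frozen biases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, n_features=1000, gamma=0.5, rng=rng)

# Inner products of the random features approximate the exact RBF kernel:
approx = Z @ Z.T
exact = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
print(np.max(np.abs(approx - exact)))  # small for large n_features
```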


Dec 05 2019 Wojciech Czarnecki

@ilyasut By unnoticed you mean published for just a year? https://arxiv.org/abs/1812.11118
0 replies, 1 likes


Dec 08 2019 Luigi Freda

A surprising new perspective: a "double descent" curve that subsumes the U-shaped bias-variance trade-off curve and shows how increasing model capacity beyond the point of interpolation results in improved performance. https://arxiv.org/abs/1812.11118 https://t.co/Zyug809IOm
1 replies, 1 likes


Aug 24 2019 Dave Harris

Wow, this is a weird approach that would never be useful for training real models, but it’s perfect for gaining insight about what exactly is happening with over-parameterized models that don’t overfit. I’m really impressed.
0 replies, 1 likes

