Papers of the day

Reconciling modern machine learning and the bias-variance trade-off


Dec 05 2019 OpenAI

A surprising deep learning mystery: Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time.
80 replies, 2035 likes
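The non-monotonic curve this tweet describes can be reproduced in a few lines. The following is a hypothetical sketch, not code from any of the threads: minimum-norm least squares on fixed random ReLU features, where test error typically spikes near the interpolation threshold (number of features ≈ number of training points) and falls again well beyond it. All names and parameter choices here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5
w_true = rng.normal(size=d)  # fixed linear target

def make_data(n, noise=0.1):
    X = rng.normal(size=(n, d))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y

def relu_features(X, W):
    # fixed random "first layer"; only the linear readout is fitted
    return np.maximum(X @ W, 0.0)

X_train, y_train = make_data(40)
X_test, y_test = make_data(1000)

test_mse = {}
for n_feat in (10, 20, 40, 80, 400):
    W = rng.normal(size=(d, n_feat))
    Phi_tr = relu_features(X_train, W)
    Phi_te = relu_features(X_test, W)
    # np.linalg.lstsq returns the minimum-norm solution when Phi_tr is wide,
    # i.e. the interpolator studied in the double-descent literature
    coef, *_ = np.linalg.lstsq(Phi_tr, y_train, rcond=None)
    test_mse[n_feat] = float(np.mean((Phi_te @ coef - y_test) ** 2))
```

With 40 training points, the 40-feature model sits at the interpolation threshold, where the feature matrix is square and ill-conditioned; the 400-feature model is far past it and typically generalizes better, which is the "improves, then gets worse, then improves again" pattern.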

Aug 23 2019 Ian Osband

Looking back over the year, the one paper that gave me the best "aha" moment was... Reconciling Modern Machine Learning and the Bias-Variance Tradeoff: The "bias-variance" you knew was just the first piece of the story!
16 replies, 1948 likes

Aug 24 2019 Nando de Freitas

I agree. This was a phenomenal paper. I’m hoping it will inspire researchers to probe further.
1 replies, 478 likes

Aug 27 2019 Oriol Vinyals

The paper "Understanding deep learning requires rethinking generalization" mostly asked questions. Glad to see some answers / new theories since then!
1 replies, 168 likes

Dec 05 2019 Greg Yang

@OpenAI Isn't this the "double descent" phenomenon studied in and subsequent works?
3 replies, 131 likes

Nov 11 2019 Gilles Louppe

Do you know of anyone who reproduced the double-U generalization curve of over-parameterized networks? Looking for a friend :-)
8 replies, 127 likes

Dec 13 2019 Olivia Guest | Ολίβια Γκεστ

I was in a workshop that warned against overfitting without mentioning that, in practice, many deep network models simply aren't overfit, so I'm mentioning it here. Preprint: Talk:
1 replies, 31 likes

Jan 01 2020 halvarflake

@zacharylipton Not sure there's a "single" paper to note, but the entire discussion about double-descent has been the most interesting thing I read this year: 1) "Reconciling modern machine learning practice and the bias-variance trade-off"
2 replies, 24 likes

Sep 12 2019 halvarflake

To my great surprise, I found a few minutes of downtime today to read. If you are into ML or statistics, I highly recommend the paper; I will read the follow-ups, but the empirical results showing double-descent risk curves are really fascinating.
1 replies, 23 likes

Oct 06 2019 Daisuke Okanohara

The bias-variance tradeoff shows that a model with appropriate complexity can generalize. Recent "double descent" results indicate that a larger (than necessary) model can generalize better in some situations.
1 replies, 16 likes

Feb 06 2020 Vince Buffalo

I quite like this figure (from this paper: ), which I think helps explain why both machine learning and parameter-rich Bayesian models do well across a variety of tasks (Murphy also makes this point in Chapter 17 of his book).
1 replies, 15 likes

Jan 08 2020 Narges Razavian

Adding the double descent paper to lecture 1 of my introductory ML course. Who would've thought? Reconciling modern machine learning practice and the bias-variance trade-off. Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
0 replies, 14 likes

Sep 03 2019 halvarflake

Stats/ML followers: This paper argues that the risk curve for overparametrized models is "w"-shaped rather than U-shaped for many model classes. They provide some evidence from DNNs and RFs. A fascinating claim; I will need to mull the paper over a bit. Worth a read.
1 replies, 12 likes

Sep 05 2019 Jigar Doshi

From Classical Statistics to Modern Machine Learning. This attempts to explain why we don't overfit when we train for a very long time. Beautiful talk as well. Paper: Talk:
0 replies, 10 likes

Sep 02 2019 François Fleuret

So is the idea in Belkin's paper simply that when the training error is zero and you increase your model space, you can reduce even more *whatever measure of capacity you defined initially*?
2 replies, 8 likes

Sep 26 2019 Kameron Decker Harris

Check out this paper: "Reconciling modern machine-learning practice and the classical bias–variance trade-off" by Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
1 replies, 7 likes

Aug 23 2019 𝚄𝚕𝚞𝚐𝚋𝚎𝚔 𝚂. 𝙺𝚊𝚖𝚒𝚕𝚘𝚟

This was the paper you mentioned to me at BASP @mariotelfig?
2 replies, 7 likes

Dec 06 2019 Andreas Mueller

@jeremyphoward @reachtarunhere @OpenAI a theoretical explanation is given by my colleague Daniel Hsu here:
1 replies, 6 likes

Dec 05 2019 Nil Adell Mill

It's great to see more work on the double descent phenomenon. It comes as a good reminder for me to revisit Belkin et al.
0 replies, 6 likes

Aug 24 2019 Orestis Tsinalis

Very interesting paper with empirical observations of "double descent"/two-regime behaviour in test performance of complex ML models as a function of (L2 norm-based) model complexity. "Reconciling Modern Machine Learning and the Bias-Variance Tradeoff"
0 replies, 4 likes

Aug 27 2019 Karandeep Singh

The “double-descent” observed in this paper doesn’t make any sense to me intuitively. As model complexity increases (⬇️EPV), out-of-sample performance worsens then improves for neural nets and RFs? Why?
0 replies, 3 likes

Aug 14 2019 Joshua Loftus

Question about #MachineLearning #DeepLearning #AI What's the "surprise" or thing that needs to be "reconciled" about the "double descent" or "double U shape" test error curves? (1/2)
1 replies, 2 likes

Aug 24 2019

Reconciling modern machine learning and the bias-variance trade-off "...boosting with decision trees and Random Forests also show similar generalization behavior as neural nets, both before and after the interpolation threshold" #ArtificialIntelligence
0 replies, 2 likes

Oct 06 2019 SHIMOMURA Takuji

"We first consider a popular class of non-linear parametric models called Random Fourier Features (RFF) [30], which can be viewed as a class of two-layer neural networks with fixed weights in the first layer." #nextAI
0 replies, 1 likes
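The RFF construction quoted above is short enough to sketch. The following is a minimal, hypothetical implementation (function names are my own, not from the paper) in the style of Rahimi & Recht's Random Fourier Features: random cosine features approximate an RBF kernel, and only the linear readout, the "second layer", is trained.

```python
import numpy as np

rng = np.random.default_rng(1)

def rff_features(X, n_features=200, gamma=1.0):
    """Random Fourier Features approximating the RBF kernel exp(-gamma * ||x - x'||^2).

    The random frequencies W and phases b act as a fixed, untrained
    first layer of a two-layer network.
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Only the linear readout ("second layer") is fitted, here by least squares.
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0])
Phi = rff_features(X)  # (50, 200): more features than training points
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
train_mse = float(np.mean((Phi @ coef - y) ** 2))
```

Because there are more random features (200) than training points (50), the least-squares readout interpolates the training data exactly, which is precisely the overparameterized regime the double-descent discussion is about.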

Dec 05 2019 Wojciech Czarnecki

@ilyasut By unnoticed you mean published for just a year?
0 replies, 1 likes

Dec 08 2019 Luigi Freda

A surprising new perspective: a "double descent" curve that subsumes the U-shaped bias-variance trade-off curve and shows how increasing model capacity beyond the point of interpolation results in improved performance.
1 replies, 1 likes

Aug 24 2019 Dave Harris

Wow, this is a weird approach that would never be useful for training real models, but it’s perfect for gaining insight about what exactly is happening with over-parameterized models that don’t overfit. I’m really impressed.
0 replies, 1 likes