
Reconciling modern machine learning practice and the bias-variance trade-off

Comments

OpenAI: A surprising deep learning mystery: Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time. https://openai.com/blog/deep-double-descent/ https://t.co/Zdox9dbIBv

80 replies, 2035 likes
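
A minimal sketch of the effect described above, in the spirit of the paper's random-feature experiments (not code from OpenAI or from the paper; the toy target, widths, and seed are illustrative assumptions): minimum-norm least squares on random ReLU features typically shows the test error rise as the feature count approaches the number of training points and fall again well past that interpolation threshold.

```python
# Hypothetical double-descent demo: minimum-norm least squares on random ReLU
# features. All parameters are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 10

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=n)    # simple noisy target
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def relu_features(X, W):
    return np.maximum(X @ W, 0.0)                     # fixed random first layer

for width in [10, 50, 90, 100, 110, 200, 500, 2000]:  # sweep past n_train = 100
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
    beta = np.linalg.pinv(Phi_tr) @ y_tr              # minimum-norm least squares
    mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"width={width:5d}  test MSE={mse:.3f}")
```

Exact numbers vary with the seed, but the qualitative shape (improve, worsen near width ≈ n_train, improve again) is the non-monotonic behaviour the tweet refers to.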


Ian Osband: Looking back over the year, the one paper that gave me the best "aha" moment was... Reconciling Modern Machine Learning and the Bias-Variance Tradeoff: https://arxiv.org/abs/1812.11118 The "bias-variance" you knew was just the first piece of the story! https://t.co/J24b0W8LDR

17 replies, 1939 likes


Nando de Freitas: I agree. This was a phenomenal paper. I’m hoping it will inspire researchers to probe further.

1 replies, 478 likes


Oriol Vinyals: The paper "Understanding deep learning requires rethinking generalization" mostly asked questions. Glad to see some answers / new theories since then!

1 replies, 168 likes


Greg Yang: @OpenAI Isn't this the "double descent" phenomenon studied in https://arxiv.org/abs/1812.11118 and subsequent works?

3 replies, 131 likes


Gilles Louppe: Do you know of anyone who reproduced the double-U generalization curve of over-parameterized networks? https://arxiv.org/pdf/1812.11118.pdf Looking for a friend :-) https://t.co/6GgKsyAiSm

8 replies, 127 likes


Olivia Guest | Ολίβια Γκεστ: I was in a workshop that warned against overfitting without mentioning that, in practice, many deep network models are not overfit, so I'm mentioning it here: Preprint: https://arxiv.org/abs/1812.11118 Talk: https://cbmm.mit.edu/video/fit-without-fear-over-fitting-perspective-modern-deep-and-shallow-learning https://t.co/vxWaDXlZ9N

1 replies, 34 likes


halvarflake: @zacharylipton Not sure there's a "single" paper to note, but the entire discussion about double-descent has been the most interesting thing I read this year: 1) https://arxiv.org/abs/1812.11118 - "Reconciling modern machine learning practice and the bias-variance trade-off"

2 replies, 24 likes


halvarflake: To my great surprise, I found a few minutes of downtime today to read https://arxiv.org/abs/1812.11118. If you are into ML or statistics, I highly recommend the paper; I will read the follow-ups, but the empirical results showing double-descent risk curves are really fascinating.

1 replies, 23 likes


Wojciech Czarnecki: @ilyasut By unnoticed you mean published for just a year? https://arxiv.org/abs/1812.11118

1 replies, 16 likes


Daisuke Okanohara: The bias-variance tradeoff shows that a model with appropriate complexity can generalize. Recent "double descent" indicates that a larger (than the necessary) model can generalize better in some situations. https://arxiv.org/abs/1812.11118 https://arxiv.org/abs/1903.07571 https://arxiv.org/abs/1909.11720

1 replies, 16 likes


Vince Buffalo: I quite like this figure (from this paper: https://arxiv.org/abs/1812.11118), which I think explains why both machine learning and parameter-rich Bayesian models are doing well across a variety of tasks (Murphy also makes this point in Chapter 17 of his book). https://t.co/Jkt4JYHBcG

1 replies, 15 likes


Narges Razavian: Adding the double descent paper to lecture 1 of my introductory ML course.. Who would've thought? Reconciling modern machine learning practice and the bias-variance trade-off Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal https://arxiv.org/abs/1812.11118 (& https://openai.com/blog/deep-double-descent/) https://t.co/1TsrIatsBq

0 replies, 14 likes


halvarflake: Stats/ML followers: This paper https://arxiv.org/pdf/1812.11118.pdf argues that for many models the risk curve under over-parametrization follows a double-descent shape rather than the classical U shape. They provide some evidence from DNNs and random forests. A fascinating claim; will need to mull the paper a bit. Worth a read.

1 replies, 12 likes


Jigar Doshi: From Classical Statistics to Modern Machine Learning. This attempts to explain why we don't overfit when we train for a very long time. Beautiful talk as well. Paper: https://arxiv.org/abs/1812.11118 Talk: https://www.youtube.com/watch?v=OBCciGnOJVs https://t.co/J93wvzuNJd

0 replies, 10 likes


François Fleuret: So is the idea in Belkin's paper simply that when the training error is zero and you increase your model space, you can further reduce *whatever measure of capacity you defined initially*? https://arxiv.org/abs/1812.11118

2 replies, 8 likes
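
A quick way to see the point raised above, under stated assumptions (random ReLU features, with the L2 norm of the second-layer coefficients as the capacity measure; an illustrative sketch, not the paper's experiment): once a model interpolates the training data, enlarging the feature space typically lets the minimum-norm interpolating solution get by with a smaller norm.

```python
# Illustrative sketch: past the interpolation threshold, a larger feature space
# typically admits an interpolating solution of smaller norm.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)                                  # arbitrary targets to interpolate

for width in [50, 100, 200, 800, 3200]:                 # all >= n, so training error ~ 0
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    Phi = np.maximum(X @ W, 0.0)                        # fixed random ReLU features
    beta = np.linalg.pinv(Phi) @ y                      # minimum-norm interpolant
    resid = np.max(np.abs(Phi @ beta - y))
    print(f"width={width:5d}  ||beta||={np.linalg.norm(beta):.3f}  max residual={resid:.1e}")
```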


Kameron Decker Harris: Check out this paper: "Reconciling modern machine-learning practice and the classical bias–variance trade-off" by Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal https://arxiv.org/abs/1812.11118 https://www.pnas.org/content/116/32/15849 https://t.co/zk36mYdMhm

1 replies, 7 likes


𝚄𝚕𝚞𝚐𝚋𝚎𝚔 𝚂. 𝙺𝚊𝚖𝚒𝚕𝚘𝚟: This was the paper you mentioned to me at BASP @mariotelfig?

2 replies, 7 likes


Andreas Mueller: @jeremyphoward @reachtarunhere @OpenAI A theoretical explanation is given by my colleague Daniel Hsu here: https://arxiv.org/abs/1812.11118

1 replies, 6 likes


Nil Adell Mill: It's great to see more work on the double descent phenomenon. It comes as a good reminder for me to re-visit Belkin et al. (https://arxiv.org/abs/1812.11118) https://t.co/sBCINqeKYc

0 replies, 6 likes


Orestis Tsinalis: Very interesting paper with empirical observations of "double descent"/two-regime behaviour in test performance of complex ML models as a function of (L2 norm-based) model complexity. "Reconciling Modern Machine Learning and the Bias-Variance Tradeoff" https://arxiv.org/abs/1812.11118 https://t.co/A3kM6UKNPH

0 replies, 4 likes


Karandeep Singh: The “double-descent” observed in this paper doesn’t make any sense to me intuitively. As model complexity increases (⬇️EPV), out-of-sample performance worsens then improves for neural nets and RFs? Why? https://t.co/0LH1vMcIFc

0 replies, 3 likes


Joshua Loftus: Question about #MachineLearning #DeepLearning #AI What's the "surprise" https://arxiv.org/abs/1903.08560 or thing that needs to be "reconciled" https://arxiv.org/abs/1812.11118 about the "double descent" or "double U shape" test error curves? (1/2)

1 replies, 2 likes


msb.ai: Reconciling modern machine learning and the bias-variance trade-off "...boosting with decision trees and Random Forests also show similar generalization behavior as neural nets, both before and after the interpolation threshold" https://arxiv.org/abs/1812.11118 #ArtificialIntelligence https://t.co/gnxW2IqLdm

0 replies, 2 likes


Luigi Freda: A new surprising perspective: a "double descent" curve that subsumes the U-shaped bias-variance trade-off curve and shows how increasing model capacity beyond the point of interpolation results in improved performance. https://arxiv.org/abs/1812.11118 https://t.co/Zyug809IOm

1 replies, 1 likes


SHIMOMURA Takuji: https://arxiv.org/pdf/1812.11118.pdf We first consider a popular class of non-linear parametric models called Random Fourier Features (RFF) [30], which can be viewed as a class of two-layer neural networks with fixed weights in the first layer. #nextAI https://t.co/RC6vxJi4pb

0 replies, 1 likes
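
A hedged sketch of the RFF model quoted above, as it is commonly implemented (the kernel width, feature count, and toy data are illustrative assumptions): the first layer is a fixed random projection followed by a cosine, approximating an RBF kernel, and only the second, linear layer is trained.

```python
# Random Fourier Features: fixed random first layer, trained linear second layer.
# Parameter choices are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def rff_map(X, W, b, n_features):
    # Random Fourier features approximating the kernel exp(-gamma * ||x - x'||^2)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

d, n_features, gamma = 3, 500, 1.0
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))  # fixed random frequencies
b = rng.uniform(0, 2 * np.pi, size=n_features)                  # fixed random phases

X = rng.normal(size=(200, d))
y = np.sin(X).sum(axis=1)                                       # toy regression target
Phi = rff_map(X, W, b, n_features)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)                  # train second layer only
print("train MSE:", np.mean((Phi @ coef - y) ** 2))
```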


Dave Harris: Wow, this is a weird approach that would never be useful for training real models, but it’s perfect for gaining insight about what exactly is happening with over-parameterized models that don’t overfit. I’m really impressed.

0 replies, 1 likes

