
Reconciling modern machine learning practice and the bias-variance trade-off

Comments

Aug 23 2019 Ian Osband

Looking back over the year, the one paper that gave me the best "aha" moment was... Reconciling Modern Machine Learning and the Bias-Variance Tradeoff: https://arxiv.org/abs/1812.11118 The "bias-variance" you knew was just the first piece of the story! https://t.co/J24b0W8LDR
16 replies, 1945 likes


Aug 24 2019 Nando de Freitas

I agree. This was a phenomenal paper. I’m hoping it will inspire researchers to probe further.
1 replies, 478 likes


Aug 27 2019 Oriol Vinyals

The paper "Understanding deep learning requires rethinking generalization" mostly asked questions. Glad to see some answers / new theories since then!
1 replies, 168 likes


Nov 11 2019 Gilles Louppe

Do you know of anyone who reproduced the double-U generalization curve of over-parameterized networks? https://arxiv.org/pdf/1812.11118.pdf Looking for a friend :-) https://t.co/6GgKsyAiSm
8 replies, 127 likes
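
For anyone attempting that reproduction: below is a minimal sketch of the kind of experiment reported in the paper, assuming Random Fourier Features fit by minimum-norm least squares while sweeping the feature count through the interpolation threshold. The synthetic data, feature counts, and scales are illustrative choices of mine, not the paper's setup (the paper uses subsets of MNIST and related datasets).

```python
# Minimal double-descent sketch (not the authors' code): Random Fourier Features
# fit by minimum-norm least squares, sweeping the number of features past the
# interpolation threshold (here ~ n_train = 100). Data and scales are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 10
X_tr = rng.normal(size=(n_train, d))
X_te = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)
y_te = X_te @ w_true

def rff(X, W, b):
    # Random Fourier Features: fixed random first layer, cosine nonlinearity.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

for n_feat in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.normal(size=(d, n_feat))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    Phi_tr, Phi_te = rff(X_tr, W, b), rff(X_te, W, b)
    # lstsq returns the minimum-norm solution once the system is underdetermined.
    coef, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    print(f"features={n_feat:5d}  "
          f"train MSE={np.mean((Phi_tr @ coef - y_tr) ** 2):7.3f}  "
          f"test MSE={np.mean((Phi_te @ coef - y_te) ** 2):7.3f}")
```

Near n_feat ≈ n_train the test error typically spikes, then falls again as more features are added; that second descent is the curve in question.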


Sep 12 2019 halvarflake

To my great surprise, I found a few minutes of downtime today to read https://arxiv.org/abs/1812.11118. If you are into ML or statistics, I highly recommend the paper; I will read the follow-ups, but the empirical results showing double-descent risk curves are really fascinating.
1 replies, 23 likes


Oct 06 2019 Daisuke Okanohara

The bias-variance tradeoff shows that a model with appropriate complexity can generalize. Recent "double descent" indicates that a larger (than necessary) model can generalize better in some situations. https://arxiv.org/abs/1812.11118 https://arxiv.org/abs/1903.07571 https://arxiv.org/abs/1909.11720
1 replies, 16 likes
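
For reference, the classical statement being contrasted here is the textbook bias-variance decomposition of squared-error risk (standard result, not from the thread): with y = f(x) + ε, Var(ε) = σ², and a predictor f̂ trained on a random sample,

```latex
\mathbb{E}\bigl[(y - \hat{f}(x))^2\bigr]
  = \sigma^2
  + \bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2
  + \mathbb{E}\bigl[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\bigr]
```

The classical U-shaped risk curve comes from the squared-bias term falling and the variance term rising as model complexity grows; the double-descent observation is that past the interpolation threshold the test risk can come down again.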


Sep 03 2019 halvarflake

Stats/ML followers: This paper https://arxiv.org/pdf/1812.11118.pdf argues that for many models the risk curve under overparametrization is "w"-shaped rather than u-shaped. They provide some evidence from DNNs and RFs. A fascinating claim; will need to mull the paper a bit. Worth a read.
1 replies, 12 likes


Sep 05 2019 Jigar Doshi

From Classical Statistics to Modern Machine Learning. This attempts to explain why we don't overfit when we train for a very long time. Beautiful talk as well. Paper: https://arxiv.org/abs/1812.11118 Talk: https://www.youtube.com/watch?v=OBCciGnOJVs https://t.co/J93wvzuNJd
0 replies, 10 likes


Sep 02 2019 François Fleuret

So is the idea in Belkin's paper simply that when the training error is zero and you increase your model space, you can reduce even more *whatever measure of capacity you defined initially*? https://arxiv.org/abs/1812.11118
2 replies, 8 likes
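
Roughly, yes: once the training error is zero, the learned predictor is taken to be the minimum-norm interpolant, and enlarging the function class can only shrink that norm (assuming the larger class carries the same norm). In notation of my own choosing, loosely following the paper's least-squares/RKHS setting:

```latex
\hat{f}_{\mathcal{H}}
  = \operatorname*{arg\,min}_{f \in \mathcal{H}} \; \|f\|
  \quad \text{s.t. } f(x_i) = y_i,\ i = 1, \dots, n,
\qquad
\mathcal{H} \subseteq \mathcal{H}' \;\Longrightarrow\; \|\hat{f}_{\mathcal{H}'}\| \le \|\hat{f}_{\mathcal{H}}\|,
```

since any interpolant in the smaller class remains feasible in the larger one. This matches the paper's observation that the norm of the learned Random Fourier Features interpolant keeps decreasing as features are added past the interpolation threshold.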


Sep 26 2019 Kameron Decker Harris

Check out this paper: "Reconciling modern machine-learning practice and the classical bias–variance trade-off" by Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal https://arxiv.org/abs/1812.11118 https://www.pnas.org/content/116/32/15849 https://t.co/zk36mYdMhm
1 replies, 7 likes


Aug 23 2019 Ulugbek S. Kamilov

Is this the paper you mentioned to me at BASP, @mariotelfig?
2 replies, 7 likes


Aug 24 2019 Orestis Tsinalis

Very interesting paper with empirical observations of "double descent"/two-regime behaviour in test performance of complex ML models as a function of (L2 norm-based) model complexity. "Reconciling Modern Machine Learning and the Bias-Variance Tradeoff" https://arxiv.org/abs/1812.11118 https://t.co/A3kM6UKNPH
0 replies, 4 likes


Aug 27 2019 Karandeep Singh

The “double-descent” observed in this paper doesn’t make any sense to me intuitively. As model complexity increases (⬇️EPV), out-of-sample performance worsens then improves for neural nets and RFs? Why? https://t.co/0LH1vMcIFc
0 replies, 3 likes


Aug 14 2019 Joshua Loftus

Question about #MachineLearning #DeepLearning #AI What's the "surprise" https://arxiv.org/abs/1903.08560 or thing that needs to be "reconciled" https://arxiv.org/abs/1812.11118 about the "double descent" or "double U shape" test error curves? (1/2)
1 replies, 2 likes


Aug 24 2019 msb.ai

Reconciling modern machine learning and the bias-variance trade-off "...boosting with decision trees and Random Forests also show similar generalization behavior as neural nets, both before and after the interpolation threshold" https://arxiv.org/abs/1812.11118 #ArtificialIntelligence https://t.co/gnxW2IqLdm
0 replies, 2 likes
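
A minimal sketch of the kind of sweep behind that quote, assuming scikit-learn; the dataset, tree counts, and hyperparameters are illustrative stand-ins, not the paper's protocol (the paper first grows trees until they interpolate the training data and then averages increasingly many of them):

```python
# Sketch (mine, not the authors' protocol): fully grown trees (min_samples_leaf=1)
# drive the training error toward zero; adding more trees is the averaging step
# that the quote says keeps improving generalization past the interpolation point.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=600, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for n_trees in [1, 2, 5, 10, 50, 200]:
    rf = RandomForestRegressor(n_estimators=n_trees, min_samples_leaf=1,
                               bootstrap=True, random_state=0).fit(X_tr, y_tr)
    print(f"trees={n_trees:4d}  "
          f"train MSE={np.mean((rf.predict(X_tr) - y_tr) ** 2):9.1f}  "
          f"test MSE={np.mean((rf.predict(X_te) - y_te) ** 2):9.1f}")
```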


Oct 06 2019 SHIMOMURA Takuji

https://arxiv.org/pdf/1812.11118.pdf We first consider a popular class of non-linear parametric models called Random Fourier Features (RFF) [30], which can be viewed as a class of two-layer neural networks with fixed weights in the first layer. #nextAI https://t.co/RC6vxJi4pb
0 replies, 1 likes
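
To make the quoted definition concrete, here is a small sketch of an RFF model as a two-layer network whose random first layer is frozen and only the linear output layer is trained. Using scikit-learn's RBFSampler is my own implementation choice, not something prescribed by the paper.

```python
# Random Fourier Features as a two-layer net with a fixed random first layer.
# RBFSampler computes z(x) = sqrt(2/D) * cos(Wx + b) with random W, b held fixed;
# only the linear model on top of z(x) is fit.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

rff = RBFSampler(gamma=1.0, n_components=500, random_state=0)  # frozen first layer
Phi = rff.fit_transform(X)                                     # random features z(x)
model = Ridge(alpha=1e-8).fit(Phi, y)                          # trained second layer
print("train MSE:", np.mean((model.predict(Phi) - y) ** 2))
```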


Aug 24 2019 Dave Harris

Wow, this is a weird approach that would never be useful for training real models, but it’s perfect for gaining insight about what exactly is happening with over-parameterized models that don’t overfit. I’m really impressed.
0 replies, 1 likes

