
Underspecification Presents Challenges for Credibility in Modern Machine Learning


Alexander D'Amour: NEW from a big collaboration at Google: Underspecification Presents Challenges for Credibility in Modern Machine Learning Explores a common failure mode when applying ML to real-world problems. 🧵 1/14

17 replies, 1035 likes

Gary Marcus: must-read new study from @Google confirms all central claims of Deep Learning: A Critical Appraisal (2018):
- machine learning often generalizes poorly
- extrapolation beyond training data is key
- urgent need for better ways of adding in domain expertise

16 replies, 797 likes

Aran Komatsuzaki: Underspecification Presents Challenges for Credibility in Modern Machine Learning Massive collaboration by Googlers to show that an underspecified ML pipeline can lead to various instabilities and poor model behavior, incl. shortcuts and spurious correlations.

2 replies, 249 likes

Thomas G. Dietterich: A review of “Underspecification Presents Challenges for Credibility in Modern Machine Learning” by D’Amour et al. 0/

1 replies, 201 likes

Steven Pinker: Gary @garymarcus has been writing for years that many AI claims are fragile and overhyped. A new study confirms his warning.

0 replies, 189 likes

Cory Doctorow #BLM: "Underspecification Presents Challenges for Credibility in Modern Machine Learning" is a new ML paper co-authored by 33 (!) Google researchers. It's been called a "wrecking ball" for our understanding of problems in machine learning. 1/

4 replies, 115 likes

Sergey Feldman: I'm glad this phenomenon has a name! The last time I observed this was while working on the Semantic Scholar search engine (feature-based LightGBM LambdaMART model). It occurred in two different ways. 1/n

2 replies, 86 likes

Sylvain Chabé-Ferret: ML discovers the identification problem.

2 replies, 83 likes

Brandon Rohrer: A wrecking ball of a paper. [Trigger warning if you just published state of the art benchmark performance]: Your fancy new optimizer's success might be a total fluke, and it's possible it will only hurt performance in production.

3 replies, 79 likes

jörn jacobsen: Exciting 40-author (👀) @Google paper on the trouble with underspecification in ML. Provides further intriguing empirical support for many of the issues we raised in our shortcut learning paper, and more. Highly recommended read:

0 replies, 64 likes

Frank Pasquale: The point on domain expertise is a big theme of my book, "New Laws of Robotics." So many economic forces now tend to demote the perspective of expert labor (while elevating the power of owners of capital & technology). Law must counteract and balance these forces.

1 replies, 58 likes

Ed H. Chi: New work coming out of my team at Google on credibility of ML models:

0 replies, 41 likes

Victor Veitch: Very excited about this (giant) paper on an underappreciated but ubiquitous way that machine learning fails in practice

0 replies, 40 likes

Andrea Montanari: Overparametrization in modern machine learning comes with its blessings and its challenges. Nice thread about the latter (and a recent paper) by @alexdamour .

0 replies, 39 likes

Parag Agrawal: Really great paper that describes a commonly seen problem with ML models in production.

1 replies, 36 likes

Carlos E. Perez: Excellent paper from Google discussing the robustness of Deep Learning models when deployed in real domains.

1 replies, 33 likes

Stephan Hoyer: We see this failure mode all the time in ML models for physics, e.g., if you train to predict a single time-step forward, but want the model to generalize when you make repeated predictions over many time-steps.

1 replies, 21 likes
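The compounding-error effect Stephan Hoyer describes can be sketched with toy one-dimensional dynamics (the decay rates here are hypothetical numbers chosen for illustration): a model whose single-step error is tiny can still drift far from the truth when its own predictions are fed back in over many steps.

```python
# True dynamics vs. a learned one-step model with a small bias
# in the decay multiplier (toy numbers, not from the paper).
def true_step(x):
    return 0.9 * x

def model_step(x):
    return 0.95 * x  # only 0.05 off in the multiplier

def rollout(step, x0, n):
    """Apply a one-step predictor n times, feeding outputs back in."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x

x0 = 1.0
# Error after a single step is small ...
one_step_err = abs(model_step(x0) - true_step(x0))
# ... but after 20 autoregressive steps it has grown several-fold,
# even though the model was only ever trained on single steps.
rollout_err = abs(rollout(model_step, x0, 20) - rollout(true_step, x0, 20))
print(one_step_err, rollout_err)
```

Single-step training never penalizes this accumulation, which is one way an underspecified objective leaves deployment behavior unconstrained.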

No person of color has died in Antifa custody: Another brilliant thread by Cory Doctorow that you should read if you're interested in machine learning

0 replies, 21 likes

Jesse Dodge: Massive collaboration at Google about reproducibility, and how underspecification of the ML pipeline (for vision, NLP, etc.) leads to highly variable results. Our goal with the reproducibility checklists and challenge has been to reduce underspecification, love seeing work here

1 replies, 20 likes

Josh Tobin: Interesting and practically relevant paper: "predictors trained to the same level of iid generalization will often show widely divergent behavior when applied to real-world settings" even if the only difference between the predictors is something like random seed

0 replies, 17 likes
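The divergence Josh Tobin quotes can be reproduced in miniature. In this hypothetical setup, two features are perfectly correlated in training, so any weight split between them fits equally well; gradient descent from different random seeds lands on different splits that agree on iid data but disagree once the correlation breaks.

```python
import random

def train(seed, data, lr=0.1, steps=500):
    """Fit y = w1*x1 + w2*x2 by full-batch gradient descent from a random init."""
    rng = random.Random(seed)
    w1, w2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(data)
    for _ in range(steps):
        g1 = g2 = 0.0
        for x1, x2, y in data:
            err = w1 * x1 + w2 * x2 - y
            g1 += 2 * err * x1 / n
            g2 += 2 * err * x2 / n
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return (w1, w2)

def predict(w, x1, x2):
    return w[0] * x1 + w[1] * x2

# During training, x2 is a perfect copy of x1 and y = x1, so every
# weight pair with w1 + w2 = 1 fits exactly: the problem is underspecified.
train_data = [(x / 10, x / 10, x / 10) for x in range(1, 11)]

model_a = train(seed=0, data=train_data)
model_b = train(seed=1, data=train_data)

# The two seeds agree (and are accurate) on iid inputs where x2 == x1 ...
iid_gap = abs(predict(model_a, 0.5, 0.5) - predict(model_b, 0.5, 0.5))
# ... but diverge once the training-time correlation is broken.
shift_gap = abs(predict(model_a, 0.5, 0.0) - predict(model_b, 0.5, 0.0))
print(iid_gap, shift_gap)
```

Nothing in the iid loss distinguishes the two solutions; only the shifted input reveals that they encode different "physics".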

Steve Yadlowsky: Underspecification is a big annoyance in the theory of deep learning + other high-dimensional ML methods. Here, we show that it's also a problem for practical ML when the deployment domain isn't exactly like the training domain. Fun collab to be a part of, thx @alexdamour for leading!

0 replies, 17 likes

Dileep George: even google doesn't have enough data to feed current deep learning algos 🤔

0 replies, 17 likes

alex peysakhovich 🤖: TLDR: Lots of stuff depends on the random seed.

1 replies, 16 likes

Chomba Bupe: It should have been obvious to Google that curve fitting isn't intelligence. But better late than never.

2 replies, 16 likes

Sina Fazelpour: It's a great feeling when there's a new, thoughtful paper that is exactly relevant to, and enriches the literature about, an issue you're currently working on. Excellent work by @alexdamour and co:

1 replies, 15 likes

Ben Hamner: One major challenge in deploying AI models: training and deployment scenarios commonly have different data characteristics, and models that perform similarly well in training may be substantially different when deployed due to underspecification

0 replies, 14 likes

johnurbanik: Thanks to @alexdamour, @kat_heller, @Ghassen_ML, @graduatedescent et al. for this exploration into how fickle ML is. Contrastive/stratified/shift-based tests can reveal a lot; the authors show that varying RNG seeds leads to big differences in underspecified (i.e. most) models. (1/4)

1 replies, 11 likes

danilobzdok: New @Google paper: "Underspecification is common in modern ML pipelines, such as in #deep #learning...this problem appears in ...#medical #imaging, clinical risk prediction based on electronic health records, and #medical #genomics." h/t: @GaryMarcus

0 replies, 10 likes

Daisuke Okanohara: Underspecification in ML pipelines is a key reason for poor performance in deployment. 1) build a stress test that represents the task requirements, 2) select the best predictor from the candidates (better than marginalizing over them), 3) design task-specific regularization.

0 replies, 8 likes
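Step 2 of the recipe above, selecting among iid-equivalent predictors with a stress test, can be sketched as follows. The candidate weight vectors and the deployment corruption are hypothetical; the point is only that a small set of shifted examples separates models the validation set could not.

```python
# Candidate predictors that tie on iid validation accuracy: linear
# models splitting weight differently across two features that were
# redundant (perfectly correlated) during training.
candidates = {
    "seed_0": (1.0, 0.0),  # relies on feature 1 only
    "seed_1": (0.0, 1.0),  # relies on feature 2 only
    "seed_2": (0.5, 0.5),  # splits the weight evenly
}

def predict(w, x1, x2):
    return w[0] * x1 + w[1] * x2

# Stress test: at deployment, feature 2 is corrupted (zeroed out)
# while the target still tracks feature 1.
stress_set = [(0.2, 0.0, 0.2), (0.7, 0.0, 0.7), (1.0, 0.0, 1.0)]

def stress_error(w):
    return sum(abs(predict(w, x1, x2) - y) for x1, x2, y in stress_set)

# Pick the candidate that survives the shift the iid metric never probed.
best = min(candidates, key=lambda name: stress_error(candidates[name]))
print(best)  # prints "seed_0"
```

Marginalizing (averaging) over the candidates would instead bake in the failure of the models that lean on the corrupted feature.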

Julius Adebayo: This week's new must read 30 pager. Domain shift and spurious training signals are major open problems in ML.

0 replies, 8 likes

Maxime Sermesant: As we all know, we are not there yet. Adding domain expertise is crucial for AI in healthcare to really happen. Anatomical and functional modelling is a nice way to do it! 😀

0 replies, 7 likes

Swede White: Interesting thread. Always funny how methods from stats (like sensitivity analysis and now specification error) that have been around for a while aren't consistently adapted and/or applied to ML, when they could probably prevent or solve quite a few problems

0 replies, 7 likes

Sam Cassidy/Hoon Tae Kim 🖤🦢: Oops, I was right again.

2 replies, 7 likes

Nicole Maffeo: ML models often ace training + testing, yet fail in real-world environments 💯🔥. Design & stress-test models at the application level, or risk undermining ML credibility when it matters most. Tl;dr: internships before employment (& model deployment). 👇 #AI

1 replies, 6 likes

Luis Lamb: Food for thought

0 replies, 6 likes

Lady Dr. (PhD, she/her): I clicked bc of my fascination with “underspecification” in phonology. I don’t think it has the same sense here, but it might? Anyone better at newer models of phonology want to jump in here?

0 replies, 4 likes

Prof. Tom Crick: “Sensitive dependence upon initial conditions”

1 replies, 4 likes

60_Harvests: Fun thread by @doctorow: But, why does the phrase "sensitive dependence upon initial conditions" haunt me whenever I read this stuff?

0 replies, 4 likes

Adarsh Subbaswamy: On a more general note, there's a growing need for more "stress tests" for models (see, e.g., Google's recent underspecification paper cc @alexdamour @vivnat). We hope our procedure helps in addressing this gap. 11/

1 replies, 4 likes

Uri #masks4all Manor: very important food for thought - key lesson: stress test your models!!!

0 replies, 3 likes

Margaret Warren: Underspecification Presents Challenges for Credibility in Modern Machine Learning ---from 2020 by many @google people.

0 replies, 2 likes

Omid Mirzaei: This great work discusses "underspecification issue" in machine learning models:

0 replies, 2 likes

Bryan 🇭🇹 🏁: Interesting

0 replies, 2 likes

Richard Minerich: This new google paper is very important. Most ML algos are learning a minimal set of features; what if instead they learned some maximal set, using or to collapse equivalent features?

0 replies, 2 likes

Bruno Rocha: Underspecified horses? @boblsturm @jvanbalen @nkundiushuti @CarlosVaquero

1 replies, 2 likes

Afshin Khadangi: This is a great and huge effort from a group of distinguished ML scientists. I found the paper simple but descriptive. The story aligns with the findings that we have shown in EM-stellar.

0 replies, 2 likes

Michael Bennett: I believe @yudapearl has previously addressed this. Underspecification is not new, but rather the result of failing to identify the underlying causal relationships, instead indiscriminately using correlation for prediction. They've stuck a new label on an old problem.

0 replies, 2 likes

Jeremy Goecks: Finally had a chance to read the new Google paper on underspecification. My summary and thoughts follow 1/n

1 replies, 1 likes

Thiago Marzagão: ML models that perform similarly well on test data can perform radically differently on real-world data.

0 replies, 1 likes

JagB—NanoSalad—Veg Gap Zapper: Many data scientists & tech titans still resist this ht @ChombaBupe "should have been obvious …curve fitting isn't intelligence"

2 replies, 1 likes

YBenkler: 3. Found to affect pretty much every big-promise application area of ML: computer vision, medical imaging, NLP (including "shortcuts that reinforce societal biases around protected attributes such as gender"), & clinical predictions from EHRs.

0 replies, 1 likes

James Cham ✍🏻: This is terrific, although @bruces puts it much more succinctly.

0 replies, 1 likes

Alexander D'Amour: Summary of the work in our own words:

0 replies, 1 likes

Etienne David: We have exactly the same problem in Plant phenotyping when trying to deploy algorithms to fields !

0 replies, 1 likes

Ben Meghreblian: The way we train AI is fundamentally flawed: Full paper - 'Underspecification Presents Challenges for Credibility in Modern Machine Learning': #machinelearning #ml #deeplearning

0 replies, 1 likes

Galuh Sahid: OK, not sure how I missed this when it was first published, but I need to read this paper. "Underspecification Presents Challenges for Credibility in Modern Machine Learning":

0 replies, 1 likes

Beth Carey: Generalization humans do is based on associations - knowing that tuna is a fish, fish is an animal etc. The way machines can know this is similar. By symbolic #AI and #linguistics . It's not possible via Deep Learning @jbthinking

1 replies, 0 likes

Cory Doctorow #BLM: A machine learning wrecking ball: Even if you fix training data, you still have to reckon with underspecification. 2/

1 replies, 0 likes


Found on Nov 09 2020 at
