Papers of the day

Underspecification Presents Challenges for Credibility in Modern Machine Learning

Comments

Alexander D'Amour: NEW from a big collaboration at Google: Underspecification Presents Challenges for Credibility in Modern Machine Learning Explores a common failure mode when applying ML to real-world problems. 🧵 1/14 https://arxiv.org/abs/2011.03395 https://t.co/AqtoNBGzd5

17 replies, 1035 likes


Gary Marcus: must-read new study from @Google confirms all central claims of Deep Learning: A Critical Appraisal (2018): - machine learning often generalizes poorly - extrapolation beyond training data is key - urgent need for better ways of adding in domain expertise https://arxiv.org/abs/2011.03395

16 replies, 797 likes


Aran Komatsuzaki: Underspecification Presents Challenges for Credibility in Modern Machine Learning Massive collaboration by Googlers showing that underspecified ML pipelines can lead to instability and poor model behavior, incl. shortcuts and spurious correlations. https://arxiv.org/abs/2011.03395 https://t.co/uNk1L9H7v3

2 replies, 249 likes


Thomas G. Dietterich: A review of “Underspecification Presents Challenges for Credibility in Modern Machine Learning” by D’Amour et al. https://arxiv.org/abs/2011.03395 0/

1 reply, 201 likes


Steven Pinker: Gary @garymarcus has been writing for years that many AI claims are fragile and overhyped. A new study confirms his warning.

0 replies, 189 likes


Cory Doctorow #BLM: "Underspecification Presents Challenges for Credibility in Modern Machine Learning" is a new ML paper co-authored by 33 (!) Google researchers. It's been called a "wrecking ball" for our understanding of problems in machine learning. https://arxiv.org/pdf/2011.03395.pdf 1/ https://t.co/7hYSmX2DGn

4 replies, 115 likes


Sergey Feldman: I'm glad this phenomenon has a name! The last time I observed this was while working on the Semantic Scholar search engine (feature-based LightGBM LambdaMART model). It occurred in two different ways. 1/n

2 replies, 86 likes


Sylvain Chabé-Ferret: ML discovers the identification problem.

2 replies, 83 likes


Brandon Rohrer: A wrecking ball of a paper. [Trigger warning if you just published state of the art benchmark performance]: Your fancy new optimizer's success might be a total fluke, and it's possible it will only hurt performance in production.

3 replies, 79 likes


jörn jacobsen: Exciting 40-author (👀) @Google paper on the trouble with underspecification in ML. Providing further intriguing empirical support for many of the issues we raised in our shortcut learning paper (https://arxiv.org/abs/2004.07780) and more. Highly recommended read: https://arxiv.org/abs/2011.03395 https://t.co/F81rTd9bGv

0 replies, 64 likes


Frank Pasquale: The point on domain expertise is a big theme of my book, "New Laws of Robotics." So many economic forces now tend to demote the perspective of expert labor (while elevating the power of owners of capital & technology). Law must counteract and balance these forces.

1 reply, 58 likes


Ed H. Chi: New work coming out of my team at Google on credibility of ML models:

0 replies, 41 likes


Victor Veitch: Very excited about this (giant) paper on an underappreciated but ubiquitous way that machine learning fails in practice

0 replies, 40 likes


Andrea Montanari: Overparametrization in modern machine learning comes with its blessings and its challenges. Nice thread about the latter (and a recent paper) by @alexdamour .

0 replies, 39 likes


Parag Agrawal: Really great paper that describes a commonly seen problem with ML models in production.

1 reply, 36 likes


Carlos E. Perez: Excellent paper from Google discussing the robustness of Deep Learning models when deployed in real domains. https://arxiv.org/abs/2011.03395

1 reply, 33 likes


Stephan Hoyer: We see this failure mode all the time in ML models for physics, e.g., if you train to predict a single time-step forward, but want the model to generalize when you make repeated predictions over many time-steps.

1 reply, 21 likes
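The failure mode Hoyer describes can be shown with a toy example. This is a hypothetical sketch with made-up dynamics, not his work: two one-step models of a simple exponential decay have equally tiny single-step errors, yet their autoregressive rollouts diverge substantially after many composed steps.

```python
A_TRUE = 0.99                    # true one-step dynamics: x_{t+1} = A_TRUE * x
A_HAT_1, A_HAT_2 = 0.991, 0.989  # two learned models, same-sized one-step error

def rollout(a, x0=1.0, steps=500):
    """Apply the one-step model repeatedly (autoregressive prediction)."""
    x = x0
    for _ in range(steps):
        x *= a
    return x

# Both models are off by only ~0.1% on a single prediction step...
one_step_err = abs(A_HAT_1 - A_TRUE)  # ~0.001, same as for A_HAT_2

# ...but after 500 composed steps their trajectories differ by roughly 2.7x,
# because the tiny one-step discrepancies compound multiplicatively.
ratio = rollout(A_HAT_1) / rollout(A_HAT_2)
print(f"one-step error: {one_step_err:.3f}, 500-step trajectory ratio: {ratio:.2f}")
```

The constants and step count are arbitrary; the point is only that iid one-step accuracy does not pin down long-horizon behavior.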


No person of color has died in Antifa custody: Another brilliant thread by Cory Doctorow that you should read if you're interested in machine learning

0 replies, 21 likes


Jesse Dodge: Massive collaboration at Google about reproducibility, and how underspecification of the ML pipeline (for vision, NLP, etc.) leads to highly variable results. Our goal with the reproducibility checklists and challenge has been to reduce underspecification; love seeing work in this area

1 reply, 20 likes


Josh Tobin: Interesting and practically relevant paper: "predictors trained to the same level of iid generalization will often show widely divergent behavior when applied to real-world settings" even if the only difference between the predictors is something like random seed

0 replies, 17 likes
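The divergence Tobin quotes is easy to reproduce in miniature. A minimal sketch (a hypothetical setup, not from the paper): two linear models trained by gradient descent on perfectly correlated features, differing only in the random seed of their initialization, reach the same near-zero iid error but disagree once the feature correlation breaks at deployment.

```python
import numpy as np

def train(seed, X, y, lr=0.1, steps=500):
    """Plain gradient descent on squared error from a seed-dependent init."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

# Training data: the two features are identical, so any w with
# w[0] + w[1] == 1 fits y perfectly -- the problem is underspecified.
x = np.linspace(-1.0, 1.0, 50)
X_train = np.stack([x, x], axis=1)
y_train = x

w_a = train(seed=0, X=X_train, y=y_train)
w_b = train(seed=1, X=X_train, y=y_train)

# Both predictors reach near-zero error on iid data...
iid_gap = max(np.abs(X_train @ w_a - y_train).max(),
              np.abs(X_train @ w_b - y_train).max())

# ...but a shift that breaks the correlation (second feature zeroed out)
# exposes the arbitrary, seed-dependent split of weight between features.
X_shift = np.stack([x, np.zeros_like(x)], axis=1)
shift_gap = np.abs(X_shift @ w_a - X_shift @ w_b).max()
print(f"iid error: {iid_gap:.2e}, disagreement under shift: {shift_gap:.3f}")
```

Gradient descent never moves the weight difference w[0] - w[1] here (the gradient lies along [1, 1]), so the deployment-time behavior is decided entirely by the random initialization.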


Steve Yadlowsky: Underspecification is a big annoyance in the theory of deep learning + other high-dimensional ML methods. Here, we show that it's also a problem for practical ML when the deployment domain isn't exactly like the training domain. Fun collab to be a part of, thx @alexdamour for leading!

0 replies, 17 likes


Dileep George: even google doesn't have enough data to feed current deep learning algos 🤔

0 replies, 17 likes


alex peysakhovich 🤖: TLDR: Lots of stuff depends on the random seed.

1 reply, 16 likes


Chomba Bupe: It should have been obvious to Google that curve fitting isn't intelligence. But better late than never.

2 replies, 16 likes


Sina Fazelpour: It's a great feeling when there's a new, thoughtful paper that is exactly relevant to, and enriches the literature about, an issue you're currently working on. Excellent work by @alexdamour and co: https://arxiv.org/abs/2011.03395

1 reply, 15 likes


Ben Hamner: One major challenge in deploying AI models: training and deployment scenarios commonly have different data characteristics, and models that perform similarly well in training may be substantially different when deployed due to underspecification http://arxiv.org/abs/2011.03395

0 replies, 14 likes


johnurbanik: Thanks to @alexdamour, @kat_heller, @Ghassen_ML, @graduatedescent et al. for this exploration into how fickle ML is. Contrastive/stratified/shift-based tests can reveal a lot; the authors show that varying random seeds leads to big differences in underspecified (i.e. most) models. (1/4)

1 reply, 11 likes


danilobzdok: New @Google paper: "Underspecification is common in modern ML pipelines, such as in #deep #learning...this problem appears in ...#medical #imaging, clinical risk prediction based on electronic health records, and #medical #genomics." https://arxiv.org/pdf/2011.03395.pdf h/t: @GaryMarcus https://t.co/tExlGOJnwq

0 replies, 10 likes


Daisuke Okanohara: Underspecification in ML pipelines is a key reason for poor performance in deployment. 1) build a stress test that represents the task requirements 2) select the best predictor from the candidates rather than marginalizing over them 3) design task-specific regularization https://arxiv.org/abs/2011.03395

0 replies, 8 likes
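The first two points of this summary (build a stress test, then use it to select among near-equivalent predictors) can be sketched in a few lines. Everything below is a hypothetical illustration, not the paper's code: two candidate classifiers tie on iid test data, and only a stress set that deliberately breaks a training-time correlation separates them.

```python
# Two hypothetical candidate predictors with identical iid test accuracy.
def robust(x, nuisance):    # uses the intended signal, ignores the nuisance
    return x > 0

def shortcut(x, nuisance):  # leans on a spuriously correlated feature
    return nuisance > 0

def accuracy(model, data):
    """Fraction of (x, nuisance, label) triples the model classifies correctly."""
    return sum(model(x, n) == y for x, n, y in data) / len(data)

# In iid data the nuisance feature happens to agree with the label...
iid_test = [(1.0, 1.0, True), (2.0, 3.0, True),
            (-1.0, -1.0, False), (-2.0, -3.0, False)]
# ...while the stress set breaks that correlation on purpose.
stress = [(1.0, -1.0, True), (2.0, -3.0, True),
          (-1.0, 1.0, False), (-2.0, 3.0, False)]

candidates = {"robust": robust, "shortcut": shortcut}
# Both candidates score 1.0 on iid_test, so iid accuracy cannot choose
# between them; the stress test can, so select the best stress performer.
best = max(candidates, key=lambda name: accuracy(candidates[name], stress))
print(best, accuracy(candidates[best], stress))
```

The names `robust`, `shortcut`, and the toy datasets are all invented for the sketch; the design choice it illustrates is simply scoring the requirement directly instead of relying on held-out iid accuracy.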


Julius Adebayo: This week's new must-read 30-pager. Domain shift and spurious training signals are major open problems in ML.

0 replies, 8 likes


Maxime Sermesant: As we all know, we are not there yet. Adding domain expertise is crucial for AI in healthcare to really happen. Anatomical and functional modelling is a nice way to do it! 😀

0 replies, 7 likes


Swede White: Interesting thread. Always funny how methods from stats (like sensitivity analysis and now specification error) that have been around for a while aren't consistently adopted and/or applied to ML, when they could probably prevent or solve quite a few problems

0 replies, 7 likes


Sam Cassidy/Hoon Tae Kim 🖤🦢: Oops, I was right again.

2 replies, 7 likes


Nicole Maffeo: ML models often ace training + testing, yet fail real world environments 💯🔥 . Design & stress-test models at application level. Or risk undermining ML credibility when it matters most— Tldr; Internships before employment (& model deployment). 👇 https://arxiv.org/pdf/2011.03395.pdf #AI

1 reply, 6 likes


Luis Lamb: Food for thought

0 replies, 6 likes


Lady Dr. (PhD, she/her): I clicked bc of my fascination with “underspecification” in phonology. I don’t think it has the same sense here, but it might? Anyone better at newer models of phonology want to jump in here?

0 replies, 4 likes


Prof. Tom Crick: “Sensitive dependence upon initial conditions”

1 reply, 4 likes


60_Harvests: Fun thread by @doctorow: But, why does the phrase "sensitive dependence upon initial conditions" haunt me whenever I read this stuff?

0 replies, 4 likes


Adarsh Subbaswamy: On a more general note, there's a growing need for more "stress tests" for models (see, e.g., Google's recent underspecification paper https://arxiv.org/abs/2011.03395 cc @alexdamour @vivnat). We hope our procedure helps in addressing this gap. 11/

1 reply, 4 likes


Uri #masks4all Manor: very important food for thought - key lesson: stress test your models!!!

0 replies, 3 likes


Margaret Warren: Underspecification Presents Challenges for Credibility in Modern Machine Learning ---from 2020 by many @google people.

0 replies, 2 likes


Omid Mirzaei: This great work discusses "underspecification issue" in machine learning models: https://arxiv.org/pdf/2011.03395.pdf

0 replies, 2 likes


Bryan 🇭🇹 🏁: Interesting

0 replies, 2 likes


Richard Minerich: This new Google paper is very important https://arxiv.org/abs/2011.03395 Most ML algos are learning a minimal set of features; what if instead they learned some maximal set, using OR to collapse equivalent features?

0 replies, 2 likes


Bruno Rocha: Underspecified horses? @boblsturm @jvanbalen @nkundiushuti @CarlosVaquero

1 reply, 2 likes


Afshin Khadangi: This is a great and huge effort from a group of distinguished ML scientists. I found the paper simple but descriptive. The story aligns with the findings that we have shown in EM-stellar.

0 replies, 2 likes


Michael Bennett: I believe @yudapearl has previously addressed this. Underspecification is not new, but rather the result of failing to identify the underlying causal relationships, instead indiscriminately using correlation for prediction. They've stuck a new label on an old problem.

0 replies, 2 likes


Jeremy Goecks: Finally had a chance to read the new Google paper on underspecification. My summary and thoughts follow 1/n https://arxiv.org/pdf/2011.03395.pdf

1 reply, 1 like


Thiago Marzagão: ML models that perform similarly well with test data can perform radically differently with real-world data. https://arxiv.org/pdf/2011.03395.pdf

0 replies, 1 like


JagB—NanoSalad—Veg Gap Zapper: Many data scientists & tech titans still resist this ht @ChombaBupe "should have been obvious …curve fitting isn't intelligence"

2 replies, 1 like


YBenkler: 3. Found to affect pretty much every field of ML big promise applications: computer vision, medical imaging, NLP (including "shortcuts that reinforce societal biases around protected attributes such as gender"), & clinical predictions from EHR. https://arxiv.org/pdf/2011.03395.pdf

0 replies, 1 like


James Cham ✍🏻: This is terrific, although @bruces puts it much more succinctly.

0 replies, 1 like


Alexander D'Amour: Summary of the work in our own words: https://twitter.com/alexdamour/status/1325921856738701312

0 replies, 1 like


Etienne David: We have exactly the same problem in plant phenotyping when trying to deploy algorithms to fields!

0 replies, 1 like


Ben Meghreblian: The way we train AI is fundamentally flawed: https://www.technologyreview.com/2020/11/18/1012234/training-machine-learning-broken-real-world-heath-nlp-computer-vision/ Full paper - 'Underspecification Presents Challenges for Credibility in Modern Machine Learning': https://arxiv.org/abs/2011.03395 #machinelearning #ml #deeplearning

0 replies, 1 like


Galuh Sahid: OK, not sure how I missed this when it was first published, but I need to read this paper. "Underspecification Presents Challenges for Credibility in Modern Machine Learning": https://arxiv.org/abs/2011.03395

0 replies, 1 like


Beth Carey: Generalization humans do is based on associations - knowing that tuna is a fish, fish is an animal etc. The way machines can know this is similar. By symbolic #AI and #linguistics . It's not possible via Deep Learning @jbthinking

1 reply, 0 likes


Cory Doctorow #BLM: A machine learning wrecking ball: Even if you fix training data, you still have to reckon with underspecification. https://twitter.com/doctorow/status/1330186928654782468 2/ https://t.co/gSJoASlGQA

1 reply, 0 likes


Content

Found on Nov 09 2020 at https://arxiv.org/pdf/2011.03395.pdf

PDF content of a computer science paper: Underspecification Presents Challenges for Credibility in Modern Machine Learning