Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging


Oct 14 2019 Ben Recht

Seems troubling: many of the images in the CXR-14 chest x-ray data set are post-treatment and have chest drains in them. Once these examples are removed, machine learning performs worse than a first-year resident. h/t @DrLukeOR
9 replies, 589 likes

Oct 14 2019 Luke Oakden-Rayner

"Improving Medical AI Safety by Addressing Hidden Stratification" is new research by @jdunnmon (@Stanford) and myself (@TheAIML). We argue HS can explain failures of pre-clinical #medical #AI testing, and propose an achievable regulatory solution. Blog:
2 replies, 75 likes

Jan 23 2020 Luke Oakden-Rayner

Subset testing of a commercially available melanoma detection #AI shows that performance on visually distinct, unusual lesions (i.e. mucosa or nail beds) drops precipitously. I'll always advocate that this info should be available prior to sale, but great to see. #hiddenstratification
9 replies, 72 likes

Oct 14 2019 Anima Anandkumar (hiring)

#AI for #healthcare and in particular #medicalimaging has a lot of potential. There should be big investments in #research and data collection. #AInotready Let us not apply the #siliconvalley ethos of "move fast break things" @EricTopol
5 replies, 64 likes

Oct 14 2019 Luke Oakden-Rayner

@jdunnmon @Stanford @TheAIML Link to the paper, where we show pretty drastic performance drops in hidden strata, e.g. cases of pneumothorax with/without chest drains (AUC 0.94 vs 0.77) or subsets within a normal/abnormal task (AUC 0.91 overall vs 0.76 for a major diagnosis class)
2 replies, 11 likes
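
The kind of subset comparison described in the tweet above (AUC on a hidden stratum versus AUC overall) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' evaluation code: the function names `auc` and `auc_by_stratum`, the stratum names, and the toy labels and scores are all hypothetical, and it assumes each stratum of positive cases is scored against the full set of negatives.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_stratum(labels, scores, strata):
    """AUC per stratum of positives, each compared against all negatives (an assumption)."""
    neg_scores = [s for y, s in zip(labels, scores) if y == 0]
    pos_by_stratum = defaultdict(list)
    for y, s, g in zip(labels, scores, strata):
        if y == 1:
            pos_by_stratum[g].append(s)
    result = {}
    for g, pos_scores in pos_by_stratum.items():
        sub_labels = [1] * len(pos_scores) + [0] * len(neg_scores)
        sub_scores = pos_scores + neg_scores
        result[g] = auc(sub_labels, sub_scores)
    return result


# Hypothetical toy data: 4 positive cases (2 with a chest drain, 2 without), 4 negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.35, 0.2, 0.1]
strata = ["drain", "drain", "no_drain", "no_drain", None, None, None, None]

print("overall AUC:", auc(labels, scores))
print("per-stratum AUC:", auc_by_stratum(labels, scores, strata))
```

On this made-up data the overall AUC hides a much weaker result on the "no_drain" positives, which is the pattern the tweet reports for the real model.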

Oct 14 2019 David Van Valen

Developing deep learning models and developing datasets are tasks that should not be separated. When it comes to performance in a software 2.0 world, there's little difference between code and data.
0 replies, 11 likes

Jan 24 2020 M. Alican Noyan

If you are doing applied ML you should be aware of #hiddenstratification. Read more here:
0 replies, 7 likes

Oct 15 2019 Luke Oakden-Rayner

Link to paper:
1 reply, 5 likes

Oct 14 2019 Volkan Cevher - LIONS

0 replies, 5 likes

Oct 14 2019 Debashis Ghosh

This is a VERY important blogpost (and associated paper) discussing a problem that has recurred in many scientific domains. Kudos to the authors @DrLukeOR @jdunnmon @ghcarneiro Chris Ré @HazyResearch
0 replies, 4 likes

Oct 16 2019 Daniel Beck

Tinkering with models without looking at your data and throwing your results on arXiv is all good fun... until your deployed classifier gives false negatives for pneumothorax...
0 replies, 3 likes

Oct 15 2019 Xiao Liu

Neat and achievable suggestion on how to add human intelligence into AI auditing. Another great blog by @DrLukeOR and @jdunnmon. Worth reading the full paper to see all 3 proposed strategies for tackling 'hidden stratification'.
1 reply, 2 likes

Oct 19 2019 Vivek Natarajan

@roydanroy @suchisaria Great point, demonstrated by recent work like these showing how easily NNs home in on spurious correlations: 1> (melanoma prediction and surgical skin markers) 2> (pneumothorax prediction and chest drains)
1 reply, 1 likes

Oct 14 2019 Aakash Kumar Nain 🔎

And this is why "dataset curation" is an art!
0 replies, 1 likes