
How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods

Comments

Nov 07 2019 Hima Lakkaraju (Recruiting Students and Postdocs)

Want to know how adversaries can game explainability techniques? Our latest research - "How can we fool LIME and SHAP? Adversarial Attacks on Explanation Methods" has answers: http://arxiv.org/abs/1911.02508. Joint work with the awesome team: @dylanslack20, Sophie, Emily, @sameer_
7 replies, 238 likes


Dec 31 2019 Andrew Ng

Do you have an example of an underrated or underreported AI result from 2019--something that deserves to be more widely known? Please reply and share your thoughts!
29 replies, 144 likes


Dec 06 2019 Hima Lakkaraju (Recruiting Students and Postdocs)

Two of our papers just got accepted for oral presentation at the AAAI/ACM Conference on AI, Ethics, and Society (AIES): 1. Designing adversarial attacks on explanation techniques (https://arxiv.org/pdf/1911.02508.pdf) 2. How misleading explanations can be used to game user trust (https://arxiv.org/pdf/1911.06473.pdf)
5 replies, 144 likes


Jan 31 2020 Hima Lakkaraju

Just noticed that there is an HBR article that discusses our recent work on fooling ML explanation methods: https://hbr.org/2019/12/the-ai-transparency-paradox. Yayy! @dylanslack20 @emilycjia @sameer_ Our paper: https://arxiv.org/abs/1911.02508
0 replies, 42 likes


Nov 27 2019 Willie Boag

Just read @dylanslack20's paper on fooling LIME and SHAP with adversarial attacks (https://arxiv.org/pdf/1911.02508.pdf). Neat paper with a simple & clear message! Had a lot of fun making slides for it for my lab's reading group, so sharing for anyone interested: https://drive.google.com/file/d/1ay_9ayZOvUptUBx-OWT54dU6JtKN_NYY/view?usp=sharing
3 replies, 35 likes


Nov 07 2019 Hima Lakkaraju (Recruiting Students and Postdocs)

Wondering if you can game explainability methods (e.g. LIME/SHAP) to say whatever you want to? Turns out you can! More details in our recent research: https://arxiv.org/abs/1911.02508
1 replies, 23 likes


Nov 07 2019 dylan_slack

Wondering if you can game explainability methods (e.g. LIME/SHAP) to say whatever you want to? Our recent research suggests this is possible.
0 replies, 16 likes


Nov 07 2019 sorelle

Always nice to see former @haverfordedu research advisees go on to do interesting work! Check out this new paper by @dylanslack20 (class of '19) and the UCI/Harvard team.
0 replies, 16 likes


Jan 30 2020 sorelle

The recent work just mentioned at #FAT2020 on how explanations can be gamed is by @dylanslack20 et al: https://arxiv.org/abs/1911.02508
0 replies, 15 likes


Nov 07 2019 Somesh Jha

I think this line of research is super-interesting. Folks are proposing techniques that build on explainability methods, but those methods are brittle. Interesting paper. On my stack :-)
1 replies, 14 likes


Jan 06 2020 HotComputerScience

Most popular computer science paper of the day: "How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods" https://hotcomputerscience.com/paper/how-can-we-fool-lime-and-shap-adversarial-attacks-on-post-hoc-explanation-methods https://twitter.com/hima_lakkaraju/status/1192263250882289665
1 replies, 14 likes


Nov 07 2019 Battista Biggio

If it is based on ML, it is vulnerable (to deliberate attacks). Another good example of that.
0 replies, 12 likes


Jan 26 2020 Dylan Slack

Next week, I'll be at AIES presenting a paper on post-hoc interpretation attack techniques "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods" (https://arxiv.org/abs/1911.02508).
1 replies, 8 likes


Nov 07 2019 rhema vaithianathan

Those trying to regulate AI, beware! As someone who builds ML tools for high-stakes decisions, I can almost always comply (trivially) with audit rules because I always have more degrees of freedom than the auditor. It's a classic mechanism design problem.
0 replies, 7 likes


Dec 06 2019 Dylan Slack

Really excited to share this work at @AIESConf! 😀
1 replies, 7 likes


Nov 07 2019 Aakash Kumar Nain

Fooling LIME and SHAP! Amazing!
0 replies, 7 likes


Feb 04 2020 u++

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods https://arxiv.org/abs/1911.02508
0 replies, 5 likes


Nov 07 2019 AI4LIFE

Wondering if you can game explainability methods (e.g. LIME/SHAP) to say whatever you want to? Turns out you can! You should not miss our recent research: http://arxiv.org/abs/1911.02508
0 replies, 4 likes


Nov 15 2019 Hagai Rossman

Very cool paper on adversarial attacks on model interpretation methods (SHAP, LIME) by @hima_lakkaraju. As these methods are being used more and more to explain complex models in high-stakes domains (such as medicine), adversarial attacks are something we have to be aware of
1 replies, 4 likes


Nov 07 2019 Karandeep Singh

"In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable... we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques."
0 replies, 3 likes
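
The excerpt above captures the mechanism in one sentence: LIME and SHAP explain a model by querying it on perturbed inputs, and those perturbations tend to fall off the data manifold, so an adversary can detect them and answer differently. Below is a minimal sketch of that scaffolding idea, assuming an sklearn-style setup; the class name and the biased_fn/innocuous_fn hooks are illustrative choices, not the authors' code.

```python
# Minimal sketch (not the authors' released code) of the scaffolding attack
# described in the quoted abstract. An off-manifold detector is trained to
# separate real data points from LIME/SHAP-style perturbation samples; real
# inputs get the biased prediction, perturbed queries get an innocuous one,
# so perturbation-based explanations never "see" the biased behaviour.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class ScaffoldedClassifier:
    def __init__(self, X_real, X_perturbed, biased_fn, innocuous_fn):
        # In-distribution detector: 1 = real data point, 0 = perturbation sample.
        X = np.vstack([X_real, X_perturbed])
        y = np.hstack([np.ones(len(X_real)), np.zeros(len(X_perturbed))])
        self.in_dist = RandomForestClassifier(n_estimators=100).fit(X, y)
        self.biased_fn = biased_fn        # e.g. depends on a sensitive feature
        self.innocuous_fn = innocuous_fn  # e.g. depends on an unrelated feature

    def predict(self, X):
        X = np.asarray(X)
        on_manifold = self.in_dist.predict(X).astype(bool)
        # Route real inputs to the biased model and off-manifold queries
        # (the ones LIME/SHAP generate) to the innocuous model.
        return np.where(on_manifold, self.biased_fn(X), self.innocuous_fn(X))
```

In the paper's experiments, a wrapper of this kind lets a model that depends entirely on a sensitive feature (e.g. race in COMPAS) receive LIME/SHAP explanations that attribute its predictions to innocuous features instead.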


Dec 14 2019 Jason H. Moore, PhD

How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods https://arxiv.org/abs/1911.02508 #datascience #machinelearning #airesearch
0 replies, 3 likes


Dec 07 2019 Mickey McManus

Woah. Is it fair to say that most pitch decks are adversarial attacks on rationality? Fascinating papers.
0 replies, 1 likes


Nov 07 2019 Hima Lakkaraju (Recruiting Students and Postdocs)

Relying too much on explanation techniques? You should definitely read our recent research: "How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods" http://arxiv.org/abs/1911.02508. Joint work with the awesome team: @dylanslack20, Sophie, Emily, @sameer_
0 replies, 1 likes


Nov 07 2019 Hima Lakkaraju (Recruiting Students and Postdocs)

Very excited about our latest research on "How can we fool LIME and SHAP? Adversarial Attacks on Explanation Methods" http://arxiv.org/abs/1911.02508. Joint work with the awesome team: @dylanslack20, Sophie, Emily, @sameer_
0 replies, 1 likes

