Papers of the day

How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods

Comments

Hima Lakkaraju (Recruiting Students and Postdocs): Want to know how adversaries can game explainability techniques? Our latest research - "How can we fool LIME and SHAP? Adversarial Attacks on Explanation Methods" has answers: http://arxiv.org/abs/1911.02508. Joint work with the awesome team: @dylanslack20, Sophie, Emily, @sameer_

7 replies, 238 likes


Hima Lakkaraju (Recruiting Students and Postdocs): Two of our papers just got accepted for oral presentation at the AAAI/ACM Conference on AI, Ethics, and Society (AIES): 1. Designing adversarial attacks on explanation techniques (https://arxiv.org/pdf/1911.02508.pdf) 2. How misleading explanations can be used to game user trust (https://arxiv.org/pdf/1911.06473.pdf)

5 replies, 144 likes


Andrew Ng: Do you have an example of an underrated or underreported AI result from 2019--something that deserves to be more widely known? Please reply and share your thoughts!

29 replies, 144 likes


π™·πš’πš–πšŠ π™»πšŠπš”πš”πšŠπš›πšŠπš“πšž: Just noticed that there is an HBR article that discusses our recent work on fooling ML explanation methods: https://hbr.org/2019/12/the-ai-transparency-paradox. Yayy! @dylanslack20 @emilycjia @sameer_ Our paper: https://arxiv.org/abs/1911.02508

0 replies, 42 likes


Willie Boag: Just read @dylanslack20's paper on fooling LIME and SHAP with adversarial attacks (https://arxiv.org/pdf/1911.02508.pdf). Neat paper with a simple & clear message! Had a lot of fun making slides for it for my lab's reading group, so sharing for anyone interested: https://drive.google.com/file/d/1ay_9ayZOvUptUBx-OWT54dU6JtKN_NYY/view?usp=sharing

3 replies, 35 likes


Hima Lakkaraju (Recruiting Students and Postdocs): Wondering if you can game explainability methods (e.g. LIME/SHAP) to say whatever you want to? Turns out you can! More details in our recent research: https://arxiv.org/abs/1911.02508

1 replies, 23 likes


dylan_slack: Wondering if you can game explainability methods (e.g. LIME/SHAP) to say whatever you want to? Our recent research suggests this is possible.

0 replies, 16 likes


sorelle: Always nice to see former @haverfordedu research advisees go on to do interesting work! Check out this new paper by @dylanslack20 class of '19 and UCI / Harvard team.

0 replies, 16 likes


sorelle: The recent work just mentioned at #FAT2020 on how explanations can be gamed is by @dylanslack20 et al: https://arxiv.org/abs/1911.02508

0 replies, 15 likes


HotComputerScience: Most popular computer science paper of the day: "How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods" https://hotcomputerscience.com/paper/how-can-we-fool-lime-and-shap-adversarial-attacks-on-post-hoc-explanation-methods https://twitter.com/hima_lakkaraju/status/1192263250882289665

1 replies, 14 likes


Somesh Jha: I think this line of research is super-interesting. Folks are proposing techniques that build on explainability techniques, but they are brittle. Interesting paper. On my stack:-)

1 replies, 14 likes


Battista Biggio: If it is based on ML, it is vulnerable (to deliberate attacks). Another good example of that.

0 replies, 12 likes


Dylan Slack: Next week, I'll be at AIES presenting a paper on post-hoc interpretation attack techniques "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods" (https://arxiv.org/abs/1911.02508).

1 replies, 8 likes


Dylan Slack: Really excited to share this work at @AIESConf! πŸ˜€

1 replies, 7 likes


Aakash Kumar Nain: Fooling LIME and SHAP! Amazing!

0 replies, 7 likes


rhema vaithianathan: Those trying to regulate AI, be aware! As someone who builds ML tools for high-stakes decisions, I can almost always comply (trivially) with audit rules because I always have more degrees of freedom than the auditor. It's a classic mechanism design problem.

0 replies, 7 likes


u++: Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods https://arxiv.org/abs/1911.02508

0 replies, 5 likes


AI4LIFE: Wondering if you can game explainability methods (e.g. LIME/SHAP) to say whatever you want to? Turns out you can! You should not miss our recent research: http://arxiv.org/abs/1911.02508

0 replies, 4 likes


Hagai Rossman: Very cool paper on adversarial attacks on model interpretation methods (SHAP, LIME) by @hima_lakkaraju As these methods are being used more and more to explain complex models in high-stakes domains (such as medical), adversarial attacks is something we have to be aware of

1 replies, 4 likes


Jason H. Moore, PhD: How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods https://arxiv.org/abs/1911.02508 #datascience #machinelearning #airesearch

0 replies, 3 likes


Karandeep Singh: β€œIn this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable... we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques.”

0 replies, 3 likes
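
The excerpt above captures the core mechanism: because LIME and SHAP probe the model on perturbed, often off-manifold samples, an adversary can train a detector for those samples and route them to an innocuous surrogate model. A minimal sketch of that idea follows; it is not the authors' code, and the toy data, the feature roles, and the threshold rules are assumptions made purely for illustration.

```python
# Illustrative sketch (not the authors' code) of the scaffolding idea in
# arXiv:1911.02508: behave in a biased way on real inputs, but switch to an
# innocuous rule on the off-manifold samples that perturbation-based
# explainers like LIME and SHAP generate when querying the model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy data: column 0 plays the role of a sensitive feature,
# column 1 an innocuous one.
X_real = rng.normal(size=(500, 2))

# Stand-in for explainer-generated samples: wider-variance noise around the data.
X_perturbed = X_real + rng.normal(scale=2.0, size=X_real.shape)

# Out-of-distribution detector: real points (label 0) vs. perturbed points (label 1).
det_X = np.vstack([X_real, X_perturbed])
det_y = np.concatenate([np.zeros(len(X_real)), np.ones(len(X_perturbed))])
ood_detector = RandomForestClassifier(random_state=0).fit(det_X, det_y)

def adversarial_predict_proba(X):
    """Biased rule on in-distribution inputs, innocuous rule on perturbations."""
    X = np.asarray(X)
    looks_perturbed = ood_detector.predict(X).astype(bool)
    biased = (X[:, 0] > 0).astype(int)      # depends only on the sensitive feature
    innocuous = (X[:, 1] > 0).astype(int)   # depends only on the harmless feature
    preds = np.where(looks_perturbed, innocuous, biased)
    # Return class probabilities so LIME/SHAP can query the model.
    return np.column_stack([1 - preds, preds])
```

Pointing a tabular LIME or Kernel SHAP explainer at adversarial_predict_proba should then attribute most of the prediction to column 1 rather than the sensitive column 0, which is the failure mode the paper demonstrates.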


Hima Lakkaraju (Recruiting Students and Postdocs): Relying too much on explanation techniques? You should definitely read our recent research: "How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods" http://arxiv.org/abs/1911.02508. Joint work with the awesome team: @dylanslack20, Sophie, Emily, @sameer_

0 replies, 1 likes


Hima Lakkaraju (Recruiting Students and Postdocs): Very excited about our latest research on "How can we fool LIME and SHAP? Adversarial Attacks on Explanation Methods" http://arxiv.org/abs/1911.02508. Joint work with the awesome team: @dylanslack20, Sophie, Emily, @sameer_

0 replies, 1 likes


Mickey McManus: Woah. Is it fair to say that most pitch decks are adversarial attacks on rationality? Fascinating papers.

0 replies, 1 likes


AI Village @ DEF CON: Come tonight (https://www.twitch.tv/aivillage) to our live journal club on this paper: https://arxiv.org/pdf/1911.02508.pdf The paper has a very interesting & unique threat model where they attack the post-hoc explainer. This can enable attackers to foil model debugging & debiasing efforts.

0 replies, 1 likes


Content

Found on Nov 07 2019 at https://arxiv.org/pdf/1911.02508.pdf

PDF content of a computer science paper: How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods