
Smooth Adversarial Training

Comments

Quoc Le: A surprising result: We found that smooth activation functions are better than ReLU for adversarial training and can lead to substantial improvements in adversarial robustness. http://arxiv.org/abs/2006.14536 https://t.co/OGD2jqtTRL

21 replies, 1242 likes


Jeff Dean (@🏡): Smooth!

4 replies, 252 likes


hardmaru: Cool result in “Smooth Adversarial Training” by @cihangxie et al. They show that smooth versions of the ReLU function can significantly push the “Pareto frontier” towards both better accuracy and better adversarial robustness, due to desirable gradient properties. https://arxiv.org/abs/2006.14536

2 replies, 104 likes


Mingxing Tan: SAT: Smooth Adversarial Training. It turns out the non-continuous gradient of ReLU is a major issue. Replacing ReLU with Swish/GELU/ELU significantly improves robustness. New SOTA results with SAT (http://arxiv.org/abs/2006.14536): https://t.co/Lj9hB3F8jY

0 replies, 53 likes
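
A minimal sketch of the gradient point in the tweet above (assuming PyTorch; this is not the paper's code): ReLU's derivative jumps from 0 to 1 at zero, while SiLU (Swish), GELU and ELU keep a continuous derivative there, which is the property the tweets credit for the robustness gains.

```python
# Sketch (assumes PyTorch; not the paper's code): compare activation
# gradients just left and right of zero. ReLU's derivative jumps from
# 0 to 1, while SiLU/GELU/ELU change smoothly across zero.
import torch
import torch.nn.functional as F

def grad_at(fn, x0):
    x = torch.tensor(x0, requires_grad=True)
    fn(x).backward()
    return x.grad.item()

for name, fn in [("relu", F.relu),
                 ("silu", F.silu),   # SiLU, a.k.a. Swish
                 ("gelu", F.gelu),
                 ("elu",  F.elu)]:
    left, right = grad_at(fn, -1e-6), grad_at(fn, 1e-6)
    print(f"{name:4s}  grad just below 0: {left:.4f}   just above 0: {right:.4f}")
```

Swapping the activation is the only architectural change the tweets describe; the rest of the adversarial training recipe stays the same.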


Ankur Handa: This looks interesting and I also found this paper "Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem" https://arxiv.org/abs/1812.05720 useful as well.

0 replies, 42 likes


Cihang Xie: Check out our latest work studying the effects of activation functions in adversarial training. We found that making activation functions SMOOTH is critical for obtaining much better robustness. Joint work with @tanmingxing, @BoqingGo, @YuilleAlan and @quocleix.

1 replies, 38 likes


Daisuke Okanohara: Using a smooth activation function (e.g., Swish, ELU) instead of ReLU can significantly improve the robustness against adversarial attacks while keeping accuracy. The gradient quality matters in adversarial training. https://arxiv.org/abs/2006.14536

0 replies, 35 likes
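
Since the tweet stresses that gradient quality matters in adversarial training, here is a minimal PGD-style sketch of where those input gradients are used (assuming PyTorch and a generic `model`; the attack and hyperparameters are illustrative, not the paper's exact setup):

```python
# Sketch of L_inf PGD adversarial training (assumes PyTorch; illustrative only).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft adversarial examples by stepping along the sign of the input gradient."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]  # input gradient: where activation smoothness matters
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()

# One adversarial training step: attack first, then train on the attacked batch.
# model, optimizer, x, y are assumed to exist; under smooth adversarial training
# the model's ReLUs would be swapped for SiLU/GELU/ELU.
# x_adv = pgd_attack(model, x, y)
# loss = F.cross_entropy(model(x_adv), y)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```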


Carlo Lepelaars: Just finished reading this paper. Beautiful insights! Really puts the power of Swish activations into perspective. Very curious to see whether other NAS-derived layers such as EvoNorm(-S0) are also significantly more robust than BatchNorm + ReLU.

0 replies, 13 likes


arXiv CS-CV: Smooth Adversarial Training http://arxiv.org/abs/2006.14536

0 replies, 5 likes


Martin Roberts: Very interesting observation.

0 replies, 4 likes


Stella Rose: This was invented by Stefan Elfwing, Eiji Uchibe, and Kenji Doya in their paper “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning.” It's called SILU, not SWISH. Renaming their technique is a form of plagiarism.

4 replies, 2 likes
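
For context on the naming: Elfwing et al. define SiLU(x) = x · sigmoid(x), while Swish is swish_β(x) = x · sigmoid(βx) with a fixed or trainable β, so the two functions coincide when β = 1.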


Content

Found on Jun 29 2020 at https://arxiv.org/pdf/2006.14536.pdf

PDF content of a computer science paper: Smooth Adversarial Training