
Smooth Adversarial Training


Quoc Le: A surprising result: We found that smooth activation functions are better than ReLU for adversarial training and can lead to substantial improvements in adversarial robustness.

21 replies, 1242 likes

Jeff Dean (@🏡): Smooth!

4 replies, 252 likes

hardmaru: Cool result in “Smooth Adversarial Training” by @cihangxie et al. They show smooth versions of the ReLU function can significantly push the “Pareto frontier” towards getting both better accuracy and adversarial robustness, due to desirable gradient properties.

2 replies, 104 likes

Mingxing Tan: SAT: Smooth Adversarial Training. It turns out that the non-continuous gradient of ReLU is a major issue. Replacing ReLU with Swish/GELU/ELU significantly improves robustness. New SOTA results with SAT.

0 replies, 53 likes
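The activations named above can be checked numerically for the gradient property the thread discusses: ReLU's gradient jumps from 0 to 1 at the origin, while Swish/GELU/ELU vary smoothly through it. A minimal sketch (illustrative only; these are standard textbook formulas, not the paper's code):

```python
import numpy as np

def relu(x):
    # ReLU: gradient jumps from 0 to 1 at x = 0 (non-smooth)
    return np.maximum(x, 0.0)

def swish(x):
    # Swish / SiLU: x * sigmoid(x), smooth everywhere
    return x / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    # ELU: exponential for x < 0, linear for x >= 0
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def gelu(x):
    # GELU (tanh approximation), smooth everywhere
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def num_grad(f, x, h=1e-5):
    # central finite difference, to inspect gradient behavior around 0
    return (f(x + h) - f(x - h)) / (2.0 * h)

xs = np.array([-0.1, -0.01, 0.01, 0.1])
print("relu grad :", num_grad(relu, xs))   # jumps abruptly: 0, 0, 1, 1
print("swish grad:", num_grad(swish, xs))  # varies smoothly through 0.5 near 0
```

The abrupt 0-to-1 jump in ReLU's gradient is what the thread means by "non-continuous gradients"; the smooth alternatives give the attacker (and hence the adversarial training loop) better-behaved gradients.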

Ankur Handa: This looks interesting, and I also found the paper "Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem" useful.

0 replies, 42 likes

Cihang Xie: Check out our latest work on studying the effects of activation functions in adversarial training. We found that making activation functions be SMOOTH is critical for obtaining much better robustness. Joint work with @tanmingxing, @BoqingGo, @YuilleAlan and @quocleix.

1 reply, 38 likes

Daisuke Okanohara: Using a smooth activation function (e.g., Swish, ELU) instead of ReLU can significantly improve the robustness against adversarial attacks while keeping accuracy. The gradient quality matters in adversarial training.

0 replies, 35 likes

Carlo Lepelaars: Just finished reading this paper. Beautiful insights! Really puts the power of Swish activations into perspective. Very curious to see if other NAS-derived layers such as Evonorm(-S0) are also significantly more robust compared to Batch-norm + ReLU.

0 replies, 13 likes

arXiv CS-CV: Smooth Adversarial Training

0 replies, 5 likes

Martin Roberts: Very interesting observation.

0 replies, 4 likes

Stella Rose: This was invented by Stefan Elfwing, Eiji Uchibe, and Kenji Doya in their paper “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning.” It's called SILU, not SWISH. Renaming their technique is a form of plagiarism.

4 replies, 2 likes


Found on Jun 29 2020
