Quoc Le: A surprising result: We found that smooth activation functions are better than ReLU for adversarial training and can lead to substantial improvements in adversarial robustness.
21 replies, 1242 likes
Jeff Dean (@🏡): Smooth!
4 replies, 252 likes
hardmaru: Cool result in “Smooth Adversarial Training” by @cihangxie et al. They show that smooth versions of the ReLU function can significantly push the “Pareto frontier” toward both better accuracy and adversarial robustness, thanks to their desirable gradient properties.
2 replies, 104 likes
Mingxing Tan: SAT: Smooth Adversarial Training. It turns out ReLU's discontinuous gradient is a major issue. Replacing ReLU with Swish/GELU/ELU significantly improves robustness.
New SOTA results with SAT (http://arxiv.org/abs/2006.14536): https://t.co/Lj9hB3F8jY
0 replies, 53 likes
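For a concrete picture of the idea, here is a minimal sketch in PyTorch, assuming a toy model and standard L-infinity PGD (not the paper's ImageNet setup); the only architectural change versus baseline adversarial training is nn.SiLU() in place of nn.ReLU().

# Sketch of smooth adversarial training (illustrative toy setup, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.SiLU(),                      # smooth activation; nn.ReLU() would be the baseline
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def pgd_attack(x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    # Standard L-infinity PGD inner loop; a real implementation would also clamp
    # x + delta to the valid image range.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()

def train_step(x, y):
    x_adv = pgd_attack(x, y)                  # inner maximization (attack)
    loss = F.cross_entropy(model(x_adv), y)   # outer minimization (training)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()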
Ankur Handa: This looks interesting. I also found the paper "Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem" https://arxiv.org/abs/1812.05720 useful.
0 replies, 42 likes
Cihang Xie: Check out our latest work studying the effects of activation functions in adversarial training.
We found that making activation functions SMOOTH is critical for obtaining much better robustness.
Joint work with @tanmingxing, @BoqingGo, @YuilleAlan and @quocleix.
1 replies, 38 likes
Daisuke Okanohara: Using a smooth activation function (e.g., Swish, ELU) instead of ReLU can significantly improve robustness against adversarial attacks while maintaining accuracy. Gradient quality matters in adversarial training. https://arxiv.org/abs/2006.14536
0 replies, 35 likes
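To make the "gradient quality" point concrete: ReLU's derivative jumps from 0 to 1 at the origin, while SiLU's derivative varies continuously there, and this is what both the attacker and the outer training loop differentiate through. A quick autograd check (illustrative only, not from the paper):

import torch
import torch.nn.functional as F

def deriv(f, x):
    # derivative of f at the points in x, via autograd
    x = x.clone().requires_grad_(True)
    f(x).sum().backward()
    return x.grad

xs = torch.tensor([-1e-3, 1e-3])
print(deriv(torch.relu, xs))  # tensor([0., 1.])          -> jumps across 0
print(deriv(F.silu, xs))      # ~tensor([0.4995, 0.5005]) -> changes smoothly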
Carlo Lepelaars: Just finished reading this paper. Beautiful insights!
Really puts the power of Swish activations into perspective.
Very curious to see whether other NAS-derived layers such as EvoNorm(-S0) are also significantly more robust than BatchNorm + ReLU.
0 replies, 13 likes
arXiv CS-CV: Smooth Adversarial Training http://arxiv.org/abs/2006.14536
0 replies, 5 likes
Martin Roberts: Very interesting observation.
0 replies, 4 likes
Stella Rose: This activation was invented by Stefan Elfwing, Eiji Uchibe, and Kenji Doya in their paper “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning.” It's called SiLU, not Swish. Renaming their technique is a form of plagiarism.
4 replies, 2 likes
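For reference, the two definitions under discussion: SiLU(x) = x * sigmoid(x) (Elfwing et al.) and Swish(x) = x * sigmoid(beta * x) (Ramachandran et al.); with beta = 1 the two coincide. A quick sanity check (toy code, illustrative only):

import torch

def silu(x):
    return x * torch.sigmoid(x)          # Elfwing et al.

def swish(x, beta=1.0):
    return x * torch.sigmoid(beta * x)   # Ramachandran et al.; beta=1 recovers SiLU

x = torch.linspace(-5, 5, 101)
assert torch.allclose(silu(x), swish(x, beta=1.0))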
Found on Jun 29 2020 at https://arxiv.org/pdf/2006.14536.pdf