Papers of the day

Batch Normalization Biases Deep Residual Networks Towards Shallow Paths

Comments

DeepMind: We show that batch normalisation biases deep residual networks towards shallow paths with well-behaved gradients. This dramatically increases the largest trainable depth. We can recover this benefit with a simple change to the initialisation scheme: https://arxiv.org/abs/2002.10444

6 replies, 738 likes


👩‍💻 DynamicWebPaige: "SkipInit: 1-line code change that can train deep residual networks without normalization, and also enhances the performance of shallow residual networks. We therefore conclude that one no longer needs normalization layers to train deep residual networks with small batch sizes."

1 replies, 20 likes


Yee Whye Teh: Nice work by Sam Smith and Soham De.

0 replies, 13 likes


Daisuke Okanohara: Batch normalization biases residual networks towards shallow paths by downscaling the residual blocks, which increases the trainable depth. The same effect can be obtained by simply introducing a scalar multiplier, initialized to 0, at the end of each residual branch. https://arxiv.org/abs/2002.10444

0 replies, 3 likes
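The mechanism described above (SkipInit) can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the residual branch `f` here is a toy linear + ReLU stand-in for the paper's convolutional branch, and `residual_block` is a hypothetical helper name. The point is that with the scalar `alpha` initialised to 0, every block is exactly the identity at initialisation, so the network starts out behaving like a shallow path and signal variance does not grow with depth.

```python
import numpy as np

def residual_block(x, weights, alpha):
    """One residual block with a SkipInit-style scalar multiplier.

    Computes y = x + alpha * f(x), where alpha is a learnable scalar
    initialised to 0 so that the block starts as the identity map.
    (Sketch only: f is a toy linear + ReLU branch, not a conv branch.)
    """
    branch = np.maximum(weights @ x, 0.0)  # toy residual branch
    return x + alpha * branch

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w = rng.standard_normal((4, 4))

# alpha = 0 at initialisation: the block is the identity, so stacking
# many such blocks cannot blow up the variance of the activations.
y = residual_block(x, w, alpha=0.0)
assert np.allclose(y, x)
```

During training, `alpha` would be a learnable parameter updated by gradient descent alongside the branch weights, letting each block gradually contribute as optimisation proceeds.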


akira: https://arxiv.org/abs/2002.10444 With BatchNorm, the contribution of the shallower paths of a ResNet increases because the variance of each block's output is reduced, and this makes training effective. The same effect can be obtained by introducing a coefficient that suppresses the variance in each block. https://t.co/14GdfNJlgr

0 replies, 2 likes


Statistics Papers: Batch Normalization Biases Deep Residual Networks Towards Shallow Paths. http://arxiv.org/abs/2002.10444

0 replies, 2 likes


Jesper Dramsch: Batch normalization did not work on some of the problems I worked on. It basically lost all the information necessary for regression on physical data. I was wondering if better initializations would help. Seems they do! #ml #machinelearning https://twitter.com/DeepMind/status/1232324838070669313

0 replies, 2 likes


Greg Yang: @unsorsodicorda @KyleLLuther1 I started writing it but then other things ended up getting priority :( However these guys https://arxiv.org/pdf/2002.10444.pdf essentially give you the right thing for a resnet with batchnorm

0 replies, 1 likes


Brundage Bot: Batch Normalization Biases Deep Residual Networks Towards Shallow Paths. Soham De and Samuel L. Smith http://arxiv.org/abs/2002.10444

1 replies, 1 likes


Content

Found on Feb 25 2020 at https://arxiv.org/pdf/2002.10444.pdf

PDF content of a computer science paper: Batch Normalization Biases Deep Residual Networks Towards Shallow Paths