Utku: End-to-end training of sparse deep neural networks with little-to-no performance loss. Check out our new paper: “Rigging the Lottery: Making All Tickets Winners” (RigL👇) !
with @Tgale96 @jacobmenick @pcastr and @erich_elsen https://t.co/LmR18hK4LV
1 reply, 359 likes
hardmaru: Everyone is a winner 🔥
1 reply, 263 likes
DeepMind: We also introduce a technique [https://arxiv.org/abs/1911.11134] for training neural networks that are sparse throughout training from a random initialization - no luck required, all initialization “tickets” are winners. https://t.co/fA7VmXrj20
0 replies, 127 likes
Delip Rao: Great paper title, with results to match. “MobileNets are efficient networks and difficult to sparsify. With RigL we can train 75% sparse MobileNets with almost no drop in accuracy.” 😱😱
1 reply, 46 likes
Sara Hooker: What differs in this paper is how the connections are grown after pruning for the most important weights. I think this is part of a very interesting direction of research, amplifying the role of weights estimated to be important (in addition to removing the “weakest” links).
0 replies, 24 likes
Pablo Samuel Castro: 🎟️🎟️make everyone a lottery winner🎟️🎟️
train sparse networks (with a randomly initialized topology) end-to-end without sacrificing (much) accuracy!
joint work with @utkuevci @Tgale96 @jacobmenick and @erich_elsen
0 replies, 16 likes
Jacob Menick: New work by Utku Evci et al. on sparse training. My contribution was helping with the RNN experiments. Fun collaborating with @utkuevci and getting involved in sparse man @erich_elsen's sweeping sparsity research programme.
0 replies, 11 likes
Jesse Engel: Sparsity is a clear inductive bias for neural nets, but end to end training and efficient inference have always been a challenge. I know @erich_elsen has been thinking about this for a long time, and seems like they've made some real progress!
0 replies, 7 likes
Daisuke Okanohara: RigL trains sparse NNs from scratch; regularly drops the edges with the smallest magnitude, computes the gradients wrt virtual dense edges, and introduces new edges with the largest gradient. Escaping bad local minima by making a new descending direction. https://arxiv.org/abs/1911.11134
0 replies, 7 likes
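The drop-and-grow step summarized in the post above can be sketched roughly as follows. This is a minimal illustration on a flat list of weights, not the authors' implementation: the function name `drop_and_grow`, the parameter `k`, the choice to pick new edges only from connections that were inactive before the drop, and the zero initialization of new edges are all assumptions made for the example.

```python
def drop_and_grow(weights, mask, dense_grads, k):
    """One illustrative drop/grow step on a flat layer (hypothetical helper).

    weights     -- list of floats, one per possible connection
    mask        -- list of 0/1 flags, 1 where the connection is active
    dense_grads -- gradients w.r.t. every possible connection, active or not
    k           -- number of connections to swap in this step
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(k, len(active), len(inactive))

    # Drop: the k active connections with the smallest weight magnitude.
    drop = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: the k inactive connections with the largest gradient magnitude.
    grow = sorted(inactive, key=lambda i: abs(dense_grads[i]), reverse=True)[:k]

    new_weights, new_mask = list(weights), list(mask)
    for i in drop:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in grow:
        new_mask[i] = 1
        new_weights[i] = 0.0  # assumption: new connections start at zero
    return new_weights, new_mask


# Toy layer with 6 possible connections, 3 of them active.
weights = [0.5, -0.1, 0.0, 0.0, 0.9, 0.0]
mask = [1, 1, 0, 0, 1, 0]
dense_grads = [0.1, 0.2, -0.8, 0.05, 0.0, 0.3]
new_weights, new_mask = drop_and_grow(weights, mask, dense_grads, k=1)
print(new_mask)  # connection 1 is dropped, connection 2 is grown
```

Because the same number of connections is dropped and grown, the count of active connections (and so the sparsity level) stays fixed across the update.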
Brundage Bot: Rigging the Lottery: Making All Tickets Winners. Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen http://arxiv.org/abs/1911.11134
1 reply, 4 likes
Instead of choosing good initial values in favor of "Lottery Theory", they propose RigL to train a sparse and accurate network from any initial value. The learning time does not increase greatly, and inference speed is improved with the same accuracy. https://t.co/b8BJbEiaWE
0 replies, 2 likes
Mitchell Gordon: Really cool improvements on Tim Dettmers' work; now sparse networks really can be trained from scratch using less GPU memory!
0 replies, 1 like
Carles R. Riera: Well, we are back to the 2000s with the return of constructive-deconstructive methods. Glad to see this.
Instead of finding the correct initialization, they add and remove units according to the gradient.
1 reply, 1 like
Fabien Da Silva: @owulveryck @arxiv - https://arxiv.org/abs/1911.11134 Rigging the Lottery: Making All Tickets Winners
- https://arxiv.org/abs/1911.04252 Self-training with Noisy Student improves ImageNet classification
- https://arxiv.org/abs/1910.08435 Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Input
1 reply, 0 likes
Found on Nov 26 2019 at https://arxiv.org/pdf/1911.11134.pdf