
Movement Pruning: Adaptive Sparsity by Fine-Tuning

Comments

Hugging Face: Introducing PruneBERT, fine-*P*runing BERT's encoder to the size of a high-resolution picture (11MB) while keeping 95% of its original perf! Based on our latest work on movement pruning: https://arxiv.org/abs/2005.07683 Code and weights: https://github.com/huggingface/transformers/tree/master/examples/movement-pruning https://huggingface.co/huggingface/prunebert-base-uncased-6-finepruned-w-distil-squad

17 replies, 988 likes


Philip Vollet 🍥: PruneBERT has just been released: save up to 97% of the original parameters while keeping incredible performance. Paper https://arxiv.org/abs/2005.07683 GitHub https://github.com/huggingface/transformers/tree/master/examples/movement-pruning #deeplearning #machinelearning #python #nlp #datascience @huggingface https://t.co/EntRUwtscA

5 replies, 786 likes


Victor Sanh: Excited to share our latest work on extreme pruning in the context of transfer learning 🧀 95% of the original perf with only ~5% of remaining weights in the encoder💪 Paper: https://arxiv.org/abs/2005.07683 With amazing collaborators @Thom_Wolf & @srush_nlp at @huggingface [1/7] https://t.co/X2VnG3JvuI

4 replies, 661 likes


Thomas Wolf: Victor is releasing his new research work on extreme pruning of pretrained models! I really loved this project! A very deep dive to understand why & how standard pruning methods fail in the context of Transfer Learning and how we can do a lot better! Check his detailed thread👇

0 replies, 125 likes


Sasha Rush: New 🤗 preprint on pruning for transfer learning ("fine-pruning"). Exploits a simple idea: magnitude pruning stops making sense if weights don't really move. https://t.co/c7Uq0ZCPJX

0 replies, 85 likes
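
The tweet above captures the paper's core intuition: during transfer learning a pretrained weight can matter even if it is small, so selecting weights by magnitude is misleading; movement pruning instead scores each weight by how far fine-tuning moves it away from zero. Below is a minimal sketch of that contrast in PyTorch. It is not the authors' implementation (that lives under examples/movement-pruning in the transformers repo); the names magnitude_mask and MovementScorer, and the per-step accumulation loop, are illustrative only.

```python
import torch

def magnitude_mask(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Zeroth-order criterion: keep the weights with the largest |W|."""
    k = max(1, int(keep_ratio * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

class MovementScorer:
    """First-order criterion: accumulate S = -sum_t (dL/dW) * W over fine-tuning steps."""
    def __init__(self, weight: torch.Tensor):
        self.scores = torch.zeros_like(weight)

    def accumulate(self, weight: torch.Tensor) -> None:
        # Call after loss.backward() and before optimizer.step():
        # weights that the gradient pushes away from zero receive positive scores.
        self.scores -= weight.grad.detach() * weight.detach()

    def mask(self, keep_ratio: float) -> torch.Tensor:
        """Keep the top keep_ratio fraction of scores, i.e. the weights that moved away from zero."""
        k = max(1, int(keep_ratio * self.scores.numel()))
        threshold = self.scores.flatten().kthvalue(self.scores.numel() - k + 1).values
        return (self.scores >= threshold).float()
```

In the paper the sparsity level is ramped up gradually during fine-tuning rather than applied once at the end; this sketch only shows the two selection criteria being contrasted.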


Leo Boytsov: "95% of the original perf with only ~5% remaining weights in the encoder!" is a great result by a team of @huggingface researchers https://arxiv.org/pdf/2005.07683.pdf

0 replies, 50 likes


Mario Kostelac: I took 30min to find another proof of this - https://arxiv.org/abs/2005.07683. 3% of weights, 95%+ accuracy. (paper by @SanhEstPasMoi , @Thom_Wolf, @srush_nlp from @huggingface)

1 reply, 36 likes


Carlos Gemmell: @omarsar0 Higher model efficiency. Much like history in aviation: bigger or more propellers < jet engines The answer might be a mix of isolating wasteful components and scaling up where it counts. @huggingface shows there’s dead space in current models. There’s a propeller staring at us.

1 reply, 25 likes


Leo Boytsov: From fine-tuning to fine-pruning! And as a reminder, here's a short overview of a case-study that uses "minimized" BERT as a viable production option (with extra tricks): https://medium.com/roblox-tech-blog/how-we-scaled-bert-to-serve-1-billion-daily-requests-on-cpus-d99be090db26

0 replies, 18 likes


Deepan Manoharan: 🔥From ~1.3 GB with 340 million parameters in BERT-large, @huggingface’s PruneBERT squashes it to just 11 MB with 95% of the original performance. This is just amazing. #NLP #BERT

0 replies, 10 likes


BIconnections: PruneBERT has just been released: save up to 97% of the original parameters while keeping incredible performance. Paper http://arxiv.org/abs/2005.07683 GitHub http://github.com/huggingface/tr #deeplearning #machinelearning #python #nlp #datascience @huggingface https://t.co/NSEJn7FA9I

1 reply, 9 likes


Aran Komatsuzaki: Movement Pruning: Adaptive Sparsity by Fine-Tuning Achieves minimal accuracy loss with down to only 3% of the model parameters. https://arxiv.org/abs/2005.07683 https://t.co/Lsa5C4zJuz

1 reply, 9 likes


ML and Data Projects To Know: 🖥️ Transformers by: @SanhEstPasMoi @Thom_Wolf Alexander M. Rush, @huggingface Code: https://github.com/huggingface/transformers/tree/master/examples/movement-pruning Paper: https://arxiv.org/abs/2005.07683

0 replies, 6 likes


Manu Romero: I have just fine-pruned (w/ movement pruning) "bert-multi-uncased" on TyDi QA for XQ&A https://huggingface.co/mrm8488/prunebert-multi-uncased-finepruned-tydiqa-for-xqa Test set results: F1 70.10, EM 56.66

0 replies, 4 likes


Victor Sanh: Paper: https://arxiv.org/abs/2005.07683 Code & Weights will be released very soon! Stay tuned! In the meantime, here’s a sneak peek at the memory size compressions: [7/7] https://t.co/7FGokZAsBR

2 replies, 4 likes


Jared Nielsen: PruneBERT, only 11MB parameters (10% of BERT-base). For comparison, ALBERT-base is 12M parameters and has 2x the layers. Check this out for real-time inference!

0 replies, 4 likes


LumenAI: #Pruning #DeepLearning

0 replies, 3 likes


Delip Rao: Great update from @huggingface!

0 replies, 3 likes


arXiv CS-CL: Movement Pruning: Adaptive Sparsity by Fine-Tuning http://arxiv.org/abs/2005.07683

0 replies, 2 likes


Marek Bardoński: A paper from @SanhEstPasMoi and @Thom_Wolf on movement pruning, a simple deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. https://arxiv.org/pdf/2005.07683.pdf

0 replies, 1 like
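
The tweet above describes movement pruning as a deterministic first-order method: the hard variant learns an importance score per weight and applies a top-v binary mask in the forward pass, with gradients reaching the scores through a straight-through estimator. The sketch below illustrates that mechanism under stated assumptions; MovementPrunedLinear and TopKStraightThrough are illustrative names, not the masked-linear modules shipped with the example code, and the score initialization is an arbitrary choice for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKStraightThrough(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
        # Binary mask keeping the top `keep_ratio` fraction of scores.
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = scores.flatten().kthvalue(scores.numel() - k + 1).values
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the mask gradient to the scores unchanged.
        return grad_output, None

class MovementPrunedLinear(nn.Module):
    """Linear layer whose weights are masked by learned movement scores (illustrative)."""
    def __init__(self, in_features: int, out_features: int, keep_ratio: float = 0.05):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.scores = nn.Parameter(torch.zeros(out_features, in_features))  # illustrative init
        self.keep_ratio = keep_ratio
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = TopKStraightThrough.apply(self.scores, self.keep_ratio)
        return F.linear(x, self.weight * mask, self.bias)
```

Once fine-pruning is done the mask is fixed, and the surviving ~3-5% of encoder weights can be stored in a sparse, compressed format, which is how megabyte-scale checkpoints like the one announced in this thread become possible.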


arXiv CS-CL: Movement Pruning: Adaptive Sparsity by Fine-Tuning http://arxiv.org/abs/2005.07683

0 replies, 1 like


Content

Found on Jun 01 2020 at https://arxiv.org/pdf/2005.07683.pdf

PDF content of a computer science paper: Movement Pruning: Adaptive Sparsity by Fine-Tuning