
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Comments

Eric Wallace: Not everyone can afford to train huge neural models. So, we typically *reduce* model size to train/test faster. However, you should actually *increase* model size to speed up training and inference for transformers. Why? [1/6] 👇 http://bair.berkeley.edu/blog/2020/03/05/compress/ http://arxiv.org/abs/2002.11794 https://t.co/ivKyNo1ve0

16 replies, 1182 likes
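To make the "train large, then stop early" idea in the tweet above concrete, here is a minimal sketch in PyTorch with HuggingFace Transformers. It is not the authors' code: the model sizes are illustrative placeholders, the training data is random token IDs, and the only point is that both models get the same fixed wall-clock budget instead of being trained to convergence.

import time
import torch
from transformers import RobertaConfig, RobertaForMaskedLM

def make_model(hidden_size, num_layers):
    # RoBERTa-style masked LM; head count and FFN width follow the usual ratios.
    config = RobertaConfig(
        vocab_size=50265,
        hidden_size=hidden_size,
        num_hidden_layers=num_layers,
        num_attention_heads=hidden_size // 64,
        intermediate_size=4 * hidden_size,
    )
    return RobertaForMaskedLM(config)

def train_with_budget(model, budget_seconds, batch_size=8, seq_len=128):
    # Train on random token IDs (a toy stand-in for real pretraining data)
    # and simply stop when the wall-clock budget runs out.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    start = time.time()
    while time.time() - start < budget_seconds:
        input_ids = torch.randint(0, 50265, (batch_size, seq_len), device=device)
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return loss.item()

# Same wall-clock budget for two sizes; the small model is NOT trained longer
# to convergence, which is the setting the paper argues against.
small = make_model(hidden_size=256, num_layers=4)
large = make_model(hidden_size=1024, num_layers=12)
print("small model final loss:", train_with_budget(small, budget_seconds=60))
print("large model final loss:", train_with_budget(large, budget_seconds=60))

The paper's claim is that, per unit of wall-clock time, the larger configuration reaches a lower loss, so it is the better choice when compute, not model size, is the binding constraint.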


(((ل()(ل() 'yoav)))): "larger models train better and compress better". very interesting (and useful!) empirical observation. and also points to a big theoretical question. we clearly don't understand the role of model sizes very well yet.

3 replies, 155 likes


Eric Wallace: See all of this in: "Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers" By @zhuohan123, @Eric_Wallace_, @nlpkevinl, @shengs1123, Kurt Keutzer, Dan Klein, @mejoeyg Blog http://bair.berkeley.edu/blog/2020/03/05/compress/ Paper http://arxiv.org/abs/2002.11794

5 replies, 133 likes


Eric Wallace: If you are interested in making Transformer NLP models more efficient at training and inference time, our work was accepted to #icml2020. Camera-ready paper: http://arxiv.org/abs/2002.11794 Slides: https://www.ericswallace.com/slides_and_posters/train_large.pdf

2 replies, 123 likes


Kevin Lin: how big should you make your model for fast training & inference of Transformers? we accelerate BERT and MT training & inference by _increasing_ model size and stopping early https://arxiv.org/abs/2002.11794 w/ @zhuohan123, @Eric_Wallace_, @shengs1123, Kurt Keutzer, Dan Klein, @mejoeyg

0 replies, 23 likes


David Page: Great study of training efficiency at large scale + nice results on compression for inference!

1 reply, 12 likes


Zhuohan Li: Check out our latest work! We show that you can accelerate BERT and MT training & inference by _increasing_ model size and stopping early! Blog: https://bair.berkeley.edu/blog/2020/03/05/compress/ Paper: https://arxiv.org/abs/2002.11794 w/ @Eric_Wallace_, @shengs1123, @nlpkevinl, Kurt Keutzer, Dan Klein, @mejoeyg

0 replies, 12 likes


Sam Bowman: Impressive work by @zhuohan123 et al.

1 reply, 11 likes


Sourabh Katoch: Let me forward this to my previous boss. He used to call me crazy for saying this 😅

0 replies, 3 likes


fools.doc: I remember once getting really pissed off at a widely respected humanities scholar characterizing CS in a PMLA article as primarily about "accuracy." Any first-year CS major can tell you it's about maximizing efficiency. Here's one of countless examples illustrating that!

0 replies, 3 likes


Gideon Mann: Bigger is faster?? What?? Mind blown 🤯

0 replies, 2 likes


akira: https://arxiv.org/abs/2002.11794 When computational resources are limited, the suggestion is to train and compress a large model rather than train/infer with a small one: larger models converge faster and lose less accuracy when compressed. https://t.co/XolI1nNthQ

0 replies, 1 like
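The "then compress" half of the claim, summarized in the comment above, can be illustrated with an off-the-shelf recipe. The sketch below applies PyTorch's dynamic int8 quantization to a pretrained RoBERTa checkpoint; the paper itself evaluates quantization and pruning, so treat this as one assumed compression method for illustration, not the authors' exact procedure.

import os
import torch
from transformers import RobertaForMaskedLM

def size_mb(module):
    # Approximate on-disk size of the module's parameters and buffers.
    torch.save(module.state_dict(), "tmp_weights.pt")
    mb = os.path.getsize("tmp_weights.pt") / 1e6
    os.remove("tmp_weights.pt")
    return mb

# Pretrained checkpoint as a stand-in for the "large model trained under a
# fixed compute budget" from the paper.
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.eval()

# Post-training dynamic quantization: Linear weights are stored as int8 and
# dequantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

print(f"fp32 size: {size_mb(model):.1f} MB")
print(f"int8 size: {size_mb(quantized):.1f} MB")

The paper's observation is that the larger model not only converges faster but also loses less accuracy under compression of this kind, so the compressed large model wins at a comparable inference cost.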


arXiv CS-CL: Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers http://arxiv.org/abs/2002.11794

0 replies, 1 like


Content

Found on Mar 05 2020 at https://arxiv.org/pdf/2002.11794.pdf

PDF content of a computer science paper: Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers