Papers of the day

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Comments

Victor Sanh: Excited to see our DistilBERT paper accepted at the NeurIPS 2019 EMC^2 workshop! 40% smaller, 60% faster than BERT => 97% of the performance on GLUE w. a triple loss signal 💥 We also distilled GPT-2 into an 82M-parameter model 📖 https://arxiv.org/abs/1910.01108 Code & weights: https://github.com/huggingface/transformers https://t.co/nSB82ELBWD

8 replies, 363 likes
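For anyone who wants to try the released checkpoints mentioned in the tweet above, here is a minimal loading sketch. It assumes the "distilbert-base-uncased" and "distilgpt2" checkpoint names published by Hugging Face and a recent transformers release; it is an illustration, not code taken from the paper or the repository.

```python
# Minimal sketch (assumption: checkpoint names "distilbert-base-uncased" and
# "distilgpt2" as published by Hugging Face, and a recent transformers release).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("DistilBERT is smaller, faster, cheaper and lighter.",
                   return_tensors="pt")
outputs = model(**inputs)
# DistilBERT keeps BERT's 768-dimensional hidden states, so it can stand in
# wherever bert-base outputs are consumed.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)

# The distilled GPT-2 mentioned in the tweet (~82M parameters) loads the same way:
# gpt2_student = AutoModel.from_pretrained("distilgpt2")
```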


Dariusz Kajtoch: Distillation can produce smaller, faster and cheaper models with generalization performance comparable to their complex, cumbersome counterparts. #NLProc #DeepLearning #DataScience https://arxiv.org/abs/1503.02531 https://arxiv.org/abs/1910.01108 @huggingface https://t.co/ktlhIvgNJu

1 reply, 108 likes
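The "distillation" summarized above is the soft-target recipe of Hinton et al. (arXiv:1503.02531), which the tweet cites: the student is trained to match the teacher's temperature-softened output distribution in addition to the hard labels. The sketch below illustrates that objective; the temperature and mixing weight are arbitrary example values, not numbers from either paper. DistilBERT combines this term with the usual masked-language-modelling loss and a cosine loss on hidden states, the "triple loss signal" in the first tweet.

```python
# Illustrative sketch of soft-target distillation (Hinton et al., arXiv:1503.02531).
# T (temperature) and alpha (mixing weight) are example values, not from the papers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL divergence between temperature-softened teacher and student distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy check with random logits for a 3-class problem.
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
distillation_loss(student, teacher, labels).backward()
```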


Stefan: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" 🤗 from @SanhEstPasMoi, @LysandreJik, @julien_c and @Thom_Wolf. Now on arXiv: https://arxiv.org/abs/1910.01108 https://t.co/9fC7ZyKxnb

0 replies, 34 likes


Christophe Tricot: [#AI #NLP] DistilBERT: 40% smaller, 60% faster than BERT => 97% of the performance on GLUE w. a triple loss signal

0 replies, 7 likes


Pratik Bhavsar: Language model sizes from the DistilBERT paper. DistilBERT has even fewer parameters than ELMo! @huggingface Please add ALBERT for comparison and maybe distilALBERT soon :D https://arxiv.org/pdf/1910.01108.pdf #bert #distillation #distilbert #nlp #nlproc #transferlearning #deeplearning https://t.co/bPBXwehkMI

0 replies, 6 likes
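The size comparison in the tweet can be checked directly by counting parameters of the released checkpoints. A sketch, assuming the standard Hugging Face checkpoint names (ELMo is not distributed through transformers, so it is omitted; albert-base-v2 is included only because the tweet asks for an ALBERT comparison):

```python
# Sketch: compare parameter counts of released checkpoints.
# Assumes standard Hugging Face checkpoint names; weights are downloaded on first use.
from transformers import AutoModel

def n_params(name):
    model = AutoModel.from_pretrained(name)
    return sum(p.numel() for p in model.parameters())

for name in ["bert-base-uncased", "distilbert-base-uncased", "albert-base-v2"]:
    print(f"{name}: {n_params(name) / 1e6:.0f}M parameters")
```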


Sam Shleifer: Much credit goes to (pls tag yourself if I miss you!): @canwenxu for incredible art; TinyBERT: https://arxiv.org/abs/1909.10351; DistilBERT: https://arxiv.org/abs/1910.01108 from @SanhEstPasMoi; Theseus: https://arxiv.org/abs/2002.02925; sequence-level distillation: https://arxiv.org/abs/1606.07947

2 replies, 6 likes


arXiv CS-CL: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter http://arxiv.org/abs/1910.01108

0 replies, 4 likes


Content

Found on Oct 03 2019 at https://arxiv.org/pdf/1910.01108.pdf

PDF content of a computer science paper: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter