WELL-READ STUDENTS LEARN BETTER: ON THE IMPORTANCE OF PRE-TRAINING COMPACT MODELS

Comments

Iulia-Raluca Turc: Efficient BERT models from Google Research, now available at https://github.com/google-research/bert! We hope our 24 BERT models with fewer layers and/or hidden sizes will enable research in resource-constrained institutions and encourage building more compact models. https://arxiv.org/abs/1908.08962

2 replies, 665 likes
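
The 24 miniatures mentioned above span a grid of depths and hidden sizes (layer counts L in {2, 4, 6, 8, 10, 12} and hidden sizes H in {128, 256, 512, 768}, per the paper). The sketch below enumerates those configurations; the checkpoint naming and the heads = H/64, feed-forward = 4H scaling are assumptions based on the standard BERT setup and the release, not values reproduced from the paper.

```python
# Sketch: enumerate the 24 compact BERT configurations described in the paper.
# Assumes heads = H // 64 and feed-forward size = 4 * H (standard BERT scaling),
# and the "uncased_L-*_H-*_A-*" naming used in the released checkpoints.
LAYERS = [2, 4, 6, 8, 10, 12]
HIDDEN = [128, 256, 512, 768]

configs = [
    {
        "name": f"uncased_L-{L}_H-{H}_A-{H // 64}",
        "num_hidden_layers": L,
        "hidden_size": H,
        "num_attention_heads": H // 64,
        "intermediate_size": 4 * H,
    }
    for L in LAYERS
    for H in HIDDEN
]

for cfg in configs:
    print(cfg["name"])
```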


Hugging Face: Efficient mini-BERT models from Google Research, now available at https://huggingface.co/google thanks to @iuliaturc / @GoogleAI ! 24 sizes, pre-trained directly with the MLM loss, that are competitive with more elaborate pre-training strategies involving distillation (https://arxiv.org/abs/1908.08962). https://t.co/OGOSw7ZtXJ

1 reply, 387 likes
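
For anyone who wants to try the checkpoints, here is a minimal sketch of loading one of the miniatures with the transformers library. The model ID google/bert_uncased_L-4_H-256_A-4 is assumed to follow the naming of the release linked above; check https://huggingface.co/google for the exact identifiers.

```python
# Minimal sketch: load one of the compact BERT checkpoints from the Hugging Face hub.
# The model ID below is assumed from the google/ namespace announced above;
# verify the exact identifier at https://huggingface.co/google.
from transformers import AutoTokenizer, AutoModel

model_id = "google/bert_uncased_L-4_H-256_A-4"  # 4 layers, hidden size 256
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Well-read students learn better.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 256)
```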


Julien Chaumond: 🔥 Google AI weights, directly inside @huggingface transformers 🔥 https://twitter.com/huggingface/status/1239937298747310081 https://huggingface.co/google

1 reply, 92 likes


Sasha Rush: Tracking down a bunch of interesting BERT models for @huggingface this week. Let me know if there are other models you would like us to include. (There are now easy self-serve tools to upload / describe models.)

1 reply, 45 likes


Iulia Turc: Our BERT miniatures were pre-trained directly with the MLM loss. They are competitive with more elaborate pre-training strategies involving MLM distillation (https://arxiv.org/abs/1908.08962). Our models can be fine-tuned for downstream tasks via standard training or end-task distillation.

1 reply, 21 likes
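
Below is a minimal sketch of the end-task distillation objective mentioned above, in which the compact student is trained to match the teacher's output distribution for the downstream task. The temperature, the mixing weight, and the inclusion of a hard-label term are illustrative assumptions, not values taken from the paper.

```python
# Sketch of an end-task distillation loss: the compact student matches the
# teacher's softened output distribution, optionally mixed with the usual
# hard-label cross-entropy. Temperature and alpha are illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: standard cross-entropy against the gold labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example usage with dummy tensors (batch of 8, 3-way classification):
student_logits = torch.randn(8, 3, requires_grad=True)
teacher_logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```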


Jacob Eisenstein: BERTs for all sizes and budgets! Cool work from my teammates that will make state-of-the-art NLP available in many more computational settings.

0 replies, 15 likes


arxiv: Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distil... http://arxiv.org/abs/1908.08962 https://t.co/NtxbMFKjyT

0 replies, 9 likes


arXiv CS-CL: Well-Read Students Learn Better: On the Importance of Pre-training Compact Models http://arxiv.org/abs/1908.08962

0 replies, 3 likes


Tim Finin: Google released 24 new compact #BERT models (English, uncased, trained with WordPiece masking) intended for environments with restricted computational resources. The models are available at https://github.com/google-research/bert with details described in https://arxiv.org/abs/1908.08962 #NLP

0 replies, 2 likes


akira: https://arxiv.org/abs/1908.08962 For small models in NLP, this research shows that the "Pre-Training -> Distillation -> Fine-Tuning" recipe extracts more benefit from both the large teacher's knowledge and pre-training. The results show that combining distillation with pre-training is more efficient. https://t.co/FYUd9UVH7i

0 replies, 1 like


Content

Found on Mar 11 2020 at https://arxiv.org/pdf/1908.08962.pdf

PDF content of a computer science paper: WELL-READ STUDENTS LEARN BETTER: ON THE IMPORTANCE OF PRE-TRAINING COMPACT MODELS