
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Comments

Jake VanderPlas: The frequency of random seeds between 0 and 1000 on github (data from http://grep.app) https://t.co/Zmp7mwMWil

60 replies, 1520 likes
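The plot VanderPlas links tallies how often particular integers show up as random seeds in public code. A minimal Python sketch of that kind of count; the regex and the grep.app query shown here are illustrative assumptions, not his actual method:

```python
import re
from collections import Counter

# Scan source snippets for literal seed values and tally which integers
# appear between 0 and 1000 (the range shown in the tweet's plot).
SEED_PATTERN = re.compile(r"(?:random_state|seed)\s*[(=]\s*(\d+)")

def count_seeds(snippets):
    counts = Counter()
    for text in snippets:
        for match in SEED_PATTERN.findall(text):
            value = int(match)
            if 0 <= value <= 1000:
                counts[value] += 1
    return counts

example = ["np.random.seed(42)", "train(seed=42)", "KFold(random_state=0)"]
print(count_seeds(example).most_common(3))  # [(42, 2), (0, 1)]
```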


Jesse Dodge: Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping https://arxiv.org/abs/2002.06305 We found surprisingly large variance just from random seeds when fine-tuning BERT. Both the weight inits and the order of the training data have a big impact. 1/n

12 replies, 458 likes
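The tweet above points out that a single random seed actually feeds two separate sources of variance when fine-tuning BERT: the initialization of the new classification head and the shuffling order of the training data. A minimal PyTorch sketch of how the two can be controlled independently; the generator-based split and the 0.02 init scale are illustrative assumptions, not the authors' code:

```python
import torch
from torch.utils.data import DataLoader

# Two independent seeds: one for the new classifier head's weights (WI),
# one for the order in which training examples are shuffled (DO).

def make_classifier_head(hidden_size, num_labels, init_seed):
    g = torch.Generator().manual_seed(init_seed)
    head = torch.nn.Linear(hidden_size, num_labels)
    with torch.no_grad():
        # Re-initialize with a seed-controlled normal init (scale is an assumption).
        head.weight.copy_(0.02 * torch.randn(num_labels, hidden_size, generator=g))
        head.bias.zero_()
    return head

def make_train_loader(train_dataset, data_order_seed, batch_size=32):
    g = torch.Generator().manual_seed(data_order_seed)
    # The generator controls the shuffle order, i.e. which examples the
    # model sees early in fine-tuning.
    return DataLoader(train_dataset, batch_size=batch_size, shuffle=True, generator=g)
```

Holding one of the two seeds fixed while varying the other is, roughly, how the paper separates the contribution of weight initialization from that of data order.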


Thomas Wolf: Happy to see Dodge et al. (http://arxiv.org/abs/2002.06305) settling this question once and for all: the best random seed is 12. A major part of the Deep Learning Research Program can now be considered solved. *rubs his hands together* https://t.co/NrD4xOZAee

11 replies, 184 likes


Marcin Junczys-Dowmunt (Marian NMT): MT people, your BLEU values can vary by 1 point or more based on random seed choice as well. So when you report your results without investigating that, you have no idea what you are actually reporting.

4 replies, 65 likes
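Following up on the BLEU point above: rather than a single number from one run, a result can be reported as a mean and standard deviation over seeds. A minimal sketch; the scores below are made up for illustration:

```python
import statistics

# Hypothetical BLEU scores for the same MT system trained with five
# different random seeds (numbers invented for this example).
bleu_by_seed = {1: 27.4, 2: 28.1, 3: 26.9, 4: 27.8, 5: 28.3}

scores = list(bleu_by_seed.values())
mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"BLEU over {len(scores)} seeds: {mean:.1f} +/- {std:.1f} "
      f"(min {min(scores):.1f}, max {max(scores):.1f})")
```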


Gabriel Ilharco: New paper out! In NLP, fine-tuning large pretrained models like BERT can be a very brittle process. If you're curious about this, this paper is for you! https://arxiv.org/pdf/2002.06305.pdf Work with the amazing @JesseDodge, @royschwartz02, Ali Farhadi, @HannaHajishirzi & @nlpnoah 1/n

1 replies, 65 likes


Thomas Wolf: Check out the paper, it's a great read: "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping" by @JesseDodge @gabriel_ilharco @royschwartz02 Ali Farhadi, Hannaneh Hajishirzi and @nlpnoah http://arxiv.org/abs/2002.06305

1 replies, 29 likes


Yonatan Belinkov: This is the kind of “common knowledge” that I’ve heard floating around, but not really documented. It’s great to have a detailed study.

1 replies, 20 likes


Leshem Choshen: For anyone outside academia: you've probably noticed that BERTs differ by seed. https://arxiv.org/pdf/2002.06305.pdf quantifies by how much. Suggestions: 1. take the best of ~7 runs; 2. try many, stop the ones that show no promise early on. @royschwartz02 @nlpnoah @alifarhadi @JesseDodge @gabriel_ilharco

1 replies, 17 likes
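A minimal sketch of the second suggestion above: start many fine-tuning runs with different seeds and discard the unpromising ones after each evaluation. The functions here are placeholders standing in for real fine-tuning and dev-set evaluation, not the authors' code:

```python
import random

# "Try many seeds, stop the unpromising ones early": after each epoch,
# keep only the runs with the best dev-set performance so far.

def finetune_one_epoch(seed, epoch):
    """Placeholder: run one more epoch for this seed and return dev accuracy."""
    rng = random.Random(seed)
    per_seed_quality = rng.uniform(0.70, 0.90)   # simulated effect of the seed
    return min(per_seed_quality + 0.01 * epoch, 1.0)

def best_seed_with_early_discarding(seeds, num_epochs=3, keep_fraction=0.5):
    dev_acc = {s: 0.0 for s in seeds}
    active = list(seeds)
    for epoch in range(num_epochs):
        for s in active:
            dev_acc[s] = finetune_one_epoch(s, epoch)
        # Keep only the currently most promising runs for the next epoch.
        active.sort(key=lambda s: dev_acc[s], reverse=True)
        active = active[:max(1, int(len(active) * keep_fraction))]
    return max(active, key=lambda s: dev_acc[s]), dev_acc

best, history = best_seed_with_early_discarding(seeds=range(20))
print("best seed:", best, "dev accuracy:", round(history[best], 3))
```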


Noah Smith: New work on the roles of random seeds in fine-tuning by @JesseDodge, @gabriel_ilharco, @royschwartz02, Ali Farhadi, @HannaHajishirzi, and @nlpnoah

0 replies, 16 likes


Amirhossein Tebbifakhr: Random seeds impact fine-tuning BERT. https://arxiv.org/pdf/2002.06305.pdf suggests: fine-tune many, stop the non-promising ones early, and continue some. By @JesseDodge @gabriel_ilharco @royschwartz02 @alifarhadi @nlpnoah cc: @fbk_mt

0 replies, 11 likes


MT Group at FBK: Our pick of the week: @JesseDodge et al. paper on "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". By @at_amir https://arxiv.org/pdf/2002.06305.pdf #nlproc #deeplearning #bert @gabriel_ilharco @royschwartz02 @HannaHajishirzi @nlpnoah

0 replies, 5 likes


Jeff Dalton: A bit scary 😱 that random seeds and data order should matter...

1 replies, 5 likes


ML and Data Projects To Know: 📙 Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping Authors: @JesseDodge, @gabriel_ilharco, @royschwartz02, Ali Farhadi, @HannaHajishirzi, @nlpnoah Paper: https://arxiv.org/abs/2002.06305

0 replies, 3 likes


La Forge AI: [2002.06305] Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping via @IgnavierN @Minthos_ @jstnclmnt @ceobillionaire https://arxiv.org/abs/2002.06305

0 replies, 1 likes


Djamé: Real question: I'm most certainly missing something, but how come people are surprised by the variance in results linked to different random seeds? In the pre-deep-learning parsing era, this was a given fact. (Petrov, 2010) https://www.aclweb.org/anthology/N10-1003.pdf https://t.co/vGRSIIbW3L

1 replies, 1 likes


Content

Found on Apr 08 2020 at https://arxiv.org/pdf/2002.06305.pdf

PDF of the computer science paper "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping".