Jesse Dodge: Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
We found surprisingly large variance just from random seeds when fine-tuning BERT. Both the weight initialization and the order of the training data have a big impact.
14 replies, 463 likes
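The two sources of variance named here are controllable in isolation: one seed can govern the randomly initialized classification head, while a second, independent seed drives the shuffling of the training data. A minimal PyTorch-style sketch of the decoupling (the function and seed names are illustrative, not from the paper's released code):

```python
import torch
from torch.utils.data import DataLoader

def make_run(init_seed, data_seed, dataset):
    # Weight initialization: seed torch's RNG before creating the task head,
    # so the new head varies with init_seed while the pretrained encoder
    # (loaded from a checkpoint) is unaffected.
    torch.manual_seed(init_seed)
    head = torch.nn.Linear(768, 2)  # e.g., a classifier on top of BERT features

    # Data order: a separately seeded generator drives the shuffle, so two
    # runs can share an init yet visit batches in different orders, or
    # share a data order across different inits.
    gen = torch.Generator()
    gen.manual_seed(data_seed)
    loader = DataLoader(dataset, batch_size=32, shuffle=True, generator=gen)
    return head, loader
```

Holding one seed fixed while varying the other gives the kind of grid the paper sweeps to attribute variance to each factor separately.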
Daniel Roy: Your SOTA code may only be SOTA for some random seeds. Nonsense or new reality? I suppose there are trivial ways to close the gap using restarts and validation data.
23 replies, 311 likes
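The "trivial" fix Roy alludes to is plain random restarts with model selection on the dev set: fine-tune k times with different seeds and keep the best. A hedged sketch, where `fine_tune` and `evaluate` are placeholders for an existing training loop and validation metric:

```python
def best_of_k_restarts(fine_tune, evaluate, k=10):
    """Fine-tune k times with different seeds; keep the best by dev score."""
    best_model, best_score = None, float("-inf")
    for seed in range(k):
        model = fine_tune(seed=seed)   # one full fine-tuning run
        score = evaluate(model)        # selection uses validation data only
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```

Selecting on validation rather than test data is what keeps this legitimate; the cost is k full training runs, which is what the early-stopping recipe sketched further down tries to cut.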
Clément Canonne: Controversial opinion: We should stop teaching students any ML methods other than picking a random seed. There is so much inertia against forgetting things that never panned out. It's hard to let go of ideas, especially one's own, but our aim should be to make real progress in AI.
5 replies, 258 likes
Thomas Wolf: Happy to see Dodge et al. (http://arxiv.org/abs/2002.06305) settling this question once and for all
The best random seed is 12
A major part of the Deep Learning Research Program can now be considered solved
*rubs his hands together* https://t.co/NrD4xOZAee
11 replies, 183 likes
Marcin Junczys-Dowmunt (Marian NMT): MT people, your BLEU values can vary by 1 point or more based on random seed choice as well. So when you report your results without investigating that, you have no idea what you are actually reporting.
4 replies, 65 likes
Gabriel Ilharco: New paper out!
In NLP, fine-tuning large pretrained models like BERT can be a very brittle process. If you're curious about this, this paper is for you! https://arxiv.org/pdf/2002.06305.pdf
Work with the amazing @JesseDodge, @royschwartz02, Ali Farhadi, @HannaHajishirzi & @nlpnoah
1 reply, 65 likes
Charles Sutton: Of course, if you call it “tuning the random seed”, it sounds silly. Is it really? Commonly you need to do random restarts in global optimization. That’s what changing the seed is. Why should that bother us?
6 replies, 57 likes
Anna Rogers: Dear people maintaining leaderboards: what do you recommend doing about variance due to model inits? @robinomial @sleepinyourhat @nlpmattg
We were about to release some data. Then @JesseDodge @nlpnoah did this:
And now it's all darkness and misery.
4 replies, 38 likes
Thomas Wolf: Check out the paper, it's a great read:
"Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping"
by @JesseDodge @gabriel_ilharco @royschwartz02 Ali Farhadi, Hannaneh Hajishirzi and @nlpnoah
1 reply, 29 likes
Yonatan Belinkov: This is the kind of “common knowledge” that I’ve heard floating around, but not really documented. It’s great to have a detailed study.
1 reply, 20 likes
Leshem Choshen: For anyone outside academia: you've probably noticed that BERTs differ by seed. https://arxiv.org/pdf/2002.06305.pdf quantifies by how much.
1. Take the best of ±7 runs.
2. Try many; stop the ones that show no promise early on (see the sketch below). @royschwartz02 @nlpnoah @alifarhadi @JesseDodge @gabriel_ilharco
1 reply, 17 likes
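That second recommendation, starting many runs and discarding the unpromising ones early, can be sketched as a successive-halving loop. Everything here is illustrative rather than the paper's implementation: `train_to` resumes a run up to a step budget, `dev_score` evaluates it on validation data, and the budget schedule is made up:

```python
def early_discarding(runs, train_to, dev_score,
                     budgets=(100, 500, 2000), keep=0.5):
    """Train all runs to each budget in turn, dropping the weaker fraction."""
    active = list(runs)
    for budget in budgets:
        for run in active:
            train_to(run, steps=budget)  # resume each survivor to `budget` steps
        # Rank by validation score and keep only the most promising runs.
        active.sort(key=dev_score, reverse=True)
        active = active[: max(1, int(len(active) * keep))]
    return active[0]  # best surviving run under the final ranking
```

The payoff is that most of the compute goes to seeds that already look good after a few hundred steps, which the paper finds is often enough to predict final performance.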
Noah Smith: New work on the roles of random seeds in fine-tuning by @JesseDodge, @gabriel_ilharco, @royschwartz02, Ali Farhadi, @HannaHajishirzi, and @nlpnoah
0 replies, 16 likes
Amirhossein Tebbifakhr: Random seeds impact fine-tuning BERT.
https://arxiv.org/pdf/2002.06305.pdf suggests: fine-tune many, stop the non-promising ones early, and continue the rest.
by: @JesseDodge @gabriel_ilharco @royschwartz02 @alifarhadi @nlpnoah
0 replies, 11 likes
Pasquale Minervini: @zacharylipton @IgorCarron "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping" - https://arxiv.org/abs/2002.06305 (although @huggingface's source code reads extremely well, almost like a paper)
0 replies, 9 likes
MT Group at FBK: Our pick of the week: @JesseDodge et al.'s paper on "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". By @at_amir
#nlproc #deeplearning #bert @gabriel_ilharco @royschwartz02 @HannaHajishirzi @nlpnoah
0 replies, 5 likes
Jeff Dalton: A bit scary 😱 that random seeds and data order should matter...
1 reply, 5 likes
ML and Data Projects To Know: 📙 Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Authors: @JesseDodge, @gabriel_ilharco, @royschwartz02, Ali Farhadi, @HannaHajishirzi, @nlpnoah
0 replies, 3 likes
DrHB: Finally, an article about the importance of random seeds :) https://arxiv.org/abs/2002.06305 A good read with interesting results :)
0 replies, 2 likes
La Forge AI: [2002.06305] Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
via @IgnavierN @Minthos_ @jstnclmnt @ceobillionaire
0 replies, 1 like
Djamé: Real question: I'm most certainly missing something, but how come people are surprised by the variance in results linked to different random seeds? In the pre-deep-learning parsing era, this was a given fact. (Petrov, 2010) https://www.aclweb.org/anthology/N10-1003.pdf https://t.co/vGRSIIbW3L
1 reply, 1 like
Found on Feb 18 2020 at https://arxiv.org/pdf/2002.06305.pdf