Jake VanderPlas: The frequency of random seeds between 0 and 1000 on github (data from http://grep.app) https://t.co/Zmp7mwMWil
60 replies, 1520 likes
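(The plot counts how often each integer between 0 and 1000 appears as a hard-coded seed in public code, using data from http://grep.app. The snippet below is only a rough local sketch of that kind of count; the regex, directory layout, and function name are illustrative assumptions, not how the original data was gathered.)

    # Rough local sketch: count how often each literal value 0-1000 appears in
    # common seed-setting calls across a directory of Python files.
    import re
    from collections import Counter
    from pathlib import Path

    SEED_CALL = re.compile(
        r"(?:random\.seed|np\.random\.seed|numpy\.random\.seed|torch\.manual_seed)\((\d+)\)"
    )

    def seed_counts(corpus_dir):
        counts = Counter()
        for path in Path(corpus_dir).rglob("*.py"):
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for match in SEED_CALL.finditer(text):
                value = int(match.group(1))
                if 0 <= value <= 1000:
                    counts[value] += 1
        return counts

    # Usage: seed_counts("path/to/checkouts").most_common(10)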
Jesse Dodge: Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
We found surprisingly large variance just from random seeds when fine-tuning BERT. Both weight inits and the order of the training data have a big impact.
12 replies, 458 likes
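(The paper separates two sources of randomness behind this variance: the initialization of the new classifier head and the order in which the training data is presented. Below is a minimal sketch of that separation, assuming PyTorch and HuggingFace transformers; the function and argument names are illustrative, not the paper's code.)

    # Two seeds, two sources of randomness in BERT fine-tuning.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import BertForSequenceClassification

    def build_run(init_seed: int, data_seed: int, dataset: TensorDataset):
        # Weight-initialization seed: controls the randomly initialized
        # classifier head stacked on the pretrained (fixed) BERT encoder.
        torch.manual_seed(init_seed)
        model = BertForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2
        )

        # Data-order seed: controls only how the training examples are shuffled.
        generator = torch.Generator().manual_seed(data_seed)
        loader = DataLoader(dataset, batch_size=16, shuffle=True, generator=generator)
        return model, loader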
Thomas Wolf: Happy to see Dodge et al. (http://arxiv.org/abs/2002.06305) settling this question once and for all
The best random seed is 12
A major part of the Deep Learning Research Program can now be considered solved
*rubs his hands together* https://t.co/NrD4xOZAee
11 replies, 184 likes
Marcin Junczys-Dowmunt (Marian NMT): MT people, your BLEU values can vary by 1 point or more based on random seed choice as well. So when you report your results without investigating that, you have no idea what you are actually reporting.
4 replies, 65 likes
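(One way to act on this is to report BLEU as a mean and standard deviation over several seeds rather than a single score. A minimal sketch, assuming sacrebleu, one hypothesis file per seed, and at least two seeds; the helper and file names are hypothetical.)

    # Report BLEU across seeds as mean and standard deviation.
    import statistics
    from pathlib import Path
    import sacrebleu

    def bleu_over_seeds(hypothesis_files, reference_file):
        refs = Path(reference_file).read_text(encoding="utf-8").splitlines()
        scores = []
        for hyp_file in hypothesis_files:
            hyps = Path(hyp_file).read_text(encoding="utf-8").splitlines()
            scores.append(sacrebleu.corpus_bleu(hyps, [refs]).score)
        return statistics.mean(scores), statistics.stdev(scores)

    # Usage: bleu_over_seeds([f"hyp.seed{i}.txt" for i in range(5)], "ref.txt")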
Gabriel Ilharco: New paper out!
In NLP, fine-tuning large pretrained models like BERT can be a very brittle process. If you're curious about this, this paper is for you! https://arxiv.org/pdf/2002.06305.pdf
Work with the amazing @JesseDodge, @royschwartz02, Ali Farhadi, @HannaHajishirzi & @nlpnoah
1 replies, 65 likes
Thomas Wolf: Check out the paper, it's a great read:
"Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping"
by @JesseDodge, @gabriel_ilharco, @royschwartz02, Ali Farhadi, Hannaneh Hajishirzi and @nlpnoah
1 replies, 29 likes
Yonatan Belinkov: This is the kind of “common knowledge” that I’ve heard floating around, but not really documented. It’s great to have a detailed study.
1 replies, 20 likes
Leshem Choshen: For anyone outside academia: you've probably noticed that BERTs differ by seed. https://arxiv.org/pdf/2002.06305.pdf quantifies by how much.
1. take the best of ±7
2. try many, stop the ones that show no promise early on @royschwartz02 @nlpnoah @alifarhadi @JesseDodge @gabriel_ilharco
1 replies, 17 likes
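(A minimal sketch of those two recommendations, with placeholder callables train_for() and evaluate() standing in for real fine-tuning and validation code: probe many seeds briefly, keep only the promising ones, finish those, and take the best. A real implementation would resume the survivors from their probe checkpoints rather than retraining from scratch.)

    # Best-of-many seeds with early stopping of unpromising runs.
    def best_of_many_seeds(seeds, train_for, evaluate,
                           probe_steps=500, full_steps=5000, keep_top=3):
        # Short probe phase for every seed.
        probe_scores = {seed: evaluate(train_for(seed, probe_steps)) for seed in seeds}

        # Keep only the most promising seeds; the rest are stopped early.
        survivors = sorted(probe_scores, key=probe_scores.get, reverse=True)[:keep_top]

        # Train the survivors to completion (a real run would resume from the
        # probe checkpoints rather than start over).
        final_scores = {seed: evaluate(train_for(seed, full_steps)) for seed in survivors}

        best_seed = max(final_scores, key=final_scores.get)
        return best_seed, final_scores[best_seed]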
Noah Smith: New work on the roles of random seeds in fine-tuning by @JesseDodge, @gabriel_ilharco, @royschwartz02, Ali Farhadi, @HannaHajishirzi, and @nlpnoah
0 replies, 16 likes
Amirhossein Tebbifakhr: Random seeds impact fine-tuning BERT.
https://arxiv.org/pdf/2002.06305.pdf suggests: fine-tune many, stop the non-promising ones early, and continue only some.
by: @JesseDodge @gabriel_ilharco @royschwartz02 @alifarhadi @nlpnoah
0 replies, 11 likes
MT Group at FBK: Our pick of the week: @JesseDodge et al. paper on "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". By @at_amir
#nlproc #deeplearning #bert @gabriel_ilharco @royschwartz02 @HannaHajishirzi @nlpnoah
0 replies, 5 likes
Jeff Dalton: A bit scary 😱 that random seeds and data order should matter...
1 replies, 5 likes
ML and Data Projects To Know: 📙 Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Authors: @JesseDodge, @gabriel_ilharco, @royschwartz02, Ali Farhadi, @HannaHajishirzi, @nlpnoah
0 replies, 3 likes
La Forge AI: [2002.06305] Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
via @IgnavierN @Minthos_ @jstnclmnt @ceobillionaire
0 replies, 1 likes
Djamé: Real question: I'm most certainly missing something, but how come people are surprised by the variance between results linked to different random seeds? In the pre-deep-learning parsing era, this was a given fact. (Petrov, 2010) https://www.aclweb.org/anthology/N10-1003.pdf https://t.co/vGRSIIbW3L
1 replies, 1 likes
Found on Apr 08 2020 at https://arxiv.org/pdf/2002.06305.pdf