Rethinking Pre-training and Self-training


Quoc Le: We researchers love pre-training. Our new paper shows that pre-training is unhelpful when we have a lot of labeled data. In contrast, self-training works well even when we have a lot of labeled data. SOTA on PASCAL segmentation & COCO detection. Link:

5 replies, 1043 likes
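The self-training recipe the thread keeps referring to (teacher trains on labeled data, pseudo-labels the unlabeled data, student retrains on the union, usually with added noise/augmentation) can be sketched with a toy stand-in model. This is an illustrative sketch only: the nearest-centroid "model" and all data below are invented for clarity; the paper uses large detection/segmentation models with strong augmentation.

```python
# Toy sketch of the self-training loop: teacher -> pseudo-labels -> student.
# The nearest-centroid classifier is a hypothetical stand-in for model training.

def train_centroids(points, labels):
    """Fit one centroid per class (stand-in for real model training)."""
    by_class = {}
    for x, y in zip(points, labels):
        by_class.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in by_class.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

# Step 1: train a teacher on the labeled set.
labeled_x = [0.0, 0.2, 1.0, 1.2]
labeled_y = [0, 0, 1, 1]
teacher = train_centroids(labeled_x, labeled_y)

# Step 2: pseudo-label the unlabeled set with the teacher.
unlabeled_x = [0.1, 0.9, 1.1, -0.1]
pseudo_y = [predict(teacher, x) for x in unlabeled_x]

# Step 3: retrain a student on labeled + pseudo-labeled data
# (the paper injects noise/strong augmentation here; omitted in this toy).
student = train_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
print(predict(student, 0.05))  # classified with the class-0 centroid
```

Because the student sees both the labeled and pseudo-labeled data, it can keep improving even when the labeled set is already large, which is the regime where the paper finds pre-training stops helping.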

Sayak Paul: Here's a list of my favorite recent papers on transfer learning for vision: - BigTransfer: - VirTex: - SimCLRv2: - Self-training: Would love to see a T5-like paper for vision.

2 replies, 217 likes

Barret Zoph: Models and checkpoints are now open sourced for my recent work: "Rethinking Pre-training and Self-training". Paper link: Code Link: On COCO we achieve 54.3 AP and on Pascal Segmentation 90.5 mIOU!

1 reply, 114 likes

Thang Luong: The success of self-training extends to object detection and semantic segmentation! The key to SOTA results in PASCAL semantic segmentation is the use of #NoisyStudent EfficientNet-L2 checkpoints :)

0 replies, 60 likes

Joan Serrà: Insightful paper comparing pre-trained (transfer learning) and self-trained models. TLDR: self-training >> pre-training (including self-supervised pre-training). Encouraging!

0 replies, 32 likes

Mingxing Tan: Excited to see self-training obtain SoTA accuracy on COCO detection and Pascal segmentation. What if you also need efficiency? Try out our updated EfficientDet (53.7 AP, with 55M params and 122 ms latency): Enjoy :)

0 replies, 24 likes

Hossein Mobahi: I see rapidly growing success from "self-training" and "self-distillation" type methods recently. There is a lot of opportunity for theoretical understanding and explanation with huge practical impact, as these methods are now at the core of some SOTA models.

1 reply, 20 likes

午後のarXiv: "Rethinking Pre-training and Self-training", Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D…

0 replies, 13 likes

Leo Boytsov: If self-supervised and supervised pre-training both have somewhat limited value in CV, is there hope for NLP? Do large self-supervised Transformers work because most NLP tasks are low-data-regime tasks (and NLP might need more data compared to vision)?

1 reply, 11 likes

arXiv CS-CV: Rethinking Pre-training and Self-training

0 replies, 11 likes

Daisuke Okanohara: Pre-training cannot improve (and can even hurt) performance when stronger data augmentation and large labeled datasets are available. On the other hand, self-training remains helpful in both low-data and large-data regimes with stronger data augmentation.

0 replies, 10 likes

arXiv in review: #NeurIPS2020 Rethinking Pre-training and Self-training. (arXiv:2006.06882v1 [cs.CV])

0 replies, 3 likes

Connor Shorten: Rethinking Pre-training and Self-training 📚 "Our results suggest that both supervised and self-supervised pre-training methods fail to scale as the labeled dataset size grows, while self-training is still useful."

1 reply, 1 like

akira: If object detection is trained on all labels, or if strong data augmentation is used, ImageNet pre-training can degrade the model. But using self-training (Noisy Student), they found that accuracy improves even in those cases.

1 reply, 1 like


Found on Jun 15 2020 at

PDF content of a computer science paper: Rethinking Pre-training and Self-training