
Self-training with Noisy Student improves ImageNet classification


Quoc Le: Want to improve accuracy and robustness of your model? Use unlabeled data! Our new work uses self-training on unlabeled data to achieve 87.4% top-1 on ImageNet, 1% better than SOTA. Huge gains are seen on harder benchmarks (ImageNet-A, C and P). Link:

23 replies, 1563 likes

Quoc Le: Happy to announce that we've released a number of models trained with Noisy Student (a semi-supervised learning method). The best model achieves 88.4% top-1 accuracy on ImageNet (SOTA). Enjoy finetuning! Link: Paper:

11 replies, 1216 likes

Jeff Dean: Nice new results from @GoogleAI researchers on improving the state-of-the-art on ImageNet! "We...train a...model on...ImageNet...& use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger...model on the...labeled & pseudo labeled images."

6 replies, 536 likes

Quoc Le: Last week we released the checkpoints for SOTA ImageNet models trained by NoisyStudent. Due to popular demand, we’ve also opensourced an implementation of NoisyStudent. The code uses SVHN for demonstration purposes. Link: Paper:

4 replies, 434 likes

hiroto: "Self-training with Noisy Student improves ImageNet classification" achieves 87.4% top-1 accuracy. 1 Train a model on ImageNet 2 Generate pseudo labels on unlabeled extra dataset 3 Train a student model using all the data and make it a new teacher ->2

4 replies, 328 likes

Ilya Sutskever: Amazing unsupervised learning results:

3 replies, 203 likes

Thang Luong: Nice recent tutorial on semi-supervised learning, covering our recent works on UDA ( and #NoisyStudent ( It also highlights VAT, Pi-Model, MeanTeacher, and MixMatch. Slides:

1 replies, 134 likes

Thang Luong: Pushing on with more unlabeled data allows us to further advance SOTA on ImageNet to 88.4% top-1 accuracy with Noisy Student ( Long process we worked through to release the checkpoints & now they are all yours! @QizheXie @quocleix

0 replies, 104 likes

Thang Luong: Another view of Noisy Student: semi-supervised learning is great even when labeled data is plentiful! 130M unlabeled images yields 1% gain over previous ImageNet SOTA that uses 3.5B weakly labeled examples! joint work /w @QizheXie, Ed Hovy, @quocleix

0 replies, 89 likes

Quoc Le: @hardmaru We have a few data points that suggest such improvements are meaningful: 1. Better ImageNet models transfer better to other datasets: 2. Better accuracy on ImageNet gives vast improvements in out-of-distribution generalization:

3 replies, 74 likes

Eric Jang 🇺🇸🇹🇼: Self-training with Noisy Student: A semi-supervised approach by Google/CMU that outperforms Facebook's "weakly labeled 3.5B Instagram" method on ImageNet.

1 replies, 73 likes

Quoc Le: This work continues our efforts on semi-supervised learning such as UDA: MixMatch: FixMatch: Noisy Student: etc. Joint work with @hieupham789 @QizheXie @ZihangDai

1 replies, 64 likes

Thang Luong: We have started releasing #NoisyStudent code, first on SVHN for the community to quickly try. For ImageNet, we are looking into other public datasets as unlabeled data & will share soon, so stay tuned! @quocleix @QizheXie

1 replies, 47 likes

Thang Luong: It is also important to note that adding noise to the student, training equal-or-larger students, and iterative self-training form a novel combination that defines the success of #NoisyStudent on ImageNet (, to appear in #CVPR).

1 replies, 37 likes

Quoc Le: This work continues our self-training efforts: Noisy Student Training (SOTA on ImageNet): Noisy Student Training for Speech (SOTA on LibriSpeech): Conclusion? Good results on big datasets need self-training (w/ Noisy Student) :)

3 replies, 36 likes

Quoc Le: Give NoisyStudent a try if you want to use unlabeled data to improve your supervised learning. See my earlier tweet for more context.

1 replies, 33 likes

Bindu Reddy 🔥❤️: You can train more accurate models by combining unlabelled data with labelled data. Google's latest paper uses a clever trick to take advantage of loads of unlabelled data that most organizations have. One more step in truly democratizing AI -

1 replies, 30 likes

Quoc Le: See my previous tweet for context:

1 replies, 29 likes

Daniel Situnayake: This seems like an intriguing approach when you have a ton of unlabelled data: 1) Train a classifier on a labeled set of data 2) Use it to pseudo-label a much larger unlabelled dataset 3) Train a larger classifier on the combined sets 4) Iterate the process, adding noise

3 replies, 29 likes
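The four steps above can be sketched as a loop. This is a toy illustration only, not the paper's setup: the paper trains EfficientNet teachers and students on ImageNet plus ~300M unlabeled images with RandAugment, dropout, and stochastic depth as noise; here a hypothetical nearest-centroid "model" on synthetic 2-D blobs stands in for all of that, with Gaussian input noise as the only noise source.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_centroids(X, y):
    """'Train' a model: one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    """Assign each point to its nearest class centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Two Gaussian blobs: a small labeled set and a large unlabeled pool.
X_lab = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([rng.normal(-2, 1, (500, 2)), rng.normal(2, 1, (500, 2))])

# 1) Train the teacher on the labeled set only.
teacher = fit_centroids(X_lab, y_lab)

for _ in range(3):  # 4) iterate: the student becomes the next teacher
    # 2) Teacher pseudo-labels the unlabeled pool (teacher is not noised).
    pseudo = predict(teacher, X_unlab)
    # 3) Train the student on labeled + pseudo-labeled data; input noise
    #    stands in for the paper's data and model noise.
    X_all = np.vstack([X_lab, X_unlab + rng.normal(0, 0.5, X_unlab.shape)])
    y_all = np.concatenate([y_lab, pseudo])
    student = fit_centroids(X_all, y_all)
    teacher = student

acc = (predict(student, X_lab) == y_lab).mean()
```

Note the asymmetry the tweets point out: the teacher predicts on clean inputs, while the student is trained under noise and (in the real method) is at least as large as the teacher.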

Stanisław Jastrzębski: So do deep networks 'interpolate' or do they 'extrapolate'? :) For context see or @GaryMarcus's critique of deep learning; I think most people would classify ImageNet-A as 'extrapolation', but it is also unclear how much the unlabeled dataset overlaps with ImageNet-A

0 replies, 15 likes

Carlo Lepelaars: Reading the Noisy Student and EfficientDet papers. @quocleix and the other researchers made a big breakthrough with EfficientNet and now we are reaping the benefits of these more efficient models. 😎 Noisy Student: EfficientDet:

1 replies, 12 likes

Rajat Monga: Love the simplicity.

0 replies, 11 likes

Daisuke Okanohara: Self-training (training a student using an unlabeled dataset with labels estimated by a teacher) benefits from using a larger model for the student and from injecting noise during student training. Achieved a new SOTA on ImageNet and the challenging ImageNet-A (17%->74%)

0 replies, 11 likes

Andrey Kurenkov 🤖: wow neat trick. So simple, so effective! Kind of surprising this works so well, you'd think semi-supervised learning without injecting noisy labels would work better... seems unsupervised learning is just tough compared to supervised? Looking forward to theory :)

0 replies, 10 likes

mat kelcey: the adding noise result is a great idea but the most surprising thing about this result is the responses from people who didn't know self training was a thing!

2 replies, 7 likes

Aakash Kumar Nain: Another really good paper from @quocleix

2 replies, 5 likes

Saptarshi Purkayasth: Have a look at this @judywawira. I have a feeling that weak labels extracted from rad reports + image classifier can be made stronger using this approach.

0 replies, 4 likes

Moez Baccouche: Very interesting work by Google Brain on « Self-training »: 1. Train a model on ImageNet 2. Infer labels on an unlabeled dataset 3. Train a student model using all the data and make it the new teacher 4. Go to 2. This leads to a new SOTA on ImageNet with 87.4% top-1 accuracy.

0 replies, 4 likes

George Seif: Very cool idea to get state of the art on ImageNet by @GoogleAI #DeepLearning

0 replies, 3 likes

David Luan: Amazing progress using clever ideas that are also simple to explain.

0 replies, 3 likes

eSteve almirall: Image recognition with Deep Learning is improving and solving fundamental problems of labeled data with self-training !!! kudos for @GoogleAI @XavierFerras @oalcoba @ganyet @ProfVives @albertcuesta

0 replies, 3 likes

Shital Shah: This shall go down as one of the great abstracts. Did they just say they improved SOTA on adversarial ImageNet from 16.6% to 74.2%, dawg? You bet they did!

0 replies, 2 likes

Andrew Lavin: Self-training leads EfficientNet to a new state-of-the-art in ImageNet classification accuracy. But the exciting result is really the vast improvement to classification robustness.

0 replies, 2 likes

Somshubra Majumdar: A semi-simple method that I will probably try soon.

1 replies, 2 likes

Tobias Sterbak: Pseudo labeling with noise is such an elegant (and effective) idea! Great work by Quoc V. Le and team! #deeplearning #neuralnetworks #computervision

0 replies, 1 likes

Christian Szegedy: A cool semi-supervised training trick.

0 replies, 1 likes

arXiv CS-CV: Self-training with Noisy Student improves ImageNet classification

0 replies, 1 likes

akira: Create a more accurate model by repeating this process: add noise to the pseudo-labeled data, which is created with a model trained on ImageNet, then distill into a larger model using this data plus the labeled data. Robustness is also improved.

0 replies, 1 likes

Tim Dettmers: Just for reference, the noisy student training paper:

0 replies, 1 likes

Piotr Czapla: Brilliant idea for making repeated teacher-student learning work even when both are the same architecture. It seems generic enough to work for text; can't wait to give it a try on multifit zero-shot.

0 replies, 1 likes

Fabien Da Silva: @owulveryck @arxiv - Rigging the Lottery: Making All Tickets Winners - Self-training with Noisy Student improves ImageNet classification - Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Input

1 replies, 0 likes

Brundage Bot: Self-training with Noisy Student improves ImageNet classification. Qizhe Xie, Eduard Hovy, Minh-Thang Luong, and Quoc V. Le

1 replies, 0 likes


Found on Nov 12 2019 at

PDF content of a computer science paper: Self-training with Noisy Student improves ImageNet classification