Unsupervised Data Augmentation

Comments

Quoc Le: Data augmentation is often associated with supervised learning. We find *unsupervised* data augmentation works better. It combines well with transfer learning (e.g. BERT) and improves everything when datasets have a small number of labeled examples. Link: http://arxiv.org/abs/1904.12848

3 replies, 668 likes


Thang Luong: Introducing UDA, our new work on "Unsupervised data augmentation" for semi-supervised learning (SSL) with Qizhe Xie, Zihang Dai, Eduard Hovy, & @quocleix. SOTA results on IMDB (with just 20 labeled examples!), SSL Cifar10 & SVHN (30% error reduction)! https://arxiv.org/abs/1904.12848 https://t.co/rBf2U9NQL0

3 replies, 444 likes


Thang Luong: Nice recent tutorial on semi-supervised learning, covering our recent works on UDA (https://arxiv.org/abs/1904.12848) and #NoisyStudent (https://arxiv.org/abs/1911.04252). It also highlights VAT, Pi-Model, MeanTeacher, and MixMatch. Slides: https://drive.google.com/file/d/1awKePvIWFXTBE2_dPW6jsOr-HwVrUAdA/view

1 replies, 134 likes


Thang Luong: Nice additional gains achieved by MPL (Meta Pseudo Labels, https://arxiv.org/abs/2003.10580) on top of UDA (Unsupervised Data Augmentation, https://arxiv.org/abs/1904.12848) on low-data regimes! https://t.co/e2HTx4H1K1

0 replies, 98 likes


Quoc Le: Links to the mentioned papers. MixMatch: https://arxiv.org/abs/1905.02249 Unsupervised Data Augmentation: https://arxiv.org/abs/1904.12848

1 replies, 82 likes


Quoc Le: To add to Vincent's point above, new findings also include: 1. The method is general (works well for images & texts). 2. The method works well on top of transfer learning (e.g., BERT). You can find these results in Unsupervised Data Augmentation paper: https://arxiv.org/abs/1904.12848

0 replies, 67 likes


Quoc Le: This work continues our efforts on semi-supervised learning such as UDA: https://arxiv.org/abs/1904.12848 MixMatch: https://arxiv.org/abs/1905.02249 FixMatch: https://arxiv.org/abs/2001.07685 Noisy Student: https://arxiv.org/abs/1911.04252 etc. Joint work with @hieupham789 @QizheXie @ZihangDai

1 replies, 64 likes


Mihail Eric: Yum! Unsupervised data augmentation that works from @GoogleAI @QizheXie @quocleix. New state-of-the-art on various language and vision tasks: https://arxiv.org/pdf/1904.12848.pdf

0 replies, 54 likes


Thang Luong: Our UDA work (https://arxiv.org/abs/1904.12848) proposes the use of strong augmentation (RandAugment) which subsequent works (FixMatch, NoisyStudent) follow. UDA uses soft pseudo-labels whereas FixMatch uses hard ones after "weak" augmentation in consistency training. https://t.co/z7nobsdUws

1 replies, 38 likes
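
Editor's note: the soft versus hard pseudo-label distinction in the comment above can be sketched in a few lines. This is a minimal illustration, not code from the UDA or FixMatch repositories. The function names, the use of KL divergence for the soft case, and the 0.95 confidence threshold are assumptions chosen for the example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_consistency_loss(logits_orig, logits_aug):
    # Soft pseudo-label (UDA-style, as assumed here): the full predicted
    # distribution on the original example is the target, and the loss is
    # KL(p_orig || p_aug) against the prediction on the augmented example.
    p = softmax(logits_orig)
    q = softmax(logits_aug)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

def hard_pseudo_label_loss(logits_weak, logits_strong, threshold=0.95):
    # Hard pseudo-label (FixMatch-style, as assumed here): take the argmax
    # class on the weakly augmented example, keep it only if the model is
    # confident enough, and apply cross-entropy on the strongly augmented one.
    p = softmax(logits_weak)
    label = max(range(len(p)), key=p.__getitem__)
    if p[label] < threshold:
        return 0.0  # low-confidence example contributes no loss
    q = softmax(logits_strong)
    return -math.log(q[label])
```

In a real training loop the target distribution would be treated as a constant (no gradient flows through it), and this unsupervised term would be added to the usual supervised cross-entropy on the labeled examples.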


Sayak Paul: Unsupervised Data Augmentation for Consistency Training (https://arxiv.org/pdf/1904.12848.pdf), also known as UDA, is such an important paper in the area of self-supervised learning. It systematically studies how stronger data augmentation ops benefit a model in learning good representations.

0 replies, 30 likes


Arjun (Raj) Manrai: Wow: "on IMDb, UDA with 20 labeled examples outperforms the state-of-the-art model trained on 1250x more labeled data" https://arxiv.org/abs/1904.12848

0 replies, 8 likes


Thang Luong: These plots (also included in the updated version of our UDA paper https://arxiv.org/abs/1904.12848 with a lot more results & details) illustrate very well Vincent's article on the quiet revolution of semi-supervised learning! https://towardsdatascience.com/the-quiet-semi-supervised-revolution-edec1e9ad8c

2 replies, 8 likes


arXiv CS-CV: Unsupervised Data Augmentation for Consistency Training http://arxiv.org/abs/1904.12848

0 replies, 7 likes


Daisuke Okanohara: In semi-supervised learning, VAT adds adversarial noise to unlabeled data and makes the prediction distribution on the perturbed input match the distribution on the original input. UDA instead applies data augmentation methods and gradually increases the signal from the supervised data. https://arxiv.org/abs/1904.12848

0 replies, 5 likes


arXiv CS-CL: Unsupervised Data Augmentation for Consistency Training http://arxiv.org/abs/1904.12848

0 replies, 4 likes


arXiv in review: #ICLR2020 Unsupervised Data Augmentation for Consistency Training. (arXiv:1904.12848v5 [cs.LG] UPDATED) http://arxiv.org/abs/1904.12848

0 replies, 4 likes


Quoc Le: @ivan_bezdomny In NLP, there is backtranslation method that works quite well as a data augmentation method. You can check out its use in UDA: https://arxiv.org/abs/1904.12848 Link to code: https://github.com/google-research/uda#run-back-translation-data-augmentation-for-your-dataset

1 replies, 3 likes


BioDecoded: Inference of clonal selection in cancer populations using single-cell sequencing data | Bioinformatics https://ai.googleblog.com/2019/07/advancing-semi-supervised-learning-with.html https://arxiv.org/pdf/1904.12848.pdf #MachineLearning https://t.co/AWt9hBaOGr

1 replies, 3 likes


BioDecoded: Advancing Semi-supervised Learning with Unsupervised Data Augmentation | Google AI Blog https://ai.googleblog.com/2019/07/advancing-semi-supervised-learning-with.html https://arxiv.org/abs/1904.12848 #MachineLearning https://t.co/qzMNH0r4eq

0 replies, 3 likes


arXiv CS-CV: Unsupervised Data Augmentation for Consistency Training http://arxiv.org/abs/1904.12848

0 replies, 2 likes


arXiv CS-CL: Unsupervised Data Augmentation for Consistency Training http://arxiv.org/abs/1904.12848

0 replies, 2 likes


AK: #UDA, or unsupervised data augmentation, a new technique from @google to generate synthetic data for #neuralnetworks #AI #machinelearning https://arxiv.org/pdf/1904.12848.pdf

0 replies, 1 likes


Saleh Elmohamed: Really nice work by Quoc Le and colleagues at Google & CMU on unsupervised data augmentation. Highly recommend checking out their latest paper at the arXiv.

0 replies, 1 likes


Cherrypick: UNSUPERVISED DATA AUGMENTATION (UDA) https://arxiv.org/pdf/1904.12848.pdf https://youtu.be/cqjcJ7XqGkA trying to use for stock market sentiment analysis (pytorch and BERT)

1 replies, 0 likes


Content

Found on Apr 30 2019 at https://arxiv.org/pdf/1904.12848.pdf

PDF content of a computer science paper: Unsupervised Data Augmentation