Unsupervised Data Augmentation


Apr 30 2019 Quoc Le

Data augmentation is often associated with supervised learning. We find *unsupervised* data augmentation works better. It combines well with transfer learning (e.g. BERT) and improves everything when datasets have a small number of labeled examples. Link:
Apr 30 2019 Thang Luong

Introducing UDA, our new work on "Unsupervised data augmentation" for semi-supervised learning (SSL) with Qizhe Xie, Zihang Dai, Eduard Hovy, & @quocleix. SOTA results on IMDB (with just 20 labeled examples!), SSL Cifar10 & SVHN (30% error reduction)!
May 18 2019 Quoc Le

Links to the mentioned papers. MixMatch: Unsupervised Data Augmentation:
May 19 2019 Quoc Le

To add to Vincent's point above, new findings also include: 1. The method is general (works well for images & texts). 2. The method works well on top of transfer learning (e.g., BERT). You can find these results in the Unsupervised Data Augmentation paper:
May 06 2019 Mihail Eric

Yum! Unsupervised data augmentation that works from @GoogleAI @QizheXie @quocleix. New state-of-the-art on various language and vision tasks:
Jul 11 2019 Thang Luong

These plots (also included in the updated version of our UDA paper with a lot more results & details) illustrate very well Vincent's article on the quiet revolution of semi-supervised learning!
Apr 30 2019 Arjun (Raj) Manrai

Wow: "on IMDb, UDA with 20 labeled examples outperforms the state-of-the-art model trained on 1250x more labeled data"
May 06 2019 Daisuke Okanohara

In semi-supervised learning, VAT adds adversarial noise to unlabeled data and makes the prediction distribution on the perturbed input match the distribution on the original input. UDA instead applies data augmentation methods and gradually increases the signal from the supervised data.
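The two ingredients named in that summary can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the consistency term is written as a KL divergence between predictions on an unlabeled example and its augmented copy, and the "gradually increasing" supervised signal is assumed to follow a linear confidence-threshold schedule in which labeled examples the model already predicts confidently are masked out.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(logits_original, logits_augmented, eps=1e-12):
    """KL(p_original || p_augmented) for one unlabeled example.

    In a real training loop the prediction on the original input would be
    treated as a fixed target (no gradient); that detail is omitted here.
    """
    p = softmax(logits_original)
    q = softmax(logits_augmented)
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))

def annealing_threshold(step, total_steps, num_classes):
    """Assumed linear schedule: threshold rises from 1/K to 1 over training."""
    frac = step / total_steps
    return frac * (1 - 1 / num_classes) + 1 / num_classes

def supervised_loss(logits, label, step, total_steps):
    """Cross-entropy on a labeled example, masked out while the model is
    already more confident than the current threshold."""
    probs = softmax(logits)
    if probs[label] > annealing_threshold(step, total_steps, len(logits)):
        return 0.0  # example is withheld at this point in training
    return -math.log(probs[label])
```

Early in training the threshold is low, so most confidently predicted labeled examples contribute nothing; as the threshold approaches 1, nearly every labeled example contributes, which is one way to read "gradually increases the signal from the supervised data".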
Jun 27 2019 Quoc Le

@ivan_bezdomny In NLP, there is a backtranslation method that works quite well as a data augmentation method. You can check out its use in UDA: Link to code:
May 01 2019 Saleh Elmohamed

Really nice work by Quoc Le and colleagues at Google & CMU on unsupervised data augmentation. Highly recommend checking out their latest paper on arXiv.
