
VirTex: Learning Visual Representations from Textual Annotations

Comments

Justin Johnson: Our new paper (w/@kdexd) argues that "language is all you need" for good visual features: we train CNN+Transformer *from scratch* on ~100k images+captions from COCO, transfer the CNN to 6 downstream vision tasks, and match/exceed ImageNet features despite using 10x fewer images!

14 replies, 1151 likes


Karan Desai: Introducing "VirTex": a pretraining approach to learn visual features via language using fewer images. Pretrain: CNN+Transformer from scratch on COCO Captions. Transfer CNN: results on 6 vision tasks match/exceed ImageNet pretraining (ImageNet is ~10x larger than COCO)! https://arxiv.org/abs/2006.06666 https://t.co/WnbLkktE1C

13 replies, 677 likes
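
The two announcements above describe the same pipeline: a visual backbone (a ResNet-50 CNN) and a Transformer textual head are trained jointly from scratch to predict COCO captions, after which only the CNN is kept for downstream vision tasks. Below is a minimal PyTorch sketch of that idea. All class and variable names are hypothetical, the decoder is a plain unidirectional Transformer rather than the paper's exact bidirectional textual head, and this is not the authors' actual API (see https://github.com/kdexd/virtex for that).

```python
# Minimal sketch of VirTex-style pretraining (hypothetical names, simplified
# unidirectional decoder; not the authors' actual implementation).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CaptioningPretrainer(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4, max_len=30):
        super().__init__()
        cnn = resnet50(weights=None)  # trained from scratch, no ImageNet init
        # Drop avgpool + fc so the backbone yields spatial features (B, 2048, 7, 7).
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])
        self.project = nn.Conv2d(2048, d_model, kernel_size=1)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # images: (B, 3, 224, 224); tokens: (B, T) caption token ids.
        feats = self.project(self.backbone(images))      # (B, d_model, 7, 7)
        memory = feats.flatten(2).transpose(1, 2)        # (B, 49, d_model)
        tgt = self.embed(tokens) + self.pos[:, :tokens.size(1)]
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)   # causal caption decoding
        return self.head(out)                            # (B, T, vocab_size)

# One teacher-forced pretraining step on a dummy batch.
model = CaptioningPretrainer(vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 12))
logits = model(images, captions[:, :-1])                 # predict the next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), captions[:, 1:].reshape(-1))
loss.backward()  # gradients flow into the CNN, which is what gets transferred
```

The key design point both tweets emphasize: the captioning loss is only a pretraining signal, and everything except the CNN backbone is discarded before transfer.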


Sayak Paul: Here's a list of my favorite recent papers on transfer learning for vision:
- BigTransfer: https://arxiv.org/abs/1912.11370
- VirTex: https://arxiv.org/abs/2006.06666
- SimCLRv2: https://arxiv.org/abs/2006.10029
- Self-training: https://arxiv.org/abs/2006.06882
Would love to see a T5-like paper for vision.

2 replies, 217 likes


Yannic Kilcher: Also, the Arxiv number of this one is funky 😄 https://arxiv.org/abs/2006.06666

1 reply, 21 likes


Lucas Goulart Vazquez: This is truly amazing. Using language to learn vision!

0 replies, 18 likes


Sayak Paul: VirTex: Learning Visual Representations from Textual Annotations by @kdexd & @jcjohnss. Paper: https://arxiv.org/abs/2006.06666 Explanation by @ykilcher: https://www.youtube.com/watch?v=ZfDZRX3WiJg

0 replies, 6 likes


Stanislav Frolov: Captions carry rich semantic information that can be used to learn visual representations for downstream tasks using fewer images. https://t.co/1Pm5elA508

0 replies, 5 likes


MichiganAI: Michigan AI's Karan Desai (@kdexd) and Justin Johnson (@jcjohnss) introduce "VirTex": a pretraining approach to learn visual features via language using fewer images. https://arxiv.org/abs/2006.06666 https://youtu.be/ZfDZRX3WiJg

1 reply, 5 likes


Tarun: Attended a very fun paper discussion session organized by @omarsar0 discussing https://arxiv.org/abs/2006.06666 . Very interesting vision paper that uses image captioning to train a visual model instead of the usual classification task.

1 reply, 3 likes
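
For the transfer step that this contrast (captioning pretraining instead of classification pretraining) leads to, a common way to compare learned features is a linear probe: freeze the pretrained backbone and train only a linear classifier on a downstream task. A hedged sketch under that assumption follows, with dummy data and a randomly initialized stand-in for the pretrained backbone.

```python
# Sketch of downstream transfer via a linear probe (hypothetical setup).
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Stand-in for the captioning-pretrained backbone; in practice its weights
# would be loaded from pretraining rather than left random.
cnn = resnet50(weights=None)
backbone = nn.Sequential(*list(cnn.children())[:-2])
for p in backbone.parameters():
    p.requires_grad = False  # features stay frozen; only the probe learns

probe = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(2048, 100))  # e.g. a 100-class downstream task
opt = torch.optim.SGD(probe.parameters(), lr=0.1)

images = torch.randn(4, 3, 224, 224)          # dummy downstream batch
labels = torch.randint(0, 100, (4,))
with torch.no_grad():
    feats = backbone(images)                  # (4, 2048, 7, 7), no gradients
loss = nn.functional.cross_entropy(probe(feats), labels)
loss.backward()                               # updates the probe only
opt.step()
```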


Ben Mann: 5 friends tried to draw a bike from memory after reading https://arxiv.org/abs/2006.06666 Mine is the top left. https://t.co/jxuIje58GY

0 replies, 2 likes


Anirudh Dagar: Clean code and docs like never before. Setting high standards. @kdexd @jcjohnss

0 replies, 2 likes


arXiv CS-CL: VirTex: Learning Visual Representations from Textual Annotations http://arxiv.org/abs/2006.06666

0 replies, 1 like


Andrey Lukyanenko: VirTex: Learning Visual Representations from Textual Annotations Paper: https://arxiv.org/abs/2006.06666 Code: https://github.com/kdexd/virtex Site: https://kdexd.github.io/virtex/ https://t.co/w4TX9qsl3U

1 reply, 1 like


Yash Kant: Check out @kdexd's new work! The code and docs are the prettiest I've seen! 🔥

0 replies, 1 like


Content

Found on Jun 12, 2020 at https://arxiv.org/pdf/2006.06666.pdf

PDF content of a computer science paper: VirTex: Learning Visual Representations from Textual Annotations