
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Comments

Hao Tan: *Vokenization*: a visually-supervised language model attempt in our #emnlp2020 paper: https://arxiv.org/abs/2010.06775 (w. @mohitban47) To improve language pre-training, we extrapolate multimodal alignments to lang-only data by contextually mapping tokens to related images ("vokens") 1/4 https://t.co/wuXt1K58BH

6 replies, 368 likes
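The thread above describes the core idea: contextually map each token in a sentence to a related image (its "voken"), then use those vokens as extra supervision during language pretraining. As a rough illustration only (the names, shapes, and nearest-neighbor retrieval here are simplifications, not the paper's actual contextual token-image matching model), the retrieval step can be sketched as picking, for each token embedding, the most similar image embedding:

```python
import numpy as np

def vokenize(token_embs, image_embs):
    """Assign each token the index of its most relevant image ("voken")
    via cosine similarity. Toy stand-in for the paper's learned
    contextual token-image matching model."""
    t = token_embs / np.linalg.norm(token_embs, axis=1, keepdims=True)
    v = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sim = t @ v.T                 # (num_tokens, num_images) similarity matrix
    return sim.argmax(axis=1)     # one voken id per token

# Toy example: 3 token embeddings, 2 candidate image embeddings.
tokens = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
images = np.array([[1.0, 0.0], [0.0, 1.0]])
print(vokenize(tokens, images))   # -> [0 0 1]
```

In the paper, the resulting voken ids then serve as labels for an auxiliary voken-classification objective alongside masked language modeling, which is how the visual grounding is extrapolated to language-only corpora.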


Thomas Wolf: This is a really cool piece of work! It's the first time I've seen an {image+text} BERT-model outperform BERT on common text-only tasks.

0 replies, 247 likes


Mohit Bansal (@🏡): Thanks @_KarenHao for this fun article in MIT @TechReview (with cats😺) covering @HaoTan5's "Vokenization" work at @UNC, upcoming at #emnlp2020! (also features kind words from the awesome @Thom_Wolf/@huggingface🤗) Paper: https://arxiv.org/abs/2010.06775 Try it: https://github.com/airsplay/vokenization

2 replies, 68 likes


Mohit Bansal (@🏡): "Vokens" = visually-grounded tokens (contextual) to improve language pretraining & English NLU tasks (addresses divergence/grounding-ratio issues, extrapolates from a small dataset)! pdf: https://arxiv.org/abs/2010.06775 Full code: https://github.com/airsplay/vokenization ➡️Hao is on the job market🙂: https://www.cs.unc.edu/~airsplay/

1 reply, 56 likes


Thomas Wolf: Karen wrote a nice article in MIT Tech Review on the significance of the recent multi modal work of Mohit and Hao! Check it out 😺

0 replies, 17 likes


Gabriel Ilharco: Whoa, this is really cool! Text-only models often outperform text+vision models in text-only tasks, given the statistical discrepancies in the language used in these domains. "Vokenization" is a neat way to get some grounded supervision without paying the domain shift price

0 replies, 15 likes


Jack Hessel: @HaoTan5 and @mohitban47's paper on "vokenization" is *very* cool! They present some of the first concrete evidence that a multimodal vision+language model can outperform a text-only model on /language only/ tasks (e.g., GLUE, SQuAD). Exciting!! 🤩 https://arxiv.org/abs/2010.06775

1 reply, 9 likes


Shiyue Zhang: Check out this pretty cool *Vokenization* paper from our group! Introducing visual grounding information to language model pretraining!

0 replies, 9 likes


Mohit Bansal (@🏡): And here is the original summary thread by Hao for more info --> https://twitter.com/HaoTan5/status/1316785618278666241

0 replies, 5 likes


HotComputerScience: Most popular computer science paper of the day: "Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision" https://hotcomputerscience.com/paper/vokenization-improving-language-understanding-with-contextualized-visual-grounded-supervision https://twitter.com/HaoTan5/status/1316785618278666241

0 replies, 2 likes


Reza: #paper #NLP #LanguageModel #MachineLearning

0 replies, 2 likes


Alexander Kruel: This could lead to the next big breakthrough in common sense AI (Paper: https://arxiv.org/abs/2010.06775) https://www.technologyreview.com/2020/11/06/1011726/ai-natural-language-processing-computer-vision/

0 replies, 2 likes


akira: https://arxiv.org/abs/2010.06775 Inspired by the fact that humans use not only textual information but also visual information when learning language, they propose a learning method using the "vokenizer" that generates images associated with tokens. https://t.co/HRytyQFFRP

0 replies, 1 like


neptune.ai: Here's another article/paper that fits perfectly into this debate (thanks @cathalhoran for this recommendation🙏). @HaoTan5 @mohitban47 write about training models with text and images to overcome these limitations. https://arxiv.org/pdf/2010.06775.pdf

0 replies, 1 like


cathal horan: In my post https://neptune.ai/blog/ai-limits-can-deep-learning-models-like-bert-ever-understand-language on @neptune_ai I talk about some of the limits of training text-only models like #GPT3. @HaoTan5 & @mohitban47's fascinating paper shows how to combine text and images to potentially address these issues https://arxiv.org/abs/2010.06775 #MachineLearning

1 reply, 0 likes


Content

Found on Oct 15 2020 at https://arxiv.org/pdf/2010.06775.pdf

PDF content of a computer science paper: Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision