
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision


Hao Tan: *Vokenization*: a visually-supervised language model attempt in our #emnlp2020 paper: (w. @mohitban47) To improve language pre-training, we extrapolate multimodal alignments to lang-only data by contextually mapping tokens to related images ("vokens") 1/4

6 replies, 368 likes
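The mechanism Hao describes above — contextually mapping each token to a related image ("voken") — can be sketched as a retrieval step: score every token embedding against a bank of image embeddings and pick the best match per token. This is only an illustrative sketch; the encoder, the dot-product scoring, and all names here are assumptions, not the paper's actual trained token-image matching model.

```python
# Illustrative sketch of "vokenization": map each token in a sentence to the
# index of its most relevant image (its "voken") from a fixed image set.
# The random embeddings and dot-product relevance score are stand-ins for a
# learned contextual matching model.
import numpy as np

rng = np.random.default_rng(0)

def contextual_token_embeddings(tokens):
    # Stand-in for a contextual language encoder (e.g. BERT):
    # returns one d-dimensional vector per token.
    return rng.normal(size=(len(tokens), 8))

def vokenize(tokens, image_embeddings):
    """Return one image index (voken) per token, by max relevance score."""
    tok_emb = contextual_token_embeddings(tokens)   # shape (T, d)
    scores = tok_emb @ image_embeddings.T           # shape (T, num_images)
    return scores.argmax(axis=1)                    # one voken id per token

images = rng.normal(size=(5, 8))                    # 5 toy "image" embeddings
vokens = vokenize(["a", "cat", "sits"], images)
print(vokens)
```

The resulting token-aligned voken ids could then serve as extra supervision labels during language-model pretraining, which is the role they play in the paper.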

Thomas Wolf: This is a really cool piece of work! The first time I see an {image+text} BERT-model outperform BERT on common text-only tasks.

0 replies, 247 likes

Mohit Bansal (@🏡): Thanks @_KarenHao for this fun article in MIT @TechReview (with cats😺) covering @HaoTan5's "Vokenization" work at @UNC, upcoming at #emnlp2020! (also features kind words from the awesome @Thom_Wolf/@huggingface🤗) Paper: Try it:

2 replies, 68 likes

Mohit Bansal (@🏡): "Vokens" = Visually-grounded-tokens (contextual) to imprv lang-pretraining & engl NLU tasks (imp divergence/grounding ratio issues, extrapolates frm small dataset)! pdf: Full code: ➡️Hao is on job market🙂:

1 replies, 56 likes

Thomas Wolf: Karen wrote a nice article in MIT Tech Review on the significance of the recent multi modal work of Mohit and Hao! Check it out 😺

0 replies, 17 likes

Gabriel Ilharco: Whoa, this is really cool! Text-only models often outperform text+vision models in text-only tasks, given the statistical discrepancies in the language used in these domains. "Vokenization" is a neat way to get some grounded supervision without paying the domain shift price

0 replies, 15 likes

Jack Hessel: @HaoTan5 and @mohitban47's paper on "vokenization" is *very* cool! They present some of the first concrete evidence that a multimodal vision+language model can outperform a text-only model on /language only/ tasks (e.g., GLUE, SQuAD). Exciting!! 🤩

1 replies, 9 likes

Shiyue Zhang: Check out this pretty cool *Vokenization* paper from our group! Introducing visual grounding information to language model pretraining!

0 replies, 9 likes

Mohit Bansal (@🏡): And here is the original summary thread by Hao for more info -->

0 replies, 5 likes

HotComputerScience: Most popular computer science paper of the day: "Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision"

0 replies, 2 likes

Reza: #paper #NLP #LanguageModel #MachineLearning

0 replies, 2 likes

Alexander Kruel: This could lead to the next big breakthrough in common sense AI (Paper:

0 replies, 2 likes

akira: Inspired by the fact that humans use not only textual information but also visual information when learning language, they propose a learning method using the "vokenizer" that maps tokens to related images.

0 replies, 1 likes

Here's another article/paper that fits perfectly into this debate (thanks @cathalhoran for this recommendation🙏). @HaoTan5 @mohitban47 write about training models with text and images to overcome these limitations.

0 replies, 1 likes

cathal horan: In my post on @neptune_ai I talk about some of the limits of training text only models like #GPT3. @HaoTan5 & @mohitban47 fascinating paper shows how to combine text and images to potentially address these issues #MachineLearning

1 replies, 0 likes


Found on Oct 15 2020 at
