Papers of the day

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Comments

Jun 20 2019 Quoc Le

XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE) arxiv: https://arxiv.org/abs/1906.08237 github (code + pretrained models): https://github.com/zihangdai/xlnet with Zhilin Yang, @ZihangDai, Yiming Yang, Jaime Carbonell, @rsalakhu https://t.co/JboOekUVPQ
22 replies, 1846 likes


Jun 21 2019 Guillaume Lample

If you want to train BERT from scratch in @PyTorch, you can check out our XLM repository! Our English model outperforms the original BERT on all GLUE tasks, although it's trained on the same data and without the next sentence prediction task https://github.com/facebookresearch/XLM @alex_conneau
3 replies, 669 likes


Jun 24 2019 Elliot Turner

Holy crap: It costs $245,000 to train the XLNet model (the one that's beating BERT on NLP tasks..512 TPU v3 chips * 2.5 days * $8 a TPU) - https://arxiv.org/abs/1906.08237 https://t.co/hvvB2C4oSN
20 replies, 649 likes
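
For readers who want to check the arithmetic behind the $245,000 figure, here is a minimal sketch of the calculation implied by the tweet; the $8/hour per-chip billing rate is the tweet's assumption, not a number from the paper:

    # Back-of-the-envelope check of the cost estimate quoted above.
    # Assumption (from the tweet, not the paper): $8/hour billed per TPU v3 chip.
    chips = 512         # "512 TPU v3 chips" used for pretraining
    days = 2.5          # reported training time
    rate_per_hour = 8   # assumed USD per chip-hour

    cost = chips * days * 24 * rate_per_hour
    print(f"${cost:,.0f}")  # $245,760 -- roughly the $245,000 in the tweet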


Jun 20 2019 Russ Salakhutdinov

XLNet: Generalized Autoregressive Pretraining for Language Understanding: outperforming BERT on 20 tasks (SQuAD, GLUE, sentiment analysis), while integrating ideas from Transformer-XL: arxiv: https://arxiv.org/abs/1906.08237 code + pretrained models: https://github.com/zihangdai/xlnet
2 replies, 509 likes


Jun 25 2019 Mark 🦑 Riedl

That is 4x the average salary in the US and 9.5x the poverty line.
13 replies, 296 likes


Jun 20 2019 Graham Neubig

There are a number of nice aspects to this method, but perhaps the nicest thing is that it's not named after a Sesame Street character 🙂
9 replies, 238 likes


Jun 20 2019 Mark Riedl 🚀 Mars (Moon)

RIP BERT. The problem with naming models after Sesame Street characters is that it artificially adds significance to the models. And then they will be cast aside when something better comes along in a few months.
2 replies, 150 likes


Jun 20 2019 Language Technologies Institute | @CarnegieMellon

Today sees a major advance in the state of the art on English language understanding tasks by researchers at @LTIatCMU, @mldcmu, and @GoogleAI. The gains are due to a new permutation-based pre-training objective, and models that capture longer context than previous methods.
0 replies, 108 likes


Jun 20 2019 Nikos Pappas

XLNet, a new acronym to remember in #NLProc👇 Two key differences to BERT: - Learns with an objective which maximizes likelihood over all permutations of the factorization order. - Encodes with Transformer-XL which captures longer-term dependencies than vanilla Transformer.
2 replies, 77 likes
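
As a rough illustration of the permutation objective mentioned above: sample a factorization order over the sequence, then let each target position attend only to positions that come earlier in that order. The NumPy sketch below is an assumed minimal illustration of that masking idea, not the paper's two-stream attention implementation:

    import numpy as np

    rng = np.random.default_rng(0)
    T = 6                        # toy sequence length
    order = rng.permutation(T)   # sampled factorization order z_1..z_T

    # rank[i] = where token i appears in the sampled order
    rank = np.empty(T, dtype=int)
    rank[order] = np.arange(T)

    # mask[i, j] is True when, predicting token i, the model may attend to
    # token j, i.e. token j comes earlier in the factorization order.
    mask = rank[None, :] < rank[:, None]
    print(order)
    print(mask.astype(int))

Averaging the training loss over many sampled orders is what lets the model use bidirectional context while keeping an autoregressive factorization.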


Jun 20 2019 Thang Luong

I thought SQuAD was solved with BERT, but the XLNet team doesn't want to stop there :) Great results on SQuAD, GLUE, and RACE! @ZihangDai @quocleix https://t.co/PxzNxuHtAf
0 replies, 60 likes


Jun 25 2019 Sam Finlayson

This is a lot of $$$, and one could raise solid questions about the cost-benefit to society if ML research increasingly focuses on scale rather than more fundamental algorithmic innovation. However, if the price alone blows your mind + rustles your jimmies, you should talk to some biologists.
4 replies, 48 likes


Jul 01 2019 Rachael Tatman

Time to pick the next @kaggle reading group paper! Your options: - XLNet: Generalized Autoregressive Pretraining for NLU https://arxiv.org/pdf/1906.08237.pdf - Defending Against Neural Fake News (Grover) https://arxiv.org/abs/1905.12616 - EfficientNet: Model Scaling for CNNs https://arxiv.org/abs/1905.11946
5 replies, 44 likes


Jun 25 2019 Tim Hwang

the inherent structure of markets in artificial intelligence: oligopoly
1 replies, 42 likes


Jun 21 2019 ni sinha

New objective + 10x more data + Transformer-XL + more. Predict a word given all permutations of the words in its context. Combines autoregressive LMs with autoencoding LMs in a clever way. Very cleanly written paper!
1 replies, 33 likes


Jun 25 2019 James Bradbury

@eturner303 512 TPU chips is 128 TPU devices, or $61,440 for 2.5 days. The authors could also have meant 512 cores, which is 64 devices or $30,720.
2 replies, 32 likes
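
A minimal sketch of the corrected arithmetic above, assuming the $8/hour Cloud TPU v3 price applies per device (one v3 device has 4 chips / 8 cores):

    hours = 2.5 * 24   # reported 2.5 days of training
    rate = 8           # assumed USD per device-hour

    devices_from_chips = 512 // 4  # "512 TPU chips is 128 TPU devices"
    devices_from_cores = 512 // 8  # "512 cores, which is 64 devices"

    print(devices_from_chips * hours * rate)  # 61440.0
    print(devices_from_cores * hours * rate)  # 30720.0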


Jun 21 2019 Delip Rao

PSA for #NLProc folks. XLM is not XLNet (http://arxiv.org/abs/1906.08237), which was released a couple of days ago. They both beat BERT. You thought Sesame Street names were bad. Okay, now check out this super cool work 👇🏼
1 replies, 30 likes


Jun 20 2019 Machine Learning and NLP

XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE) https://arxiv.org/pdf/1906.08237.pdf #NLProc
0 replies, 27 likes


Jun 20 2019 Bharath Ramsundar

This is a beautiful paper. A new language pretraining method that achieves compelling improvements over BERT. Large jumps on a number of benchmarks. Uses a clever permutation invariant autoregressive formulation plus Transformer-XL for handling long sequences
0 replies, 26 likes


Jun 20 2019 Djamé

I'm a bit disappointed they didn't title their paper "Bert is Dead: Behold Optimus Prime"
2 replies, 19 likes


Jun 20 2019 Christian Szegedy

Wow!
1 replies, 15 likes


Jun 20 2019 Braden Hancock

Some seriously impressive gains on popular benchmarks, with nice analysis. I'm sure the pretrained model will see a lot of use (even without a Sesame Street name). Hoping a PyTorch version is available soon for tinkering! (Fingers crossed that @Thom_Wolf works his magic again...)
1 replies, 14 likes


Jun 25 2019 Matthew Kenney

access to compute can't be decoupled from responsible development of ml systems.
0 replies, 14 likes


Jun 20 2019 Pranav Rajpurkar

Wow!
0 replies, 13 likes


Jun 20 2019 Mona Jalal @ CVPR 2019 #cvpr2019

Today sees a major advance in the state of the art on English language understanding tasks by researchers at @LTIatCMU @mldcmu and @GoogleAI The gains are due to a new permutation-based pre-training objective, and models that capture longer context than previous methods. #NLProc https://t.co/ftwOI2vwV9
1 replies, 13 likes


Jun 25 2019 Hannah Godofsky 💃🏼

I sometimes think computing has become the modern horse. Feudal societies were organized in large part around feeding and training horses. Today we farm electricity to feed computers. #machinelearning
1 replies, 10 likes


Jun 25 2019 Atul Butte

Luckily it costs a lot less to train a child how to understand language… HT @IAmSamFin
1 replies, 9 likes


Jun 20 2019 arXiv CS-CL

XLNet: Generalized Autoregressive Pretraining for Language Understanding http://arxiv.org/abs/1906.08237
0 replies, 9 likes


Jun 20 2019 Hugh Harvey

Congrats to the Google Brain team on what is most likely to be the most important advance in NLP this year, smashing previous state-of-the-art accuracy metrics. "In the future, we envision applications of XLNet to a wider set of tasks such as vision" https://arxiv.org/abs/1906.08237
0 replies, 7 likes


Jun 20 2019 Eclipse DL4J

Better than BERT! XLNet: Generalized Autoregressive Pretraining for Language Understanding https://arxiv.org/abs/1906.08237 #deeplearning
0 replies, 7 likes


Jun 22 2019 AUEB NLP Group

Next AUEB NLP Group meeting, Tue. **July 2**, 17:15-19:00: Discussion of "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (https://arxiv.org/abs/1906.08237). Study the paper before the meeting. Central AUEB buildings, room A36. All welcome.
0 replies, 6 likes


Jun 26 2019 BioDecoded

Google Brain’s XLNet bests BERT at 20 NLP tasks | VentureBeat https://venturebeat.com/2019/06/21/google-brains-xlnet-bests-bert-at-20-nlp-tasks/ … https://arxiv.org/abs/1906.08237 #DeepLearning #NLP https://t.co/4GPE7dkP8R
0 replies, 5 likes


Jun 20 2019 Sam Witteveen

This looks super impressive!!
0 replies, 4 likes


Jun 20 2019 Eugene Kharitonov

😱
2 replies, 3 likes


Jun 23 2019 Negru Adrian Eduard

Great news for #NLP enthusiasts. XLNet [1] is said to outperform BERT on certain NLP tasks, according to this article [2]. #machinelearning [1] https://arxiv.org/abs/1906.08237 [2] https://medium.com/dair-ai/xlnet-outperforms-bert-on-several-nlp-tasks-9ec867bb563b
0 replies, 3 likes


Aug 25 2019 BioDecoded

Google Brain’s XLNet bests BERT at 20 NLP tasks | VentureBeat https://venturebeat.com/2019/06/21/google-brains-xlnet-bests-bert-at-20-nlp-tasks/ https://arxiv.org/abs/1906.08237 #DeepLearning #NLP https://t.co/g1roeJq9db
0 replies, 2 likes


Jun 25 2019 Víctor Peinado

And don't forget the CO2 footprint. We should be training these models on Mars. Let's fight against climate change on Earth while we help terraform the red planet
0 replies, 2 likes


Jun 20 2019 Min Sun

Latest breakthrough!
0 replies, 2 likes


Jun 25 2019 Neal Lathia

I wonder whether these huge training costs are offset by these models being open sourced (so that they can be fine-tuned and deployed at nearly no cost for 100s of problems).
1 replies, 2 likes


Aug 17 2019 Benjamin Singleton

XLNet: Generalized Autoregressive Pretraining for Language Understanding #DataScience #BigData https://arxiv.org/abs/1906.08237
0 replies, 2 likes


Jun 20 2019 Sean Goldberg

Kids these days and their newfangled Sesame Street characters like XLNet...
0 replies, 2 likes


Jun 26 2019 Delip Rao

In the long run, withholding large-parameter models trained on public datasets rarely prevents well-resourced bad actors from weaponizing them. The only guaranteed outcomes are the stifling of innovation, making the rich richer, & environmental impacts from forced replication. https://twitter.com/eturner303/status/1143174828804857856?s=21
0 replies, 2 likes


Jun 26 2019 Navin Kabra

For all those who think machine learning is easy/cheap, please note the costs of training a deep learning network
0 replies, 2 likes


Jun 20 2019 Aakash Deep

@besanson BERT and #NLP
0 replies, 1 likes


Jun 20 2019 Anna Rogers

It's official: #XLNet (https://arxiv.org/abs/1906.08237) is a larger-than-BERT model said to do better-than-BERT. I can't reproduce it in my lab. Can you? Will you, when you could train an XXXLNet instead? Please share the poll. No offense to the authors, but the trend is worrying.
0 replies, 1 likes


Aug 06 2019 Masoud Hoveidar

I'll be facilitating an interesting talk on the XLNet paper today. So come and join us if you are in Toronto, or join the live streaming online #AISC #XLNet @AISC_TO: Paper: https://arxiv.org/pdf/1906.08237.pdf https://www.eventbrite.ca/e/xlnet-generalized-autoregressive-pretraining-for-language-understanding-tickets-67232453077
0 replies, 1 likes


Jun 28 2019 Flávio Clésio

Holy Jesus!
0 replies, 1 likes


Jun 20 2019 Jeff Dalton

Great to see continued progress on pretraining methods that surpass BERT. But will it catch on without a sesame street reference?
0 replies, 1 likes


Jun 25 2019 Camilo Vasquez

Wowwwww #NLProc #DeepLearning
0 replies, 1 likes


Jun 20 2019 /r/ML Popular

[R] XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE,... https://arxiv.org/abs/1906.08237
0 replies, 1 likes


Jun 25 2019 zara tustra

it costs two hundred thousand dollars to teach this artificial intelligence for twelve seconds
0 replies, 1 likes


Aug 15 2019 Dogydev

Hyperparameter tuning using SHERPA (https://openreview.net/forum?id=HklSUMyJcQ) on XLNet (https://arxiv.org/abs/1906.08237) https://t.co/LBvv9DgxA8
0 replies, 1 likes


Jun 20 2019 cs.LG Papers

XLNet: Generalized Autoregressive Pretraining for Language Understanding. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le http://arxiv.org/abs/1906.08237
1 replies, 0 likes


Jun 26 2019 Albert Vilella

The @GCPcloud #TPUv3 was used for #XLNet which comes very close to human ceiling performance of reading comprehension of the #RACE dataset. An example of #Hardware #Acceleration applied to #ML . Can #TPUv3 be applied to #Genomics ? http://bit.ly/acceleromics https://t.co/UiYKe6npm3
1 replies, 0 likes

