XLNet: Generalized Autoregressive Pretraining for Language Understanding

Comments

Quoc Le: XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE) arxiv: https://arxiv.org/abs/1906.08237 github (code + pretrained models): https://github.com/zihangdai/xlnet with Zhilin Yang, @ZihangDai, Yiming Yang, Jaime Carbonell, @rsalakhu https://t.co/JboOekUVPQ

22 replies, 1851 likes


Guillaume Lample: If you want to train BERT from scratch in @PyTorch, you can check out our XLM repository! Our English model outperforms the original BERT on all GLUE tasks, although it's trained on the same data and without the next sentence prediction task https://github.com/facebookresearch/XLM @alex_conneau

3 replies, 669 likes


Elliot Turner: Holy crap: It costs $245,000 to train the XLNet model (the one that's beating BERT on NLP tasks: 512 TPU v3 chips * 2.5 days * $8/hour per TPU) - https://arxiv.org/abs/1906.08237 https://t.co/hvvB2C4oSN

22 replies, 650 likes
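
For reference, the figure works out if each of the 512 TPU v3 chips is billed separately at the tweet's assumed rate of $8 per hour for the full 2.5 days. A minimal back-of-envelope check (the per-chip hourly rate is the tweet's assumption, not an official quote):

```python
# Back-of-envelope check of the ~$245k estimate in the tweet above.
# Assumes each of the 512 TPU v3 chips is billed at $8/hour (the tweet's
# assumption) for the full 2.5 days of pretraining.
chips = 512
rate_per_hour = 8.0            # USD per chip-hour, assumed
hours = 2.5 * 24               # 2.5 days
print(chips * rate_per_hour * hours)   # -> 245760.0, i.e. roughly $245,000
```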


Russ Salakhutdinov: XLNet: Generalized Autoregressive Pretraining for Language Understanding: outperforming BERT on 20 tasks (SQuAD, GLUE, sentiment analysis), while integrating ideas from Transformer-XL: arxiv: https://arxiv.org/abs/1906.08237 code + pretrained models: https://github.com/zihangdai/xlnet

2 replies, 509 likes


Mark 🦑 Riedl: That is 4x the average salary in the US and 9.5x the poverty line.

13 replies, 296 likes


Graham Neubig: There are a number of nice aspects to this method, but perhaps the nicest thing is that it's not named after a Sesame Street character 🙂

9 replies, 238 likes


Mark Riedl 🚀 Mars (Moon): RIP BERT. The problem with naming models after Sesame Street characters is that it artificially adds significance to the models. And then they will be cast aside when something better comes along in a few months.

2 replies, 150 likes


Language Technologies Institute | @CarnegieMellon: Today sees a major advance in the state of the art on English language understanding tasks by researchers at @LTIatCMU, @mldcmu, and @GoogleAI. The gains are due to a new permutation-based pre-training objective, and models that capture longer context than previous methods.

0 replies, 108 likes


Nikos Pappas: XLNet, a new acronym to remember in #NLProc👇 Two key differences from BERT: - Learns with an objective that maximizes the expected likelihood over all permutations of the factorization order. - Encodes with Transformer-XL, which captures longer-term dependencies than the vanilla Transformer.

2 replies, 77 likes
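
The first bullet above refers to the paper's permutation language modeling objective: draw a factorization order z uniformly from the set of all permutations of [1, ..., T] and maximize the expected log-likelihood of the sequence under that order. In the paper's notation:

```latex
% Permutation language modeling objective: \mathcal{Z}_T is the set of all
% permutations of [1, ..., T], z_t is the t-th element of a sampled order z,
% and x_{z_{<t}} are the tokens preceding position z_t in that order.
\max_{\theta}\;
\mathbb{E}_{\mathbf{z}\sim\mathcal{Z}_T}
\left[\sum_{t=1}^{T}\log p_{\theta}\bigl(x_{z_t}\mid \mathbf{x}_{\mathbf{z}_{<t}}\bigr)\right]
```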


Thang Luong: I thought SQuAD was solved with BERT, but the XLNet team doesn't want to stop there :) Great results on SQuAD, GLUE, and RACE! @ZihangDai @quocleix https://t.co/PxzNxuHtAf

0 replies, 60 likes


Sam Finlayson: This is a lot of $$$, and one could raise solid q's about the cost-benefit to society if ML research increasingly focuses on scale rather than more fundamental algorithmic innovation. However, if the price alone blows your mind + rustles your jimmies, you should talk to some biologists.

4 replies, 48 likes


Rachael Tatman: Time to pick the next @kaggle reading group paper! Your options: - XLNet: Generalized Autoregressive Pretraining for NLU https://arxiv.org/pdf/1906.08237.pdf - Defending Against Neural Fake News (Grover) https://arxiv.org/abs/1905.12616 - EfficientNet: Model Scaling for CNNs https://arxiv.org/abs/1905.11946

5 replies, 44 likes


Tim Hwang: the inherent structure of markets in artificial intelligence: oligopoly

1 replies, 42 likes


ni sinha: New objective + 10x more data + Transformer-XL + more. Predict a word given all permutations of the words in its context. Combines autoregressive LMs with autoencoding LMs in a clever way. Very cleanly written paper!

1 replies, 33 likes
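
To make "predict a word given all permutations of the words in its context" concrete, here is a minimal toy sketch of the permutation-LM loss for one sampled factorization order. It uses a mean-pooled context as a crude stand-in for XLNet's two-stream Transformer-XL encoder, so it illustrates the objective only, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy permutation language modeling loss (a sketch, not XLNet's two-stream
# attention): sample a factorization order, then predict each token from the
# tokens that precede it in that order.
vocab_size, dim, T = 100, 32, 8
embed = nn.Embedding(vocab_size, dim)
to_vocab = nn.Linear(dim, vocab_size)
tokens = torch.randint(0, vocab_size, (T,))   # one toy input sequence

order = torch.randperm(T)                     # sampled factorization order z
loss = torch.tensor(0.0)
for t in range(T):
    target_pos = order[t]                     # position predicted at step t
    context_pos = order[:t]                   # positions earlier in the order
    # Mean-pool the context embeddings as a crude encoder; XLNet also feeds
    # the target *position* to a separate query stream, omitted here.
    ctx = embed(tokens[context_pos]).mean(0) if t > 0 else torch.zeros(dim)
    logits = to_vocab(ctx)
    loss = loss + F.cross_entropy(logits.unsqueeze(0),
                                  tokens[target_pos].unsqueeze(0))
loss = loss / T
print(float(loss))
```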


James Bradbury: @eturner303 512 TPU chips is 128 TPU devices, or $61,440 for 2.5 days. The authors could also have meant 512 cores, which is 64 devices or $30,720.

2 replies, 32 likes
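
A quick recomputation under Bradbury's two readings, assuming a TPU v3 device holds 4 chips (8 cores) and is billed at roughly $8 per device-hour:

```python
# Re-doing the estimate for the two interpretations in the reply above,
# assuming 4 chips / 8 cores per TPU v3 device at ~$8 per device-hour.
rate_per_device_hour = 8.0     # USD, assumed
hours = 2.5 * 24
print(512 / 4 * rate_per_device_hour * hours)   # 512 chips = 128 devices -> 61440.0
print(512 / 8 * rate_per_device_hour * hours)   # 512 cores =  64 devices -> 30720.0
```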


Delip Rao: PSA for #NLProc folks. XLM is not XLNet (http://arxiv.org/abs/1906.08237), which was released a couple of days ago. They both beat BERT. You thought Sesame Street names were bad. Okay, now check out this super cool work 👇🏼

1 replies, 30 likes


Machine Learning and NLP: XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE) https://arxiv.org/pdf/1906.08237.pdf #NLProc

0 replies, 27 likes


Bharath Ramsundar: This is a beautiful paper. A new language pretraining method that achieves compelling improvements over BERT. Large jumps on a number of benchmarks. Uses a clever permutation-based autoregressive formulation plus Transformer-XL for handling long sequences.

0 replies, 26 likes


Djamé: I'm a bit disappointed they didn't title their paper "Bert is Dead: Behold Optimus Prime"

2 replies, 19 likes


Crypto Magix: In a few months everyone will be able to rent decentralized & cheap #AI processing power in @MatrixAINetwork (GPU powered) #blockchain $MAN

0 replies, 19 likes


Christian Szegedy: Wow!

1 replies, 15 likes


Matthew Kenney: access to compute can't be decoupled from responsible development of ml systems.

0 replies, 14 likes


Braden Hancock: Some seriously impressive gains on popular benchmarks, with nice analysis. I'm sure the pretrained model will see a lot of use (even without a Sesame Street name). Hoping a PyTorch version is available soon for tinkering! (Fingers crossed that @Thom_Wolf works his magic again...)

1 replies, 14 likes


Steadydee: At $245,000 to train an AI model in 2.5 days, cloud computing costs are out of control! 👀 Demand for blockchain will explode when all those distributed GPUs are diverted to AI and not just wasteful mining. Coming February 2020 #MATRIXAI $MAN https://twitter.com/eturner303/status/1143174828804857856?s=20

1 replies, 13 likes


Mona Jalal @ CVPR 2019 #cvpr2019: Today sees a major advance in the state of the art on English language understanding tasks by researchers at @LTIatCMU @mldcmu and @GoogleAI The gains are due to a new permutation-based pre-training objective, and models that capture longer context than previous methods. #NLProc https://t.co/ftwOI2vwV9

1 replies, 13 likes


Pranav Rajpurkar: Wow!

0 replies, 13 likes


Hannah Godofsky 💃🏼: I sometimes think computing has become the modern horse. Feudal societies were organized in large part around feeding and training horses. Today we farm electricity to feed computers. #machinelearning

1 replies, 10 likes


arXiv CS-CL: XLNet: Generalized Autoregressive Pretraining for Language Understanding http://arxiv.org/abs/1906.08237

0 replies, 9 likes


Atul Butte: Luckily it costs a lot less to train a child to understand language… HT @IAmSamFin

1 replies, 9 likes


Carlo Lepelaars: Finally got around to reading up on some recent NLP papers. Currently reading: ALBERT: https://arxiv.org/pdf/1909.11942.pdf RoBERTa: https://arxiv.org/pdf/1907.11692.pdf XLNet: https://arxiv.org/pdf/1906.08237.pdf BERTje (Dutch BERT model): https://arxiv.org/pdf/1912.09582.pdf Do you have any other NLP paper recommendations?

1 replies, 7 likes


Hugh Harvey: Congrats to the Google Brain team on what is most likely to be the most important advance in NLP this year, smashing previous state-of-the-art accuracy metrics. "In the future, we envision applications of XLNet to a wider set of tasks such as vision" https://arxiv.org/abs/1906.08237

0 replies, 7 likes


Eclipse DL4J: Better than BERT! XLNet: Generalized Autoregressive Pretraining for Language Understanding https://arxiv.org/abs/1906.08237 #deeplearning

0 replies, 7 likes


AUEB NLP Group: Next AUEB NLP Group meeting, Tue. **July 2**, 17:15-19:00: Discussion of "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (https://arxiv.org/abs/1906.08237). Study the paper before the meeting. Central AUEB buildings, room A36. All welcome.

0 replies, 6 likes


BioDecoded: Google Brain’s XLNet bests BERT at 20 NLP tasks | VentureBeat https://venturebeat.com/2019/06/21/google-brains-xlnet-bests-bert-at-20-nlp-tasks/ https://arxiv.org/abs/1906.08237 #DeepLearning #NLP https://t.co/4GPE7dkP8R

0 replies, 5 likes


Sam Witteveen: This looks super impressive!!

0 replies, 4 likes


小猫遊りょう(たかにゃし・りょう): Meta-Learning Update Rules for Unsupervised Representation Learning https://arxiv.org/abs/1804.00222 On the Variance of the Adaptive Learning Rate and Beyond https://arxiv.org/abs/1908.03265v1 XLNet: Generalized Autoregressive Pretraining for Language Understanding https://arxiv.org/abs/1906.08237

1 replies, 3 likes


Eugene Kharitonov: 😱

2 replies, 3 likes


Negru Adrian Eduard: Great news for #NLP enthusiasts. XLNet [1] is said to outperform BERT on certain NLP tasks, according to this article [2]. #machinelearning [1] https://arxiv.org/abs/1906.08237 [2] https://medium.com/dair-ai/xlnet-outperforms-bert-on-several-nlp-tasks-9ec867bb563b

0 replies, 3 likes


Neal Lathia: I wonder whether these huge training costs are offset by these models being open sourced (so that they can be fine-tuned and deployed at nearly no cost for 100s of problems).

1 replies, 2 likes


Navin Kabra: For all those who think machine learning is easy/cheap, please note the costs of training a deep learning network

0 replies, 2 likes


Sean Goldberg: Kids these days and their newfangled Sesame Street characters like XLNet...

0 replies, 2 likes


Víctor Peinado: And don't forget the CO2 footprint. We should be training these models on Mars. Let's fight against climate change on Earth while we help terraform the red planet

0 replies, 2 likes


Min Sun: Latest breakthrough!

0 replies, 2 likes


Delip Rao: In the long run, withholding large models trained on public datasets rarely prevents well-resourced bad actors from weaponizing them. The only guaranteed outcomes are stifling innovation, making the rich richer, & environmental impacts from forced replication. https://twitter.com/eturner303/status/1143174828804857856?s=21

0 replies, 2 likes


Benjamin Singleton: XLNet: Generalized Autoregressive Pretraining for Language Understanding #DataScience #BigData https://arxiv.org/abs/1906.08237

0 replies, 2 likes


/r/ML Popular: [R] XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE,... https://arxiv.org/abs/1906.08237

0 replies, 1 likes


Jeff Dalton: Great to see continued progress on pretraining methods that surpass BERT. But will it catch on without a Sesame Street reference?

0 replies, 1 likes


Masoud Hoveidar: I'll be facilitating an interesting talk on the XLNet paper today. So come and join us if you are in Toronto, or join the live streaming online #AISC #XLNet @AISC_TO: Paper: https://arxiv.org/pdf/1906.08237.pdf https://www.eventbrite.ca/e/xlnet-generalized-autoregressive-pretraining-for-language-understanding-tickets-67232453077

0 replies, 1 likes


Camilo Vasquez: Wowwwww #NLProc #DeepLearning

0 replies, 1 likes


Anna Rogers: It's official: #XLNet (https://arxiv.org/abs/1906.08237) is a larger-than-BERT model said to do better than BERT. I can't reproduce it in my lab. Can you? Will you, when you could train an XXXLNet instead? Please share the poll. No offense to the authors, but the trend is worrying.

0 replies, 1 likes


zara tustra: it costs two hundred thousand dollars to teach this artificial intelligence for twelve seconds

0 replies, 1 likes


Aakash Deep: @besanson BERT and #NLP

0 replies, 1 likes


Dogydev: Hyperparameter tuning using SHERPA (https://openreview.net/forum?id=HklSUMyJcQ) on XLNet (https://arxiv.org/abs/1906.08237) https://t.co/LBvv9DgxA8

0 replies, 1 likes


Flávio Clésio: Holy Jesus!

0 replies, 1 likes


cs.LG Papers: XLNet: Generalized Autoregressive Pretraining for Language Understanding. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le http://arxiv.org/abs/1906.08237

1 replies, 0 likes


Albert Vilella: The @GCPcloud #TPUv3 was used for #XLNet, which comes very close to human ceiling performance on reading comprehension on the #RACE dataset. An example of #Hardware #Acceleration applied to #ML. Can #TPUv3 be applied to #Genomics? http://bit.ly/acceleromics https://t.co/UiYKe6npm3

1 replies, 0 likes


Guillermo Valle: has anyone tried a generalized autoregressive model like XLNet (https://arxiv.org/abs/1906.08237) where the generation order is chosen as in the wave-function-collapse algorithm (by choosing the min-entropy point in the sequence)?

1 replies, 0 likes
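
A rough sketch of the decoding heuristic the question describes (not something the XLNet paper does): at each step, sample a token for whichever still-empty position currently has the lowest predictive entropy. Here `logits_fn` is a hypothetical stand-in for any model that scores a position given the tokens filled in so far:

```python
import torch
import torch.nn.functional as F

# Hypothetical min-entropy ("wave-function-collapse"-style) decoding order.
# logits_fn(filled, pos) stands in for a model returning a 1-D tensor of
# vocabulary logits for position `pos` given the already-filled positions;
# it is not part of the XLNet codebase.
def min_entropy_decode(logits_fn, seq_len):
    filled = {}                                  # position -> sampled token id
    remaining = set(range(seq_len))
    while remaining:
        entropy, proposal = {}, {}
        for pos in remaining:
            probs = F.softmax(logits_fn(filled, pos), dim=-1)
            entropy[pos] = float(-(probs * probs.clamp_min(1e-9).log()).sum())
            proposal[pos] = int(torch.multinomial(probs, 1))
        pos = min(remaining, key=entropy.get)    # "collapse" the most certain slot
        filled[pos] = proposal[pos]
        remaining.remove(pos)
    return [filled[i] for i in range(seq_len)]

# Toy usage with a dummy model that is uniform over a 10-token vocabulary.
print(min_entropy_decode(lambda filled, pos: torch.zeros(10), seq_len=5))
```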


Content

Found on Jun 20 2019 at https://arxiv.org/pdf/1906.08237.pdf

PDF content of a computer science paper: XLNet: Generalized Autoregressive Pretraining for Language Understanding