Papers of the day

Single Headed Attention RNN: Stop Thinking With Your Head


Nov 27 2019 Smerity

Introducing the SHA-RNN :)
- Read alternative history as a research genre
- Learn of the terrifying tokenization attack that leaves language models perplexed
- Get near SotA results on enwik8 in hours on a lone GPU
No Sesame Street or Transformers allowed. https://arxiv.org/abs/1911.11423 https://t.co/RN5TPZ3xWH
51 replies, 1860 likes
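
For readers wondering what "single headed attention" over an RNN looks like in practice, here is a minimal PyTorch sketch of one causal attention head layered over an LSTM. This is an illustrative reconstruction under assumptions, not the paper's exact SHA-RNN (which also has Boom feed-forward layers, layer normalization, and attention over a cached memory); the SingleHeadAttention and SHABlock names are hypothetical, and the real implementation lives at https://github.com/smerity/sha-rnn.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SingleHeadAttention(nn.Module):
        """One head of scaled dot-product attention; queries are projected,
        keys and values are taken directly from the input (an assumption
        made here to keep the sketch small)."""
        def __init__(self, dim):
            super().__init__()
            self.query = nn.Linear(dim, dim)
            self.scale = dim ** -0.5

        def forward(self, x):
            # x: (seq_len, batch, dim)
            seq_len = x.size(0)
            q = self.query(x)
            # attention scores between every pair of positions
            scores = torch.einsum('qbd,kbd->bqk', q, x) * self.scale
            # causal mask: each position attends only to itself and the past
            mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                         device=x.device), diagonal=1)
            scores = scores.masked_fill(mask, float('-inf'))
            attn = F.softmax(scores, dim=-1)
            return torch.einsum('bqk,kbd->qbd', attn, x)

    class SHABlock(nn.Module):
        """An LSTM layer followed by a single attention head,
        joined by a residual connection."""
        def __init__(self, dim):
            super().__init__()
            self.rnn = nn.LSTM(dim, dim)
            self.attn = SingleHeadAttention(dim)

        def forward(self, x, hidden=None):
            h, hidden = self.rnn(x, hidden)
            return h + self.attn(h), hidden

    block = SHABlock(dim=64)
    tokens = torch.randn(16, 4, 64)   # (seq_len, batch, embedding dim)
    out, state = block(tokens)
    print(out.shape)                  # torch.Size([16, 4, 64])

Even in this toy version the design choice is visible: a single head means one set of attention scores per layer rather than many, leaving the LSTM to do the bulk of the sequence modelling.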


Nov 27 2019 hardmaru

“The author's goal is to show that the entire field might have evolved a different direction if we had instead been obsessed with a slightly different acronym and slightly different result. We take a previously strong language model based only on boring LSTM and get near SOTA” 🔥
1 replies, 313 likes


Nov 27 2019 Thomas Lahore

Single Headed Attention RNN: Stop Thinking With Your Head "The final results are achievable in plus or minus 24 hours on a single GPU as the author is impatient." "Take that Sesame Street." paper: https://arxiv.org/abs/1911.11423 code: https://github.com/smerity/sha-rnn https://t.co/dT6v2DounV
3 replies, 238 likes


Nov 28 2019 Jason Antic

I saw this paper and immediately dropped what I was doing to read it, and got hooked. It’s super interesting, insightful, and FUNNY. I also love his discipline of sticking with a humble desktop to run his experiments; I strongly believe in this.
1 replies, 135 likes


Nov 27 2019 Sara Hooker

Now here is an abstract with flair: "This work has undergone no intensive hyperparameter optimization and lived entirely on a commodity desktop machine that made the author's small studio apartment far too warm in the midst of a San Franciscan summer... Take that Sesame Street."
0 replies, 111 likes


Nov 27 2019 Colin Raffel

Entertaining exposition aside, I think the best quote from this paper is "there are usually far more efficient ways to achieve something once we know it’s possible."
0 replies, 103 likes


Nov 27 2019 Miles Brundage

"Single Headed Attention RNN: Stop Thinking With Your Head," @Smerity: https://arxiv.org/abs/1911.11423
3 replies, 82 likes


Nov 30 2019 Mark O. Riedl

This paper speculates on what might have happened if we hadn’t discovered Transformer architectures and had hammered on RNNs instead. We might be in more or less the same place. Also @Smerity is very funny and you should read it.
0 replies, 80 likes


Nov 27 2019 Aidan Gomez 😷

A fantastic paper highlighting the neglect RNN-based LMs have seen over the past couple of years. I think it's incredibly important to maintain diversity in the currents of research, and I find it unproductive for a field to collapse to iteration upon a single solution strategy.
0 replies, 68 likes


Nov 27 2019 Ben Hamner

Thanks @Smerity for writing the most entertaining machine learning paper I’ve ever read https://arxiv.org/abs/1911.11423 I wonder if an NLP model would classify it as ML research or comedy
0 replies, 55 likes


Nov 27 2019 Sanyam Bhutani

"This work has undergone no intensive hyperparam optimization [..] The final results are achievable in +/- 24 hrs on a single GPU [..] The attention mech is also readily extended to large contexts and req minimal comp. Take that Sesame Street" - @Smerity https://arxiv.org/abs/1911.11423
1 replies, 44 likes


Nov 27 2019 Federico Vaggi

On top of nearly matching SOTA on a laptop, this paper is a true delight to read.
2 replies, 30 likes


Dec 06 2019 Jasmine 🌱

“The author's lone goal is to show that the entire field might have evolved a different direction if we had instead been obsessed with a slightly different acronym and slightly different result.” I have never laughed multiple times while reading a paper. Thank you @Smerity
0 replies, 26 likes


Nov 27 2019 Julien Chaumond

If anything this sets the bar for how well-written (in the sense of enjoyable to read) a paper can be. Thanks @Smerity!
0 replies, 21 likes


Nov 29 2019 Arianna Bisazza

Truly loved this one: "Directly comparing the head count for LSTM models and Transformer models obviously doesn’t make sense but neither does comparing zero-headed LSTMs against bajillion headed models and then declaring an entire species dead"
1 replies, 18 likes


Nov 27 2019 HN Front Page

Single Headed Attention RNN L: https://arxiv.org/abs/1911.11423 C: https://news.ycombinator.com/item?id=21647804
0 replies, 18 likes


Nov 29 2019 Sam Finlayson

Wow, finally reading this paper, and the writing lives up to the hype! https://www.arxiv-vanity.com/papers/1911.11423/
0 replies, 17 likes


Nov 27 2019 Joe Barrow

A fun read (and not just for an academic paper): "Directly comparing the head count for LSTM models and Transformer models obviously doesn’t make sense but neither does comparing zero-headed LSTMs against bajillion headed models and then declaring an entire species dead."
0 replies, 17 likes


Nov 27 2019 Carlos E. Perez 🧢

This paper https://arxiv.org/abs/1911.11423 is a must-read for those doing NLP work. It's refreshingly honest and written in an entertaining and readable style. I think very few can pull off this style! This is likely the best Deep Learning paper of 2019! @Smerity #ai #nlp #deeplearning
1 replies, 11 likes


Nov 27 2019 Hamel Husain

A must-read for those who worship Sesame Street (BERT). Love the practical nature of this research that uses very modest compute available to the average practitioner. Also the paper is highly engaging and, dare I say, fun.
0 replies, 11 likes


Nov 27 2019 Xavier Amatriain

And... A kick-ass abstract 😂
0 replies, 10 likes


Nov 27 2019 Federico Andres Lois

This paper is nothing short of insane... I don't recall ever having so much fun reading research. Truth be told, at one point I had to step back and ponder whether it was an April Fools' thingy.
0 replies, 9 likes


Nov 27 2019 Jade Abbott

"To clarify, I’m entirely happy if this model fails, but why dismiss possibility out of hand? Why crowd a single direction of progress like moths swarming a light bulb?" ~ @Smerity This paper is part science + part comedic commentary on the field. It's absolutely glorious
0 replies, 9 likes


Nov 27 2019 Sanyam Bhutani

This is honestly THE BEST RESEARCH PAPER I'VE READ. I really, really enjoyed reading it!
0 replies, 9 likes


Nov 28 2019 Brandon Rohrer

Come for the jokes, stay for the seismic insights. “What if we weren’t all working on the same problems using the same tools?” -@smerity, paraphrased
1 replies, 8 likes


Nov 27 2019 abhishek thakur

YES! https://arxiv.org/pdf/1911.11423.pdf https://t.co/OAidjsArWM
1 replies, 7 likes


Nov 28 2019 arXiv CS-CL

Single Headed Attention RNN: Stop Thinking With Your Head http://arxiv.org/abs/1911.11423
0 replies, 4 likes


Nov 27 2019 Alain Rakotomamonjy

best ML paper abstract ever. + neat results inside.
0 replies, 4 likes


Nov 27 2019 Edward Dixon

@seb_ruder Not only a nice paper, but a very engineering-friendly approach to #DeepLearning for #NLProc, very abstemious in terms of compute.
1 replies, 4 likes


Nov 30 2019 Michael Ekstrand

In which we have a paper equally useful in classes on NLP and STS. Pondering how useful it might be for introducing ML students to the concept of the social construction of knowledge (in this case, that LSTMs are a dead end and real progress requires $CORP resources).
1 replies, 3 likes


Nov 27 2019 Mihail Eric

Attention may not be all you need! @Smerity's paper reads like a Michael Bay thriller. I eagerly await the sequel in the Transformers epic: Revenge of the Fallen.
0 replies, 3 likes


Nov 30 2019 Sharada Mohanty

Amazing work and a delightful read! Good results don't need massive cloud compute, and good work doesn't always need to be communicated with a serious face! Hope to see more and more success stories like this 🔥
0 replies, 3 likes


Nov 28 2019 James Vincent

this paper from @Smerity on his new language model is a wonderful read... Technical innovations aside (and I see it's getting plenty of praise for those!) it's a fantastic intro to the Big Picture of language modeling https://arxiv.org/pdf/1911.11423.pdf https://t.co/elT3QD9q49
0 replies, 3 likes


Nov 27 2019 mattiadg

Maybe LSTMs (and normally sized networks with them) are not dead after all.
0 replies, 3 likes


Nov 27 2019 Samiur Rahman

Was reading this paper and laughing out loud at the abstract (first time for an NLP paper!). I was like "I'd love to meet this person". Then I took a look at the author: turns out it's @Smerity. No surprise! P.S. This is a great paper even without humor! https://arxiv.org/pdf/1911.11423.pdf https://t.co/PzxdXGWtK7
0 replies, 2 likes


Nov 27 2019 Tom Liptrot

God damn it, I just caught up with transformers and this comes along... Great paper... https://arxiv.org/abs/1911.11423
1 replies, 2 likes


Dec 04 2019 Marco Banterle

One of the most interesting (amusing AND insightful) reads of the last few months: https://arxiv.org/abs/1911.11423 by @Smerity Are LSTMs back in the Xmas wish-lists?
0 replies, 2 likes


Nov 27 2019 Pranav Hari💻👷

This paper's abstract is too good "This work has undergone no intensive hyperparameter optimization and lived entirely on a commodity desktop machine that made the author's small studio apartment far too warm in the midst of a San Franciscan summer"
0 replies, 1 likes


Nov 27 2019 Karanbir Chahal

Loved this paper! I see that @Smerity is trying out the @pjreddie method of writing. I for one am all for it. It paints a picture of the big companies as the Empire, their compute as the Death Star, and our brave rebels fighting back with their small GPUs (read: X-wings).
0 replies, 1 likes


Nov 27 2019 Johannes Baiter

This paper has the most amazing abstract I have EVER read.
1 replies, 1 likes

