Papers of the day

Single Headed Attention RNN: Stop Thinking With Your Head

Comments

Smerity: Introducing the SHA-RNN :)
- Read alternative history as a research genre
- Learn of the terrifying tokenization attack that leaves language models perplexed
- Get near-SotA results on enwik8 in hours on a lone GPU
No Sesame Street or Transformers allowed. https://arxiv.org/abs/1911.11423 https://t.co/RN5TPZ3xWH

60 replies, 1965 likes


hardmaru: “The author's goal is to show that the entire field might have evolved a different direction if we had instead been obsessed with a slightly different acronym and slightly different result. We take a previously strong language model based only on boring LSTM and get near SOTA” 🔥

1 replies, 313 likes


Thomas Lahore: Single Headed Attention RNN: Stop Thinking With Your Head "The final results are achievable in plus or minus 24 hours on a single GPU as the author is impatient." "Take that Sesame Street." paper: https://arxiv.org/abs/1911.11423 code: https://github.com/smerity/sha-rnn https://t.co/dT6v2DounV

3 replies, 238 likes


Jason Antic: I saw this paper, immediately dropped what I was doing to read it, and got hooked. It's super interesting, insightful, and FUNNY. I also love his discipline of sticking with a humble desktop to run his experiments; I strongly believe in this.

1 replies, 135 likes


Sara Hooker: Now here is an abstract with flair: "This work has undergone no intensive hyperparameter optimization and lived entirely on a commodity desktop machine that made the author's small studio apartment far too warm in the midst of a San Franciscan summer... Take that Sesame Street."

0 replies, 111 likes


Colin Raffel: Entertaining exposition aside, I think the best quote from this paper is "there are usually far more efficient ways to achieve something once we know it’s possible."

0 replies, 103 likes


Miles Brundage: "Single Headed Attention RNN: Stop Thinking With Your Head," @Smerity: https://arxiv.org/abs/1911.11423

3 replies, 82 likes


Mark O. Riedl: This paper speculates on what might have happened if we hadn't discovered Transformer architectures and had hammered on RNNs instead. We might be in more or less the same place. Also @Smerity is very funny and you should read it.

0 replies, 80 likes


Aidan Gomez 😷: A fantastic paper highlighting the neglect RNN-based LMs have seen over the past couple of years. I think it's incredibly important to maintain diversity in the currents of research, and I find it unproductive for a field to collapse to iteration upon a single solution strategy.

0 replies, 68 likes


Ben Hamner: Thanks @Smerity for writing the most entertaining machine learning paper I've ever read https://arxiv.org/abs/1911.11423 I wonder if an NLP model would classify it as ML research or comedy

0 replies, 55 likes


Sanyam Bhutani: "This work has undergone no intensive hyperparam optimization [..] The final results are achievable in +/- 24 hrs on a single GPU [..] The attention mech is also readily extended to large contexts and req minimal comp. Take that Sesame Street" - @Smerity https://arxiv.org/abs/1911.11423

1 replies, 44 likes


Ali Safaya: After reimplementing SHA-RNN using Julia and Knet, here is my visualization of the model. Note to myself: LN is Layer Normalization http://github.com/alisafaya/SHA-RNN.jl https://t.co/8HAOq0ynOX

0 replies, 33 likes
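
For readers following along with reimplementations like the one above: below is a minimal, hypothetical PyTorch sketch of what one SHA-RNN-style block looks like, assuming the components the paper describes (an LSTM, a single attention head, layer normalization, and a "Boom" feed-forward layer). Class names, dimensions, and the exact placement of residuals, normalization, and masking are illustrative assumptions, not @Smerity's reference implementation (see https://github.com/smerity/sha-rnn for that).

```python
# Hypothetical sketch of a single SHA-RNN-style block (not the reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleHeadAttention(nn.Module):
    """One attention head: queries from the current tokens, keys/values from memory."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, memory):
        # x: (batch, seq, dim), memory: (batch, mem_len, dim)
        # NB: causal masking over the memory is omitted here for brevity.
        q, k, v = self.q(x), self.k(memory), self.v(memory)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v


class Boom(nn.Module):
    """Feed-forward layer that expands then collapses the hidden dimension."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.up = nn.Linear(dim, dim * expansion)
        self.down = nn.Linear(dim * expansion, dim)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))


class SHARNNBlock(nn.Module):
    """LSTM + single-headed attention + Boom, each wrapped in layer norm and a residual."""
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.attn = SingleHeadAttention(dim)
        self.boom = Boom(dim)
        self.ln_rnn, self.ln_attn, self.ln_boom = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, x, memory=None):
        h, _ = self.rnn(self.ln_rnn(x))
        x = x + h                                # residual around the LSTM
        mem = x if memory is None else torch.cat([memory, x], dim=1)
        x = x + self.attn(self.ln_attn(x), mem)  # residual single-head attention over memory
        x = x + self.boom(self.ln_boom(x))       # residual Boom feed-forward
        return x


# Example: one block over a toy batch of embeddings.
block = SHARNNBlock(dim=64)
out = block(torch.randn(2, 32, 64))  # (batch=2, seq=32, dim=64)
```

The real model stacks several such blocks (the paper attaches the attention head to only one of them) and caches hidden states as the memory; see the repo above or Ali Safaya's Julia port for the exact layout.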


Federico Vaggi: On top of nearly matching SOTA on a single desktop GPU, this paper is a true delight to read.

2 replies, 30 likes


Jasmine 🌱: “The author's lone goal is to show that the entire field might have evolved a different direction if we had instead been obsessed with a slightly different acronym and slightly different result.” I have never laughed multiple times reading a paper. Thank you @Smerity

0 replies, 26 likes


Julien Chaumond: If anything this sets the bar for how well-written (in the sense of enjoyable to read) a paper can be. Thanks @Smerity!

0 replies, 21 likes


HN Front Page: Single Headed Attention RNN L: https://arxiv.org/abs/1911.11423 C: https://news.ycombinator.com/item?id=21647804

0 replies, 18 likes


Arianna Bisazza: Truly loved this one: "Directly comparing the head count for LSTM models and Transformer models obviously doesn’t make sense but neither does comparing zero-headed LSTMs against bajillion headed models and then declaring an entire species dead"

1 replies, 18 likes


Sam Finlayson: Wow, finally reading this paper, and the writing lives up to the hype! https://www.arxiv-vanity.com/papers/1911.11423/

0 replies, 17 likes


Joe Barrow: A fun read (and not just for an academic paper): "Directly comparing the head count for LSTM models and Transformer models obviously doesn’t make sense but neither does comparing zero-headed LSTMs against bajillion headed models and then declaring an entire species dead."

0 replies, 17 likes


Leonid Boytsov: "Language models, at least as they stand, are not intelligent ... humans that created ... datasets are doing the intellectual work. A language model, passing over enough text, is merely surfacing and connecting fragments of cached human computation." https://arxiv.org/pdf/1911.11423.pdf

0 replies, 16 likes


Samiur Rahman: Was reading this paper and laughing out loud at the abstract (first time for an NLP paper!). I was like "I'd love to meet this person". Then I took a look at the author: turns out it's @Smerity. No surprise! P.S. This is a great paper even without humor! https://arxiv.org/pdf/1911.11423.pdf https://t.co/PzxdXGWtK7

0 replies, 12 likes


Carlos E. Perez 🧢: This paper https://arxiv.org/abs/1911.11423 is a must read for those doing NLP work. It's refreshingly honest and written in an entertaining and readable style. I think very few can pull off this style! This is likely the best Deep Learning paper of 2019! @Smerity #ai #nlp #deeplearning

1 replies, 11 likes


Hamel Husain: A must read for those who worship Sesame Street (BERT). Love the practical nature of this research that uses very modest compute available to the average practitioner. Also the paper is highly engaging and dare I say fun

0 replies, 11 likes


Xavier Amatriain: And... A kick-ass abstract 😂

0 replies, 10 likes


Jade Abbott: "To clarify, I’m entirely happy if this model fails, but why dismiss possibility out of hand? Why crowd a single direction of progress like moths swarming a light bulb?" ~ @Smerity This paper is part science + part comedic commentary on the field. It's absolutely glorious

0 replies, 9 likes


Sanyam Bhutani: This is honestly THE BEST RESEARCH PAPER READ. I really really enjoyed reading the paper!

0 replies, 9 likes


Federico Andres Lois: This paper is nothing short of insane... I don't recall ever having so much fun reading research. To be honest, at one point I had to step back and ponder whether it wasn't an April Fools' joke.

0 replies, 9 likes


Brandon Rohrer: Come for the jokes, stay for the seismic insights. “What if we weren’t all working on the same problems using the same tools?” -@smerity, paraphrased

1 replies, 8 likes


abhishek thakur: YES! https://arxiv.org/pdf/1911.11423.pdf https://t.co/OAidjsArWM

1 replies, 7 likes


Alex Polozov: @justinesherry That's a high bar. I only ascribe it to papers that were specifically written to be enjoyable: 1) https://pjreddie.com/media/files/papers/YOLOv3.pdf 2) https://arxiv.org/abs/1911.11423

0 replies, 5 likes


arXiv CS-CL: Single Headed Attention RNN: Stop Thinking With Your Head http://arxiv.org/abs/1911.11423

0 replies, 4 likes


Alain Rakotomamonjy: best ML paper abstract ever. + neat results inside.

0 replies, 4 likes


Edward Dixon: @seb_ruder Not only a nice paper, but it also seems a very engineering-friendly approach to #DeepLearning for #NLProc, very abstemious in terms of compute.

1 replies, 4 likes


James Vincent: this paper from @Smerity on his new language model is a wonderful read... Technical innovations aside (and I see it's getting plenty of praise for those!) it's a fantastic intro to the Big Picture of language modeling https://arxiv.org/pdf/1911.11423.pdf https://t.co/elT3QD9q49

0 replies, 3 likes


ymtk _(:3 」∠)_: lol > The leading approaches in language modeling are all obsessed with TV shows of my youth - namely Transformers and Sesame Street. https://arxiv.org/abs/1911.11423

0 replies, 3 likes


Mihail Eric: Attention may not be all you need! @Smerity's paper reads like a Michael Bay thriller. I eagerly await the sequel in the Transformers epic: Revenge of the Fallen.

0 replies, 3 likes


Michael Ekstrand: In which we have a paper equally useful in classes on NLP and STS. Pondering how useful it might be for introducing ML students to the concept of the social construction of knowledge (in this case, that LSTMs are a dead end and real progress requires $CORP resources).

1 replies, 3 likes


mattiadg: Maybe LSTMs (and normally sized networks with them) are not dead after all.

0 replies, 3 likes


Sharada Mohanty: Amazing work and a delightful read! Good results don't need massive cloud compute, and good work doesn't always need to be communicated with a serious face! Hope to see more and more success stories like this 🔥

0 replies, 3 likes


Tom Liptrot: God damn it, I just caught up with transformers and this comes along... Great paper... https://arxiv.org/abs/1911.11423

1 replies, 2 likes


abhishek thakur: @A_K_Nain @victor_basu_360 @Smerity ’s SHA-RNN is a gem 😀 https://arxiv.org/pdf/1911.11423.pdf

0 replies, 2 likes


Marco Banterle: One of the most interesting (amusing AND insightful) reads of the last few months: https://arxiv.org/abs/1911.11423 by @Smerity Are LSTMs back in the Xmas wish-lists?

0 replies, 2 likes


Akash Singh: Good research is not about using thousands of GPUs or TPUs and training them for months just to produce SOTA scores. @Smerity's paper "Single Headed Attention RNN: Stop Thinking With Your Head" https://arxiv.org/abs/1911.11423 is a good example of good science #DeepLearning #NLP

0 replies, 2 likes


Karanbir Chahal: Loved this paper! I see that @Smerity is trying out the @pjreddie method of writing. I for one am all for it. It paints a picture of the big companies being the Empire, their compute the Death Star, and our brave rebels fighting back with their small GPUs (read: X-wings).

0 replies, 1 likes


𝘾𝙖𝙧𝙡 𝙍𝙞𝙤𝙪𝙭: [R] "Single Headed Attention RNN: Stop Thinking With Your Head": Take that Sesame Street!: One of THE Best Papers that I've ever read (Both in terms of the research and the paper-writeup itself): https://arxiv.org/pdf/1911.11423.pdf The leading approaches in language… http://dlvr.it/RKBvmY

0 replies, 1 likes


Johannes Baiter: This paper has the most amazing abstract I have EVER read.

1 replies, 1 likes


Noah Caldwell-Gatsos: @lemire By itself, it's a unique approach, but it's part of a larger pattern this year of finding cost-efficient alternatives to expensive GPUs. @Smerity published a similar paper with similar goals around last Thanksgiving; you can read it here: https://arxiv.org/abs/1911.11423

1 replies, 1 likes


Pranav Hari💻👷: This paper's abstract is too good "This work has undergone no intensive hyperparameter optimization and lived entirely on a commodity desktop machine that made the author's small studio apartment far too warm in the midst of a San Franciscan summer"

0 replies, 1 likes


Nikete: Human beings, at least as they stand, are not intelligent... the evolution that created the DNA that structures human brains is doing the real intellectual work. A human being, passing through life, is merely surfacing and connecting fragments of cached evolutionary knowledge.

1 replies, 1 likes


Benjamin Singleton: Single Headed Attention RNN: Stop Thinking With Your Head #BigData #DataScience https://arxiv.org/abs/1911.11423

0 replies, 1 likes


Daniel Karavolos: I love this abstract. "Take that Sesame Street!" Rest of the paper in similar style :) https://arxiv.org/abs/1911.11423 #DeepLearning

0 replies, 1 likes


erogol: A transformer alternative that you can train in your garage https://arxiv.org/abs/1911.11423

0 replies, 1 likes


Content

Found on Nov 27 2019 at https://arxiv.org/pdf/1911.11423.pdf

PDF of the paper: Single Headed Attention RNN: Stop Thinking With Your Head