Sinong Wang: Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity".
We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with traditional transformers.
Check it out here: https://arxiv.org/pdf/2006.04768.pdf https://t.co/8MgpWLhTOd
6 replies, 373 likes
Leo Dirac: Transformer models have dramatically changed NLP in recent years, outperforming previous techniques like LSTM in almost every way. An exception has been that they don’t scale well to large documents because they cost O(N^2) in document length. This paper offers an O(N) solution.
2 replies, 120 likes
Jayson Cunanan: @__MLT__ 's Maths sessions are spot on! SVD last weekend and now we have
Linformer: Self-Attention with Linear Complexity
They showed self-attention is low rank, so approximation via SVD makes sense! @suzatweet @vatai @mrityunjay_99 @cataluna84 https://t.co/UQH6LObTl1
2 replies, 47 likes
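A toy NumPy sketch of the low-rank observation behind the tweets above: build a softmax attention matrix and measure how well its best rank-k truncation (via SVD) approximates it. This uses random Q and K purely for illustration; the paper's spectral analysis is done on trained transformer attention, where the spectrum decays much faster.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 64                      # sequence length, head dimension (toy sizes)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# Standard softmax attention matrix P, shape (n, n)
scores = Q @ K.T / np.sqrt(d)
scores -= scores.max(axis=1, keepdims=True)  # numerical stability
P = np.exp(scores)
P /= P.sum(axis=1, keepdims=True)

# Best rank-k approximation of P via truncated SVD
U, s, Vt = np.linalg.svd(P)
k = 32
P_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Relative Frobenius error of the rank-k truncation (always in [0, 1])
err = np.linalg.norm(P - P_k) / np.linalg.norm(P)
print(f"rank-{k} relative error: {err:.3f}")
```

With trained attention heads, the paper reports that most of the spectral mass sits in the top few singular values, which is what justifies replacing the full n×n attention map with a low-rank one.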
Andrey Lukyanenko: Linformer: Self-Attention with Linear Complexity
The authors have realized that self-attention can be approximated by a low-rank matrix. So they offer a new self-attention architecture, which reduces complexity from O(N^2) to O(N) in time and space https://t.co/jUUKPIbwR1
1 reply, 28 likes
hazyresearch: .@MadianKhabsa and @sinongwang
et al.'s work has been really exciting to us: linear-time transformers https://arxiv.org/abs/2006.04768, the value of pre-training: https://arxiv.org/pdf/2006.08671.pdf , and more!
We're not involved in these papers--just reading them and like 'em!
0 replies, 13 likes
Belinda Li: Self attention is low rank --> Linear-complexity transformers!
Up on ArXiv now, with @sinongwang , me, @MadianKhabsa , @Han_Fang_ , and Hao Ma
2 replies, 12 likes
Daisuke Okanohara: Transformer requires quadratic cost wrt sequence length. Based on empirical insights and the Johnson–Lindenstrauss lemma, Linformer projects the keys and values into a low-dimensional space before attention, achieving linear cost while preserving accuracy. https://arxiv.org/abs/2006.04768
0 replies, 7 likes
Peter Organisciak: FB just put out a pre-print on their 'Linformer' approach, for Transformers that scale linearly. Performance is comparable to BERT and DistilBERT, while notably faster to train and use. https://arxiv.org/abs/2006.04768
1 reply, 5 likes
Han Fang: Introducing Linformer from our team. We propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space.
0 replies, 4 likes
Found on Jun 09 2020 at https://arxiv.org/pdf/2006.04768.pdf