
Linformer: Self-Attention with Linear Complexity

Comments

Sinong Wang: Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity". We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with traditional transformers. Check it out here: https://arxiv.org/pdf/2006.04768.pdf https://t.co/8MgpWLhTOd

6 replies, 373 likes
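
The core idea described in the abstract: project the length-n key and value matrices down to a fixed k << n before computing attention, so the attention map is n x k rather than n x n. A minimal single-head sketch of that idea (the k=256 default, the module and variable names, and the single-head simplification are illustrative assumptions, not the authors' exact implementation):

    import torch
    import torch.nn as nn

    class LinformerSelfAttention(nn.Module):
        # Single-head sketch: K and V are projected along the sequence axis (n -> k).
        def __init__(self, d_model, seq_len, k=256):
            super().__init__()
            self.to_q = nn.Linear(d_model, d_model)
            self.to_k = nn.Linear(d_model, d_model)
            self.to_v = nn.Linear(d_model, d_model)
            # E and F map the length-n sequence dimension down to k.
            self.E = nn.Linear(seq_len, k, bias=False)
            self.F = nn.Linear(seq_len, k, bias=False)
            self.scale = d_model ** -0.5

        def forward(self, x):  # x: (batch, n, d_model)
            Q, K, V = self.to_q(x), self.to_k(x), self.to_v(x)
            K = self.E(K.transpose(1, 2)).transpose(1, 2)  # (batch, k, d_model)
            V = self.F(V.transpose(1, 2)).transpose(1, 2)  # (batch, k, d_model)
            attn = torch.softmax(Q @ K.transpose(1, 2) * self.scale, dim=-1)  # (batch, n, k)
            return attn @ V  # (batch, n, d_model)

For example, LinformerSelfAttention(d_model=64, seq_len=512)(torch.randn(2, 512, 64)) returns a (2, 512, 64) tensor without ever materializing a 512 x 512 attention map.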


Leo Dirac: Transformer models have dramatically changed NLP in recent years, outperforming previous techniques like LSTM in almost every way. An exception has been that they don’t scale well to large documents because they cost O(N^2) in document length. This paper offers an O(N) solution.

2 replies, 120 likes
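
A back-of-the-envelope illustration of that gap, counting attention-map entries per head (the projected length k=256 here is an assumed value chosen for illustration):

    k = 256  # assumed projected sequence length
    for n in (512, 4096, 65536):
        print(f"n={n:>6}  full n*n: {n*n:>13,}  projected n*k: {n*k:>11,}")

At n = 65,536 the full map has roughly 4.3 billion entries versus about 16.8 million for the projected one.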


Jayson Cunanan: @__MLT__ 's Maths sessions are spot on! SVD last weekend, and now we have Linformer: Self-Attention with Linear Complexity https://arxiv.org/pdf/2006.04768.pdf They prove that self-attention is low rank, so approximation via SVD makes sense! @suzatweet @vatai @mrityunjay_99 @cataluna84 https://t.co/UQH6LObTl1

2 replies, 47 likes
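
For intuition, this is the kind of spectrum check the low-rank claim suggests, run here on toy random inputs (the paper's actual analysis is of attention matrices from trained transformer models, so this sketch only illustrates the mechanics):

    import torch

    n, d, k = 512, 64, 128
    Q, K = torch.randn(n, d), torch.randn(n, d)      # toy inputs, not trained projections
    P = torch.softmax(Q @ K.T / d ** 0.5, dim=-1)    # the n x n attention map
    s = torch.linalg.svdvals(P)                      # singular values, descending
    ratio = (s[:k].sum() / s.sum()).item()
    print(f"top-{k} singular values carry {ratio:.1%} of the total spectrum")

If most of the spectral mass sits in the leading singular values, a rank-k approximation of P loses little, which is what motivates the SVD view.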


Andrey Lukyanenko: Linformer: Self-Attention with Linear Complexity Paper: https://arxiv.org/abs/2006.04768 The authors show that self-attention can be approximated by a low-rank matrix, so they propose a new self-attention mechanism that reduces complexity from O(N^2) to O(N) in both time and space https://t.co/jUUKPIbwR1

1 reply, 28 likes


hazyresearch: .@MadianKhabsa and @sinongwang et al.'s work has been really exciting to us: linear-time transformers https://arxiv.org/abs/2006.04768, the value of pre-training: https://arxiv.org/pdf/2006.08671.pdf , and more! We're not involved in these papers--just reading them and like 'em!

0 replies, 13 likes


Belinda Li: Self attention is low rank --> Linear-complexity transformers! Up on ArXiv now, with @sinongwang , me, @MadianKhabsa , @Han_Fang_ , and Hao Ma

2 replies, 12 likes


Daisuke Okanohara: The Transformer requires quadratic cost w.r.t. sequence length. Based on empirical insights and the Johnson–Lindenstrauss lemma, Linformer projects the keys and values into a low-dimensional space before the attention, which achieves linear cost while preserving accuracy. https://arxiv.org/abs/2006.04768

0 replies, 7 likes
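
At the shape level, that projection step might look like the sketch below, using a random Johnson–Lindenstrauss-style matrix purely for illustration (the paper learns the projections; the JL lemma is invoked to argue that a small k suffices):

    import torch

    n, d, k = 4096, 64, 256
    Q, K, V = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
    R = torch.randn(k, n) / k ** 0.5         # random projection along the sequence axis
    K_low, V_low = R @ K, R @ V              # (k, d) each, instead of (n, d)
    attn = torch.softmax(Q @ K_low.T / d ** 0.5, dim=-1)  # (n, k) instead of (n, n)
    out = attn @ V_low                       # (n, d), same output shape as full attention
    print(attn.shape, out.shape)             # torch.Size([4096, 256]) torch.Size([4096, 64])

The quadratic n x n object never appears; every intermediate scales as n times a constant k.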


Peter Organisciak: FB just put out a pre-print on their 'Linformer' approach for Transformers that scale linearly. Performance is comparable to BERT and DistilBERT, while being notably faster to train and use. https://arxiv.org/abs/2006.04768

1 reply, 5 likes


Han Fang: Introducing Linformer from our team. We propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space.

0 replies, 4 likes


Content

Found on Jun 09 2020 at https://arxiv.org/pdf/2006.04768.pdf

PDF content of a computer science paper: Linformer: Self-Attention with Linear Complexity