Papers of the day

Linformer: Self-Attention with Linear Complexity


Sinong Wang: Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity". We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with traditional transformers. Check it out here:

6 replies, 373 likes

Leo Dirac: Transformer models have dramatically changed NLP in recent years, outperforming previous techniques like LSTM in almost every way. An exception has been that they don’t scale well to large documents because they cost O(N^2) in document length. This paper offers an O(N) solution.

2 replies, 120 likes

Jayson Cunanan: @__MLT__ 's Maths sessions are spot on! SVD last weekend, and now we have Linformer: Self-Attention with Linear Complexity. They proved that self-attention is low rank, so approximation via SVD makes sense! @suzatweet @vatai @mrityunjay_99 @cataluna84

2 replies, 47 likes
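The low-rank observation behind these tweets can be checked numerically: form a softmax attention matrix and inspect its singular value spectrum. The sketch below is illustrative only (random, untrained Q and K; the sizes n=512, d=64 and the 99% energy threshold are my choices, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 64  # sequence length and head dimension (illustrative sizes)

# Random queries/keys; P = softmax(Q K^T / sqrt(d)) is the attention matrix.
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
scores = Q @ K.T / np.sqrt(d)
P = np.exp(scores - scores.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)          # rows sum to 1

# Singular value spectrum of the n x n attention matrix.
s = np.linalg.svd(P, compute_uv=False)
cum = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(cum, 0.99)) + 1    # rank capturing 99% of spectral energy
print(k, n)  # k comes out much smaller than n, i.e. P is effectively low rank
```

With trained transformer heads the paper reports the same qualitative picture: most of the spectral mass concentrates in the leading singular values, which is what licenses a low-rank (SVD-style) approximation.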

Andrey Lukyanenko: Linformer: Self-Attention with Linear Complexity. Paper: The authors observed that self-attention can be approximated by a low-rank matrix, so they propose a new self-attention architecture that reduces complexity from O(N^2) to O(N) in both time and space.

1 replies, 28 likes

hazyresearch: .@MadianKhabsa and @sinongwang et al.'s work has been really exciting to us: linear-time transformers, the value of pre-training, and more! We're not involved in these papers--just reading them and like 'em!

0 replies, 13 likes

Belinda Li: Self attention is low rank --> Linear-complexity transformers! Up on ArXiv now, with @sinongwang , me, @MadianKhabsa , @Han_Fang_ , and Hao Ma

2 replies, 12 likes

Daisuke Okanohara: Transformer requires quadratic cost w.r.t. sequence length. Based on empirical insights and the Johnson–Lindenstrauss lemma, Linformer projects the keys and values into a low-dimensional space before the attention, which achieves linear cost while preserving accuracy.

0 replies, 7 likes
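The projection trick described above can be sketched in a few lines of NumPy. E and F play the role of the paper's learned projection matrices applied along the sequence axis; here they are random stand-ins (the 1/sqrt(n) scaling and the sizes n=512, d=64, k=32 are my assumptions for illustration, not the paper's trained values).

```python
import numpy as np

def linformer_attention(Q, K, V, E, F):
    """Scaled dot-product attention with Linformer-style projections.

    E and F are (k, n) matrices that compress the keys and values along
    the sequence axis, so the attention map is n x k instead of n x n.
    """
    d = Q.shape[-1]
    K_proj = E @ K                         # (k, d): compressed keys
    V_proj = F @ V                         # (k, d): compressed values
    scores = Q @ K_proj.T / np.sqrt(d)     # (n, k) instead of (n, n)
    P = np.exp(scores - scores.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)     # row-wise softmax
    return P @ V_proj                      # (n, d), O(n*k) time and memory

rng = np.random.default_rng(0)
n, d, k = 512, 64, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((k, n)) / np.sqrt(n)  # stand-ins for learned projections
F = rng.standard_normal((k, n)) / np.sqrt(n)
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (512, 64)
```

Since k is a fixed constant chosen independently of n, both the score matrix and the output cost O(n·k) rather than O(n^2), which is the linear-complexity claim in the tweets above.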

Peter Organisciak: FB just put out a pre-print on their 'Linformer' approach, for Transformers that scale linearly. Performance is comparable to BERT and DistilBERT, while notably faster to train and use.

1 replies, 5 likes

Han Fang: Introducing Linformer from our team. We propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space.

0 replies, 4 likes


Found on Jun 09 2020 at

PDF content of a computer science paper: Linformer: Self-Attention with Linear Complexity