
Large Memory Layers with Product Keys

Comments

Jul 12 2019 Guillaume Lample

Our new paper: Large Memory Layers with Product Keys https://arxiv.org/abs/1907.05242 We created a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory layer outperforms a 24-layer transformer and is 2x faster! 1/2 https://t.co/H2I9lpRXgY
13 replies, 742 likes
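
For readers who want the mechanism rather than the headline numbers: below is a minimal, single-head sketch of a product-key memory lookup in PyTorch. It is not the authors' released implementation (which adds multi-head queries, a batch-normalized query network, and an EmbeddingBag over the values); all names and sizes here are illustrative.

```python
import torch
import torch.nn as nn

class ProductKeyMemory(nn.Module):
    """Minimal single-head product-key memory with n_sub ** 2 value slots."""

    def __init__(self, d_model, d_key=256, n_sub=512, topk=32):
        super().__init__()
        self.topk = topk
        half = d_key // 2
        self.query_proj = nn.Linear(d_model, d_key)                 # q -> (q1, q2)
        self.sub_keys1 = nn.Parameter(torch.randn(n_sub, half) * half ** -0.5)
        self.sub_keys2 = nn.Parameter(torch.randn(n_sub, half) * half ** -0.5)
        self.values = nn.Embedding(n_sub * n_sub, d_model)          # one value per slot

    def forward(self, x):                                           # x: (batch, d_model)
        q1, q2 = self.query_proj(x).chunk(2, dim=-1)
        # Score each query half against its own sub-key table: O(2 * n_sub) per query.
        s1, i1 = (q1 @ self.sub_keys1.t()).topk(self.topk, dim=-1)
        s2, i2 = (q2 @ self.sub_keys2.t()).topk(self.topk, dim=-1)
        # Combine the two top-k lists: only topk ** 2 candidates instead of n_sub ** 2.
        cand_scores = s1.unsqueeze(-1) + s2.unsqueeze(-2)           # (batch, topk, topk)
        cand_idx = i1.unsqueeze(-1) * self.sub_keys2.shape[0] + i2.unsqueeze(-2)
        best_scores, best = cand_scores.flatten(1).topk(self.topk, dim=-1)
        slots = cand_idx.flatten(1).gather(1, best)                 # global slot indices
        w = best_scores.softmax(dim=-1)                             # sparse read weights
        return (w.unsqueeze(-1) * self.values(slots)).sum(dim=1)    # (batch, d_model)
```

Only `topk` values are read per input vector, which is why the extra capacity comes at a negligible computational cost.
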


Jul 12 2019 Yann LeCun

Replace some layers of a BERT-like architecture with "product key memory layers" and get better perplexity for half the computation. Yes, NLP requires large memory capacity.
3 replies, 493 likes
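
To make "replace some layers" concrete, here is a hedged sketch of a transformer block whose feed-forward sublayer is swapped for the memory lookup sketched above. The block layout (normalization placement, where the memory sits) is illustrative, not the exact BERT or XLM code.

```python
import torch.nn as nn

class TransformerBlockWithMemory(nn.Module):
    """Illustrative block: the usual FFN sublayer is replaced by a memory lookup."""

    def __init__(self, d_model, n_heads, memory):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.memory = memory                      # e.g. ProductKeyMemory(d_model)

    def forward(self, x):                         # x: (batch, seq, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        b, s, d = x.shape
        m = self.memory(x.reshape(b * s, d)).reshape(b, s, d)
        return self.norm2(x + m)                  # memory output joins the residual
```
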


Aug 27 2019 Guillaume Lample

Just released a small and simple implementation of our Product-Key Memory (PKM) layer. A 12-layer transformer with a single PKM layer outperforms a 24-layer transformer while being almost twice as fast! https://github.com/facebookresearch/XLM/blob/master/PKM-layer.ipynb
2 replies, 338 likes


Jul 12 2019 Yann LeCun

Awesome new paper from FAIR: 1. A new type of large-scale memory layer that uses product keys (FAISS-like indexing with product quantization) 2. Replace some layers in a BERT-like architecture by these Product Key Memory layers. ..... 3.... https://arxiv.org/abs/1907.05242
1 replies, 312 likes


Jul 12 2019 Alex Sablayrolles

"Large Memory Layers with Product Keys" with @GuillaumeLample, @LudovicDenoyer, Marc'Aurelio Ranzato and @hjegou https://arxiv.org/abs/1907.05242 TL;DR We introduce a large key-value memory layer with millions of values for a negligible computational cost. 1/2 https://t.co/ucgF9qYRjo
1 replies, 121 likes


Sep 03 2019 Guillaume Lample

In the second paper https://arxiv.org/abs/1907.05242 we show that adding a Product-Key Memory layer in a transformer is as efficient as doubling the number of layers in terms of performance, and has no impact on running time. with @alexsablay @hjegou @LudovicDenoyer Marc'Aurelio Ranzato (2/3)
1 replies, 46 likes


Nov 07 2019 Alexis Conneau

Our new paper: Unsupervised Cross-lingual Representation Learning at Scale http://arxiv.org/abs/1907.05242 We release XLM-R, a Transformer MLM trained in 100 langs on 2.5 TB of text data. Double digit gains on XLU benchmarks + strong per-language performance (~XLNet on GLUE). [1/6] https://t.co/g8XE65UTW9
1 replies, 15 likes


Jul 17 2019 Daisuke Okanohara

A memory layer can significantly increase network capacity, but its cost is linear in the number of slots. They propose splitting each key into sub-keys and using their product keys, which supports efficient search even with 1 million memory slots https://arxiv.org/abs/1907.05242
0 replies, 8 likes
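
A back-of-the-envelope count shows why the sub-key decomposition scales (numbers illustrative): a flat memory with N slots needs N key comparisons per query, while product keys only score two sub-key tables of size sqrt(N) and then rank k x k candidate combinations.

```python
# Rough per-query operation counts (illustrative; constants and value reads ignored).
N = 1_000_000                      # memory slots
n_sub = 1_000                      # size of each sub-key table, n_sub ** 2 == N
k = 32                             # nearest neighbours kept per query

flat = N                           # brute force: score every key
product_keys = 2 * n_sub + k * k   # score both sub-key tables, then combine top-k lists
print(flat, product_keys)          # 1000000 vs 3024
```
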


Jul 12 2019 小猫遊りょう(たかにゃし・りょう)

Large Memory Layers with Product Keys https://arxiv.org/abs/1907.05242 “We show experimentally that it provides important gains on large-scale language modeling, reaching with 12 layers the performance of a 24-layer BERT-large model with half the running time.”
1 replies, 7 likes


Nov 07 2019 Kartikay Khandelwal

Really excited to share new work! XLM-R: A multilingual model in 100 languages, trained on 2TB of data! SOTA on cross-lingual benchmarks AND competitive with monolingual models on GLUE! We also explore how to effectively train these models! My first first author NLP paper! :)
1 replies, 6 likes


Jul 13 2019 Jakub Zavrel

Interesting approach to plug a k-NN-based memory layer into a state-of-the-art transformer architecture
0 replies, 5 likes


Nov 07 2019 Myle Ott

Now available in fairseq: https://github.com/pytorch/fairseq/tree/master/examples/xlmr
0 replies, 3 likes
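
For completeness, loading XLM-R from fairseq looks roughly like the snippet below, following the torch.hub interface documented in the linked README; model and method names may vary across fairseq versions, so treat this as a sketch rather than the canonical usage.

```python
import torch

# Download and load the pretrained XLM-R (large) model via torch.hub.
xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
xlmr.eval()

tokens = xlmr.encode('Hello world!')      # SentencePiece tokenization -> id tensor
features = xlmr.extract_features(tokens)  # (1, seq_len, hidden_dim)
print(features.shape)
```
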


Jul 12 2019 Jonathan Raiman

Cool addition to the Transformer!
0 replies, 2 likes


Jul 14 2019 HotComputerScience

Most popular computer science paper of the day: "Large Memory Layers with Product Keys" https://hotcomputerscience.com/paper/large-memory-layers-with-product-keys https://twitter.com/GuillaumeLample/status/1149646895377076224
0 replies, 1 likes


Jul 13 2019 Justin Gottschlich

Interesting new structured memory by FAIR, which can be integrated into a neural network. https://arxiv.org/abs/1907.05242
0 replies, 1 likes

