Papers of the day

Large Memory Layers with Product Keys


Jul 12 2019 Guillaume Lample

Our new paper: Large Memory Layers with Product Keys. We created a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster! 1/2
13 replies, 742 likes

Jul 12 2019 Yann LeCun

Replace some layers of a BERT-like architecture with "product key memory layers" and get better perplexity for half the computation. Yes, NLP requires large memory capacity.
3 replies, 493 likes

Aug 27 2019 Guillaume Lample

Just released a small and simple implementation of our Product-Key Memory (PKM) layer. A 12-layer transformer with a single PKM layer outperforms a 24-layer transformer while being almost twice as fast!
2 replies, 338 likes

Jul 12 2019 Yann LeCun

Awesome new paper from FAIR: 1. A new type of large-scale memory layer that uses product keys (FAISS-like indexing with product quantization) 2. Replace some layers in a BERT-like architecture by these Product Key Memory layers. ..... 3....
1 replies, 312 likes

Jul 12 2019 Alex Sablayrolles

"Large Memory Layers with Product Keys" with @GuillaumeLample, @LudovicDenoyer, Marc'Aurelio Ranzato and @hjegou TL;DR We introduce a large key-value memory layer with millions of values for a negligible computational cost. 1/2
1 replies, 121 likes

Sep 03 2019 Guillaume Lample

In the second we show that adding a Product-Key Memory Layer in a transformer is as efficient as doubling the number of layers in terms of performance, and has no impact on running time. with @alexsablay @hjegou @LudovicDenoyer Marc'Aurelio Ranzato (2/3)
1 replies, 46 likes

Nov 07 2019 Alexis Conneau

Our new paper: Unsupervised Cross-lingual Representation Learning at Scale We release XLM-R, a Transformer MLM trained in 100 langs on 2.5 TB of text data. Double digit gains on XLU benchmarks + strong per-language performance (~XLNet on GLUE). [1/6]
1 replies, 15 likes

Jul 17 2019 Daisuke Okanohara

A memory layer can significantly increase network capacity, but its cost is linear in the number of slots. They propose splitting each key into sub-keys and using their product as the key set, which supports efficient search even with 1 million memory slots.
0 replies, 8 likes
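The product-key trick described above can be sketched in a few lines: score the query's two halves against two small sub-key tables, then combine only the top candidates from each half. This is a minimal NumPy illustration, not the authors' implementation; the function name, shapes, and `k` are assumptions for the sketch.

```python
import numpy as np

def pkm_lookup(query, subkeys1, subkeys2, values, k=4):
    """Hypothetical sketch of a product-key memory lookup.

    With N sub-keys per half, the layer addresses N*N value slots
    while scoring only 2*N sub-keys per query.
    """
    d = query.shape[0]
    q1, q2 = query[: d // 2], query[d // 2 :]

    # Score each half of the query against its sub-key table: O(N) each.
    s1 = subkeys1 @ q1  # (N,)
    s2 = subkeys2 @ q2  # (N,)

    # Top-k per half, combined into k*k candidate product keys.
    i1 = np.argsort(-s1)[:k]
    i2 = np.argsort(-s2)[:k]
    cand_scores = (s1[i1][:, None] + s2[i2][None, :]).ravel()        # (k*k,)
    cand_slots = (i1[:, None] * subkeys2.shape[0] + i2[None, :]).ravel()

    # Final top-k among the candidates; softmax-weighted sum of values.
    top = np.argsort(-cand_scores)[:k]
    w = np.exp(cand_scores[top] - cand_scores[top].max())
    w /= w.sum()
    return w @ values[cand_slots[top]]  # (d_value,)

rng = np.random.default_rng(0)
N, d, dv = 32, 16, 8  # 32 sub-keys per half -> 32*32 = 1024 slots
out = pkm_lookup(
    rng.normal(size=d),
    rng.normal(size=(N, d // 2)),
    rng.normal(size=(N, d // 2)),
    rng.normal(size=(N * N, dv)),
)
print(out.shape)  # (8,)
```

The point of the sketch is the asymmetry: 1024 addressable slots are reached by scoring only 64 sub-keys, which is why capacity grows quadratically while lookup cost stays near-constant.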

Jul 12 2019 小猫遊りょう(たかにゃし・りょう)

Large Memory Layers with Product Keys “We show experimentally that it provides important gains on large-scale language modeling, reaching with 12 layers the performance of a 24-layer BERT-large model with half the running time.”
1 replies, 7 likes

Nov 07 2019 Kartikay Khandelwal

Really excited to share new work! XLM-R: A multilingual model in 100 languages, trained on 2TB of data! SOTA on cross-lingual benchmarks AND competitive with monolingual models on GLUE! We also explore how to effectively train these models! My first first author NLP paper! :)
1 replies, 6 likes

Jul 13 2019 Jakub Zavrel

Interesting approach to plug a k-NN based memory layer in a state-of-the-art transformer-based architecture
0 replies, 5 likes

Nov 07 2019 Myle Ott

Now available in fairseq:
0 replies, 3 likes

Jul 12 2019 Jonathan Raiman

Cool addition to the Transformer !
0 replies, 2 likes

Jul 14 2019 HotComputerScience

Most popular computer science paper of the day: "Large Memory Layers with Product Keys"
0 replies, 1 likes

Jul 13 2019 Justin Gottschlich

Interesting new structured memory by FAIR, which can be integrated into a neural network.
0 replies, 1 likes