
SYNTHESIZER: Rethinking Self-Attention in Transformer Models

Comments

Aran Komatsuzaki: Synthesizer: Rethinking Self-Attention in Transformer Models (1) random alignment matrices surprisingly perform quite competitively, (2) learning attention weights from query-key interactions is not that important after all. https://arxiv.org/abs/2005.00743 https://t.co/TSjssN6sp1

1 reply, 248 likes
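
For readers skimming the thread: the "random alignment matrices" result refers to the paper's Random Synthesizer, where the attention matrix is a plain learned parameter (optionally frozen at its random initialization) rather than being computed from query-key dot products. Below is a minimal single-head PyTorch sketch of that idea, with no masking; the class and argument names are illustrative only, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomSynthesizerAttention(nn.Module):  # hypothetical name, single head
    def __init__(self, d_model, max_len, trainable=True):
        super().__init__()
        # The "alignment" matrix is just a parameter; with trainable=False it
        # stays at its random initialization (the fixed-random variant).
        self.attn = nn.Parameter(torch.randn(max_len, max_len),
                                 requires_grad=trainable)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        weights = F.softmax(self.attn[:seq_len, :seq_len], dim=-1)
        return weights @ self.value(x)         # no query-key dot product anywhere

Because the attention weights do not depend on the input tokens, the trainable=False variant learns nothing for the mixing step beyond the value projection, which is what makes its competitive performance surprising.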


Mark Riedl wears pants on video calls: Have you gotten used to neural transformers yet? Because here come Synthesizers https://arxiv.org/abs/2005.00743

4 replies, 114 likes


roadrunner01: Synthesizer: Rethinking Self-Attention in Transformer Models pdf: https://arxiv.org/pdf/2005.00743.pdf abs: https://arxiv.org/abs/2005.00743 https://t.co/heG7cqFC4r

0 replies, 47 likes


ML Review: Synthesizer: Rethinking Self-Attention in Transformer Models By @ytay017 @dara_bahri @MetzlerDonald (1) random alignment matrices perform surprisingly well (2) learning attention weights from (query-key) interactions is not so important https://arxiv.org/abs/2005.00743 https://t.co/4LlWGaEAIB

0 replies, 38 likes


903124: I've used @nyanp's NFL transformer model and replaced QxKT with a fixed matrix; the CV score got worse by about 3e-3, or roughly 30 spots on the leaderboard, and is still in the silver zone https://www.kaggle.com/s903124/pytorch-fixed-random-synthesizer It's similar to the performance drop for fixed random in the paper, but still a strong baseline 2/3 https://t.co/9g1pGu7fKh

2 replies, 12 likes


903124: On the paper "SYNTHESIZER: Rethinking Self-Attention in Transformer Models" https://arxiv.org/pdf/2005.00743.pdf the authors mention using a dense layer or a random matrix to replace the query-key dot product, achieving similar results 1/3 https://t.co/ybQeBsUCX6

1 reply, 5 likes
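
The "dense layer" option this tweet mentions is the paper's Dense Synthesizer: each token's hidden state is mapped through a small feed-forward network to its own row of attention scores, so the query-key dot product is never formed. A minimal single-head PyTorch sketch under the same assumptions as the one above (no masking, illustrative names, not the authors' code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizerAttention(nn.Module):   # hypothetical name, single head
    def __init__(self, d_model, max_len):
        super().__init__()
        # Each token's hidden state is mapped to a length-max_len row of
        # attention scores, replacing the query-key dot product.
        self.score = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, max_len),
        )
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        scores = self.score(x)[..., :seq_len]  # (batch, seq_len, seq_len)
        weights = F.softmax(scores, dim=-1)
        return weights @ self.value(x)

One practical difference from standard dot-product attention is that the score projection is tied to a fixed maximum sequence length.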


Joan Serrà: Looks nice! Always thought that query-key pairs were overkill.

0 replies, 5 likes


arXiv CS-CL: Synthesizer: Rethinking Self-Attention in Transformer Models http://arxiv.org/abs/2005.00743

0 replies, 4 likes


Brundage Bot: Synthesizer: Rethinking Self-Attention in Transformer Models. Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, and Che Zheng http://arxiv.org/abs/2005.00743

1 reply, 2 likes


Content

Found on May 05 2020 at https://arxiv.org/pdf/2005.00743.pdf

PDF content of a computer science paper: SYNTHESIZER: Rethinking Self-Attention in Transformer Models