SYNTHESIZER: Rethinking Self-Attention in Transformer Models

Comments

Aran Komatsuzaki: Synthesizer: Rethinking Self-Attention in Transformer Models (1) random alignment matrices surprisingly perform quite competitively, (2) learning attention weights from query-key interactions is not that important after all. https://arxiv.org/abs/2005.00743 https://t.co/TSjssN6sp1

1 reply, 248 likes


Mark Riedl wears pants on video calls: Have you gotten used to neural transformers yet? Because here come Synthesizers https://arxiv.org/abs/2005.00743

4 replies, 117 likes


roadrunner01: Synthesizer: Rethinking Self-Attention in Transformer Models pdf: https://arxiv.org/pdf/2005.00743.pdf abs: https://arxiv.org/abs/2005.00743 https://t.co/heG7cqFC4r

0 replies, 52 likes


ML Review: Synthesizer: Rethinking Self-Attention in Transformer Models By @ytay017 @dara_bahri @MetzlerDonald (1) random alignment matrices perform surprisingly well, (2) learning attention weights from (query-key) interactions is not so important https://arxiv.org/abs/2005.00743 https://t.co/4LlWGaEAIB

0 replies, 39 likes


Joan Serrà: Looks nice! Always thought that query-key pairs were overkill.

0 replies, 5 likes


903124: On the paper "SYNTHESIZER: Rethinking Self-Attention in Transformer Models" https://arxiv.org/pdf/2005.00743.pdf the authors mention using a dense layer or a random matrix to replace the query-key dot product and achieving similar results. 1/3 https://t.co/ybQeBsUCX6

1 reply, 5 likes
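
To make the idea in this thread concrete, here is a minimal sketch of the "dense layer" variant: the attention row for each token is predicted from that token alone by a small MLP, with no query-key dot product. The class name DenseSynthesizerAttention, the two-layer projection, and the d_model / max_len parameters are illustrative assumptions based on the tweet's description, not code from the paper; it is single-head, unmasked, and assumes the sequence length equals max_len.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizerAttention(nn.Module):
    """Sketch: attention weights synthesized per token by an MLP,
    replacing the usual query-key dot product."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        # Maps each token's d_model vector to a row of attention logits
        # over all max_len positions.
        self.proj = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, max_len),
        )
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), assuming seq_len == max_len
        logits = self.proj(x)                 # (batch, seq_len, max_len)
        attn = F.softmax(logits, dim=-1)      # synthetic attention weights
        return attn @ self.value(x)           # (batch, seq_len, d_model)
```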


arXiv CS-CL: Synthesizer: Rethinking Self-Attention in Transformer Models http://arxiv.org/abs/2005.00743

0 replies, 4 likes


akira: https://arxiv.org/abs/2005.00743 This study revisits self-attention. Self-attention uses dot products to capture interactions between tokens. In this study, they instead compute the attention weights for each token independently, or treat them as trainable parameters, and the results are still competitive. https://t.co/mDAOxoyhqe

0 replies, 3 likes
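
And for the "treat it as a trainable parameter" variant akira mentions, here is a corresponding sketch in which a single learned (or frozen random) logit matrix is shared across all inputs, so the attention pattern does not depend on token content at all. Again, the class name RandomSynthesizerAttention and the trainable flag are assumptions for illustration, not the authors' code; the paper also explores factorized and mixed variants not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomSynthesizerAttention(nn.Module):
    """Sketch: one input-independent attention matrix, optionally trainable,
    instead of content-based query-key attention."""

    def __init__(self, d_model: int, max_len: int, trainable: bool = True):
        super().__init__()
        # A single logit matrix shared by every example in every batch.
        self.logits = nn.Parameter(torch.randn(max_len, max_len),
                                   requires_grad=trainable)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), assuming seq_len == max_len
        attn = F.softmax(self.logits, dim=-1)   # (max_len, max_len)
        return attn @ self.value(x)             # broadcasts over the batch
```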


Brundage Bot: Synthesizer: Rethinking Self-Attention in Transformer Models. Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, and Che Zheng http://arxiv.org/abs/2005.00743

1 reply, 2 likes


David Jurado: @jeremyphoward https://arxiv.org/pdf/2005.00743.pdf Not sure if this is the one

0 replies, 2 likes


Content

Found on May 05 2020 at https://arxiv.org/pdf/2005.00743.pdf

PDF content of a computer science paper: SYNTHESIZER: Rethinking Self-Attention in Transformer Models