
Longformer: The Long-Document Transformer

Comments

Iz Beltagy: Excited to share our work on Longformer, a scalable transformer model for long-document NLP tasks without chunking/truncation to fit the 512-token limit. Work with @mattthemathman, @armancohan. Code and pretrained model: http://github.com/allenai/longformer Paper: http://arxiv.org/abs/2004.05150 (1/3)

4 replies, 421 likes


Iz Beltagy: Longformer update - a new PyTorch implementation that doesn't need the custom CUDA kernel is now available. It works on all devices, supports fp16, runs faster, and uses less memory, which makes it easier to use for finetuning. Code: https://github.com/allenai/longformer https://twitter.com/i_beltagy/status/1249750021811011591

3 replies, 213 likes
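
The sliding-window attention that this pure-PyTorch release computes without a custom kernel can be sketched with standard tensor ops. The snippet below is only a rough illustration of the idea, not the repository's actual implementation: each query attends to a fixed window of neighbors, so the score and probability tensors grow as O(n·w) rather than O(n²).

```python
# Minimal sketch of sliding-window self-attention in plain PyTorch (no custom
# CUDA kernel). Each query attends only to the 2*w + 1 keys around it, so the
# score tensor is (batch, n, 2w+1) instead of (batch, n, n).
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, w):
    """q, k, v: (batch, seq_len, dim); w: one-sided window size."""
    b, n, d = q.shape
    # Pad keys/values so every position has a full window of 2*w + 1 neighbors.
    k_pad = F.pad(k, (0, 0, w, w))                       # (b, n + 2w, d)
    v_pad = F.pad(v, (0, 0, w, w))
    # unfold gathers, for each position i, the padded slice [i, i + 2w],
    # i.e. the original positions [i - w, i + w].
    k_win = k_pad.unfold(1, 2 * w + 1, 1)                # (b, n, d, 2w+1)
    v_win = v_pad.unfold(1, 2 * w + 1, 1)                # (b, n, d, 2w+1)
    scores = torch.einsum('bnd,bndw->bnw', q, k_win) / d ** 0.5
    # Mask out window slots that fell into the zero padding.
    idx = torch.arange(n).unsqueeze(1) + torch.arange(2 * w + 1) - w
    scores = scores.masked_fill((idx < 0) | (idx >= n), float('-inf'))
    probs = torch.softmax(scores, dim=-1)
    return torch.einsum('bnw,bndw->bnd', probs, v_win)   # (b, n, d)

out = sliding_window_attention(torch.randn(2, 4096, 64),
                               torch.randn(2, 4096, 64),
                               torch.randn(2, 4096, 64), w=256)
```
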


roadrunner01: Longformer: The Long-Document Transformer pdf: https://arxiv.org/pdf/2004.05150.pdf abs: https://arxiv.org/abs/2004.05150 github: https://github.com/allenai/longformer https://t.co/yp3cuQ8uSI

0 replies, 33 likes


Aran Komatsuzaki: Longformer: The Long-Document Transformer A long-range LM with linear complexity. Performs on par with some of the sota models. Detailed analysis on pre-training performance, which is interesting. https://arxiv.org/abs/2004.05150

0 replies, 30 likes


Arman Cohan: Longformer: our new Transformer for long docs replacing full quadratic self-attention with a linear self-attention pattern. Pretraining analysis on long doc nlp tasks, multiple sota results in char lm and qa. w/ @i_beltagy @mattthemathman https://arxiv.org/pdf/2004.05150.pdf summary below:

1 replies, 25 likes
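
The "linear self-attention pattern" mentioned above combines a local sliding window with a handful of global tokens (e.g. [CLS] or question tokens) that attend to, and are attended by, every position. The sketch below builds the allowed query-key pairs as a dense boolean matrix purely for visualization; the model itself never materializes an n × n matrix.

```python
# Sketch of the Longformer attention pattern: a banded local window plus a few
# global positions. Dense matrix shown only to visualize which pairs interact.
import torch

def longformer_pattern(seq_len, window, global_idx):
    i = torch.arange(seq_len)
    local = (i[:, None] - i[None, :]).abs() <= window    # local sliding window
    allowed = local.clone()
    allowed[global_idx, :] = True                        # global tokens attend everywhere
    allowed[:, global_idx] = True                        # everyone attends to global tokens
    return allowed

pattern = longformer_pattern(seq_len=16, window=2, global_idx=[0])
print(pattern.int())
# Nonzero entries per row stay O(window + #global tokens), so the cost grows
# linearly with sequence length instead of quadratically.
```
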


Ste𝔣an 🖥️🎧⚡: Longformer PR for @huggingface Transformers 😍 Thanks to @i_beltagy and the AllenAI team 🤗 Can't wait to try it out 😄 📄 https://arxiv.org/abs/2004.05150 🔗 https://github.com/huggingface/transformers/pull/4352

0 replies, 25 likes
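
Once that integration landed, usage looks roughly like the sketch below. It assumes the eventual transformers API (LongformerModel / LongformerTokenizer, the allenai/longformer-base-4096 checkpoint, and a global_attention_mask argument); treat it as a sketch rather than the exact API of the PR as first opened.

```python
# Encoding a long document with the Hugging Face Longformer integration
# (names assumed from the eventual transformers API, not the in-flight PR).
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = " ".join(["word"] * 3000)        # well past the usual 512-token limit
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Mark the first token (<s>/[CLS]) as global so it can aggregate over the whole document.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)       # (1, seq_len, 768)
```
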


arXiv CS-CL: Longformer: The Long-Document Transformer http://arxiv.org/abs/2004.05150

0 replies, 16 likes


Apache TVM: Long-document transformer model with sparsity-aware optimization, code generated via TVM

0 replies, 10 likes


Timo Schick: Great @allen_ai paper introducing the Longformer: an O(n) Transformer variant using a set of local+global self-attention patterns. Importantly, it also contains experiments on downstream task performance & LM pretraining, something I've been missing in the Reformer paper. #NLProc

0 replies, 7 likes
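
To make the O(n) claim concrete, here is a quick back-of-envelope count of attention-score entries per head for a 4,096-token input with an illustrative window of 512:

```python
# Attention-score entries per layer/head: full self-attention vs sliding window.
n, w = 4096, 512
full = n * n                        # every token pair
windowed = n * w                    # each token vs ~w neighbors
print(full, windowed, full / windowed)   # 16777216 2097152 8.0
```

At this length full self-attention needs 8× more score entries, and the ratio grows linearly as the sequence gets longer.
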


Colin Raffel: @iskander @kchonyc @sleepinyourhat @yoavgo @SergeyFeldman @jeremyphoward Re: parameter sharing: https://arxiv.org/abs/1807.03819 and https://arxiv.org/abs/1909.11942 Re: cheaper attention, use domain-specific sparsity if you can, e.g. https://arxiv.org/abs/2004.05150 Re: tasks, use teacher-forced max likelihood if you can; T2T has some nice toy problems https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/algorithmic.py

1 replies, 4 likes


Iz Beltagy: Longformer update - a new PyTorch implementation that doesn't need the custom CUDA kernel is now available. It works on all devices, supports fp16, runs faster, and uses less memory, which makes it easier to use for task finetuning. https://twitter.com/i_beltagy/status/1249750021811011591

1 replies, 3 likes


Hady Elsahar: The Longformer is a memory- and computation-efficient sparse self-attention transformer for long-document processing. Joint work with @mattthemathman and Arman Cohan. Paper: https://arxiv.org/pdf/2004.05150.pdf Code: https://github.com/allenai/longformer

1 replies, 2 likes

