Iz Beltagy: Excited to share our work on Longformer, a scalable transformer model that handles long-document NLP tasks without chunking/truncation to fit the 512-token limit.
Work with @mattthemathman, @armancohan
Code and pretrained model: http://github.com/allenai/longformer
4 replies, 421 likes
Iz Beltagy: Longformer update - a new PyTorch implementation that doesn't need the custom CUDA kernel is now available.
It works on all devices, supports fp16, runs faster, and uses less memory, which makes it easier to use for finetuning.
3 replies, 213 likes
roadrunner01: Longformer: The Long-Document Transformer
github: https://github.com/allenai/longformer https://t.co/yp3cuQ8uSI
0 replies, 33 likes
Aran Komatsuzaki: Longformer: The Long-Document Transformer
A long-range LM with linear complexity. Performs on par with some of the sota models. Detailed analysis on pre-training performance, which is interesting.
0 replies, 30 likes
Arman Cohan: Longformer: our new Transformer for long docs replacing full quadratic self-attention with a linear self-attention pattern. Pretraining analysis on long doc nlp tasks, multiple sota results in char lm and qa. w/ @i_beltagy @mattthemathman
1 replies, 25 likes
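The "linear self-attention pattern" Arman mentions can be illustrated with a small sketch: each token attends only to a fixed window of neighbors, so the number of attended pairs grows linearly in sequence length rather than quadratically. The function name and window size below are illustrative assumptions, not code from the Longformer repo.

```python
# Hedged sketch of sliding-window ("local") self-attention, assuming a
# symmetric window of `window` tokens on each side of every position.
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to j iff |i - j| <= window."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

n, w = 4096, 256
mask = local_attention_mask(n, w)
full_cost = n * n             # pairs scored by full quadratic self-attention
local_cost = int(mask.sum())  # roughly n * (2w + 1): linear in n for fixed w
print(full_cost, local_cost)
```

For a fixed window, doubling the sequence length roughly doubles `local_cost`, while `full_cost` quadruples; that gap is what makes long documents tractable.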
Ste𝔣an 🖥️🎧⚡: Longformer PR for @huggingface Transformers 😍
Thanks to @i_beltagy and the AllenAI team 🤗
Can't wait to try it out 😄
0 replies, 25 likes
arXiv CS-CL: Longformer: The Long-Document Transformer http://arxiv.org/abs/2004.05150
0 replies, 16 likes
Apache TVM: Long transformer model with sparse-aware optimization, code generated via TVM
0 replies, 10 likes
Timo Schick: Great @allen_ai paper introducing the Longformer: an O(n) Transformer variant using a set of local+global self-attention patterns. Importantly, it also contains experiments on downstream task performance & LM pretraining, something I've been missing in the Reformer paper. #NLProc
0 replies, 7 likes
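The "local+global" combination Timo describes can be sketched as a mask over attended pairs: a few designated positions (e.g. a classification token) attend to, and are attended by, every position, while all other tokens keep only the sliding window. The helper name and the chosen global position are assumptions for illustration, not the Longformer API.

```python
# Hedged sketch of a combined local + global attention pattern, assuming
# global attention is symmetric (global tokens see all, all see them).
import numpy as np

def longformer_style_mask(seq_len, window, global_positions):
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window  # local band
    for g in global_positions:
        mask[g, :] = True   # global token attends to every position
        mask[:, g] = True   # every position attends to the global token
    return mask

m = longformer_style_mask(seq_len=16, window=2, global_positions=[0])
print(m[0].all(), m[:, 0].all())  # True True
```

The global positions keep the pattern O(n) overall (a constant number of full rows/columns), while still giving the model a task-specific way to pool information across the whole document.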
Colin Raffel: @iskander @kchonyc @sleepinyourhat @yoavgo @SergeyFeldman @jeremyphoward Re: parameter sharing: https://arxiv.org/abs/1807.03819 and https://arxiv.org/abs/1909.11942
Re: cheaper attention, use domain-specific sparsity if you can, e.g. https://arxiv.org/abs/2004.05150
Re: tasks, use teacher-forced max likelihood if you can; T2T has some nice toy problems https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/algorithmic.py
1 replies, 4 likes
Hady Elsahar: The Longformer is a memory- and computation-efficient sparse self-attention transformer that allows long-document processing. Joint work with @mattthemathman and Arman Cohan
1 replies, 2 likes
Found on Apr 13 2020 at https://arxiv.org/pdf/2004.05150.pdf