Papers of the day

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Comments

Timo Schick: 🎉 New paper 🎉 We show that language models are few-shot learners even if they have far less than 175B parameters. Our method performs similarly to @OpenAI's GPT-3 on SuperGLUE after training on 32 examples with just 0.1% of its parameter count: https://arxiv.org/abs/2009.07118 #NLProc https://t.co/vsr8ELN5Id

8 replies, 557 likes


Tim Dettmers: I was just preparing a recent few-shot learning paper that beats GPT-3 for our reading group: https://arxiv.org/abs/2009.07118. I just realized that the presented algorithm (iPET) is an NLP version of Noisy Student Training. It's nice to see that some algorithms work for both NLP and CV.

2 replies, 116 likes
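For readers unfamiliar with the Noisy Student analogy above, here is a rough, hedged sketch of an iPET-style iterative self-training loop. The `train_fn` and `pseudo_label_fn` arguments are caller-supplied placeholders, not the authors' API, and the generation/scale numbers are illustrative only.

```python
# Rough sketch of an iPET-style self-training loop (the "NLP Noisy Student"
# analogy in the tweet). train_fn and pseudo_label_fn are caller-supplied
# placeholders, not the authors' code; generations/growth values are illustrative.
import random

def ipet(labeled, unlabeled, patterns, train_fn, pseudo_label_fn,
         generations=3, growth=5):
    # Generation 0: one model per pattern, trained only on the small labeled set.
    models = [train_fn(p, labeled) for p in patterns]
    for g in range(1, generations + 1):
        next_models = []
        for i, pattern in enumerate(patterns):
            # Pseudo-label a growing pool of unlabeled data with the *other*
            # models, so each model learns from its peers rather than itself.
            peers = [m for j, m in enumerate(models) if j != i]
            pool_size = min(len(unlabeled), len(labeled) * growth ** g)
            pseudo = pseudo_label_fn(peers, random.sample(unlabeled, pool_size))
            next_models.append(train_fn(pattern, list(labeled) + pseudo))
        models = next_models
    return models  # final ensemble; its soft labels can train a single classifier
```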


AK: It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners pdf: https://arxiv.org/pdf/2009.07118.pdf abs: https://arxiv.org/abs/2009.07118 https://t.co/JNCnYNNciZ

4 replies, 66 likes


Pranay Pathole: @elonmusk @xiang_aw @WholeMarsBlog Also, commercial applications require the model to be few-shot. Check this paper out: the model achieves similar results with a bit more data, and it's quite recent https://arxiv.org/pdf/2009.07118.pdf

0 replies, 13 likes


Alexander Kruel: Better-than-GPT-3 performance with 0.1% of the number of parameters https://arxiv.org/abs/2009.07118 Shrinking GPT-3-scale capabilities from billions to millions of parameters

0 replies, 11 likes


Marek Bardoński: A great paper on how performance similar to GPT-3 can be achieved with far fewer parameters. Among the methods used are gradient-based optimization and exploiting unlabelled data. Many great tips on successful natural language understanding. https://arxiv.org/pdf/2009.07118.pdf

0 replies, 6 likes


MONTREAL.AI: It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners Timo Schick, Hinrich Schütze : https://arxiv.org/abs/2009.07118 #ArtificialIntelligence #DeepLearning #MachineLearning https://t.co/C58bCuojdm

0 replies, 4 likes


Activeloop: Interesting paper that essentially shows you might not need $4M to train a #GPT3-like model @OpenAI @timo_schick

0 replies, 3 likes


akira: https://arxiv.org/abs/2009.07118 Based on PET, which feeds the model masked sentences built from multiple patterns and treats the predicted token as the label, they extend it to predict multiple tokens and surpass GPT-3. GPT-3 is an LM with 175 billion parameters, but this model uses only about 1/1000th of that. https://t.co/IJLiNVrd7O

0 replies, 2 likes
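To make the pattern-plus-verbalizer idea in the tweet above concrete, here is a minimal sketch of PET-style cloze classification over a masked language model. It is an illustration, not the authors' code: the paper uses ALBERT and task-specific patterns, whereas the model, pattern, and verbalizer below are invented examples.

```python
# Minimal sketch of PET-style cloze classification (pattern + verbalizer over a
# masked LM). Model choice, pattern, and verbalizer are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Verbalizer: map each label to a single vocabulary token.
verbalizer = {"positive": "great", "negative": "terrible"}

def classify(text: str) -> str:
    # Pattern: rewrite the input as a cloze question containing one mask token.
    prompt = f"{text} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Score each label by the masked-LM logit of its verbalizer token.
    scores = {label: logits[tokenizer.convert_tokens_to_ids(token)].item()
              for label, token in verbalizer.items()}
    return max(scores, key=scores.get)

print(classify("The movie was a complete waste of time."))  # expected: "negative"
```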


@reiver ⊼ (Charles Iliya Krempeaux): 《It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners》 by Timo Schick, Hinrich Schütze https://arxiv.org/abs/2009.07118 (machine learning)

1 replies, 2 likes


@reiver ⊼ (Charles Iliya Krempeaux): “In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller [than hundreds of billions of parameters].” https://twitter.com/reiver/status/1318614730878013440

0 replies, 2 likes


arXiv CS-CL: It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners http://arxiv.org/abs/2009.07118

0 replies, 1 likes


Miles Brundage: @sleepinyourhat https://arxiv.org/abs/2009.07118

0 replies, 1 likes


Content

Found on Sep 16 2020 at https://arxiv.org/pdf/2009.07118.pdf

PDF content of a computer science paper: It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners