Language Models are Few-Shot Learners

Comments

𝔊𝔴𝔢𝔯𝔫: GPT-3 is terrifying because it's a tiny model compared to what's possible, trained in the dumbest way possible on a single impoverished modality on tiny data, yet the first version already manifests crazy runtime meta-learning—and the scaling curves 𝘴𝘵𝘪𝘭𝘭 are not bending! 😮

28 replies, 968 likes


Michael Nielsen: Spent an enjoyable few hours digging into GPT-3, trying to better understand how it works, what the limits are, how it may be improved. The paper is here: https://arxiv.org/pdf/2005.14165.pdf

14 replies, 905 likes


hardmaru: GPT-3: Language Models are Few-Shot Learners, by @notTomBrown et al. “We train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.” https://arxiv.org/abs/2005.14165 https://t.co/ng1Dc6aFg3

13 replies, 547 likes


Ben Mann: We just published our paper on GPT-3! https://arxiv.org/abs/2005.14165 Proud to be part of this awesome team!

6 replies, 498 likes


Tom Brown: Language models are few shot learners! We find that larger models can often (but not always) perform NLP tasks given only a natural language prompt and a few examples in the context. No fine-tuning. Paper: http://arxiv.org/abs/2005.14165 Illustrated summary ⬇️ (1/12) https://t.co/eFJ3H7OY8n

15 replies, 468 likes


Natasha Jaques: GPT-3 is conjugating words that don't exist https://arxiv.org/abs/2005.14165 https://t.co/H2R7F1COwc

12 replies, 394 likes


Mitchell Gordon: Papers like these make me feel like we're all telegraph engineers in the pre-Shannon era. Back in the day, if you had trouble getting the signal through, you just bumped up the amplitude. It kind of helped. (1/2)

2 replies, 329 likes


Nando de Freitas: This brilliant ⁦@OpenAI⁩ work and the video of ⁦@karpathy⁩ I shared recently are very exciting AI frontiers. The story repeats itself: Big net, curated data, and common sense are the ingredients. Congrats ⁦@ilyasut⁩ et al. https://arxiv.org/abs/2005.14165

3 replies, 266 likes


Mark Riedl wears pants during video calls: GPT-3 has 175 billion parameters, trained on 300 billion tokens https://arxiv.org/abs/2005.14165 https://t.co/5tJgwwmABN

13 replies, 251 likes


Alfredo Canziani: «GPT-3» is out! 🤓 With 175 billion parameters and 4 bytes per parameter / gradient it takes *only* 1.4 TB on your GPU 🤔 For comparison, a cat 🐱 cortex 🧠 has only 20× more synapses. https://arxiv.org/abs/2005.14165

6 replies, 219 likes
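
A quick check of the arithmetic above, as a minimal Python sketch; the parameter count and the 4-bytes-per-value (fp32) figure are the tweet's assumptions:

    # Memory footprint of GPT-3's weights plus a same-sized gradient buffer, in fp32.
    params = 175e9                  # 175 billion parameters
    bytes_per_value = 4             # fp32
    weights_gb = params * bytes_per_value / 1e9    # ~700 GB of weights
    total_tb = 2 * weights_gb / 1e3                # weights + gradients, ~1.4 TB
    print(f"weights: {weights_gb:.0f} GB, weights + gradients: {total_tb:.1f} TB")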


Rogue P. Bigham: i think i'm going to wait until GPT-4 to upgrade. seems like a mid-cycle release. trillion parameters or bust.

4 replies, 186 likes


Oriol Vinyals: Scale *still* delivers! Congrats @OpenAI on showing very nice zero/few-shot language capabilities of GPT-3. #timelesstweet Paper: https://arxiv.org/abs/2005.14165 Endless Samples: https://github.com/openai/gpt-3 https://t.co/LMfeR5EL4x

1 replies, 167 likes


NLP for Development: "In collecting training data for GPT-3, we made no effort to select either for or against foreign languages" Meaning: At @OpenAI we make no effort with language representation and show our indifference by using pejoratives like "foreign languages" http://arxiv.org/abs/2005.14165

4 replies, 125 likes


Sebastian Gehrmann from far away: The ELMo paper? 15 pages. BERT? 16 pages. GPT-2? 24 pages. T5? 53 pages. GPT-3?? 72 pages! https://arxiv.org/pdf/2005.14165.pdf Showing once and for all that paper sizes keep growing. We really should be concerned about the energy implications, poor trees :(

5 replies, 125 likes


Jonathan Fly 👾: GPT-3: Language Models are Few-Shot Learners The new 175 Billion Parameter GPT-3 excels at a battery of NLP benchmarks (translation, question-answering, etc) with prompting alone -- no fine-tuning. Awaiting more samples! abs: https://arxiv.org/abs/2005.14165 pdf: https://arxiv.org/pdf/2005.14165.pdf https://t.co/QcLhLN95vo

9 replies, 121 likes


Leon Derczynski: If GPT3 took 50 petaflop-days to train https://arxiv.org/pdf/2005.14165.pdf, w. GPUs at 10^8 flops per watt https://arxiv.org/pdf/1911.11313.pdf, so those 1.2E18 flop-hours used 12 GWh to train? E.g. 12 hours of a whole nuclear reactor? At 0.73kg per kWh that's.. 8.8 megatons of CO2?! #sanitycheck #nlproc

9 replies, 113 likes
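
Redoing that back-of-envelope calculation in code, using only the tweet's own assumptions (50 petaflop/s-days of training compute, 10^8 FLOP/s per watt, 0.73 kg CO2 per kWh), a minimal sketch:

    # Energy and CO2 estimate under the tweet's assumptions.
    total_flops = 50e15 * 86400         # 50 petaflop/s-days expressed as total FLOPs
    flops_per_watt = 1e8                # sustained FLOP/s per watt of GPU power
    energy_joules = total_flops / flops_per_watt
    energy_kwh = energy_joules / 3.6e6  # 1 kWh = 3.6e6 J
    co2_tonnes = energy_kwh * 0.73 / 1e3
    print(f"{energy_kwh / 1e6:.0f} GWh, {co2_tonnes:,.0f} tonnes CO2")   # ~12 GWh, ~8,760 tonnes

Under these assumptions the energy matches the tweet's 12 GWh, but the CO2 total comes out to roughly 8.8 kilotonnes rather than megatons; note also that the paper's own compute figure (3.14E+23 FLOPs, quoted in the Graham Neubig tweet below) is roughly 70x larger than the 50 petaflop/s-days assumed here.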


Aran Komatsuzaki: Language Models are Few-Shot Learners - GPT 3 (175B params) causal LM - matches sota fine-tuned performance with few-shot learning on various tasks - can write indistinguishable news articles https://arxiv.org/abs/2005.14165 https://t.co/6nHFaPI8Wr

2 replies, 110 likes


Amanda Askell: I recently worked on human evaluations of GPT-3 with @girishsastry. We found that people’s ability to distinguish model generated news articles from human written news articles approaches chance as model size increases. https://arxiv.org/abs/2005.14165 https://t.co/j4D5LrAlBX

3 replies, 105 likes


Two Minute Papers 📜: OpenAI GPT-3 - Good At Almost Everything! 🤖 ▶️Full video (ours): https://youtu.be/_x9AwxfjxvE 📜Source paper: https://arxiv.org/abs/2005.14165 ❗Source tweet: https://twitter.com/pavtalk/status/1285410751092416513?lang=en #ai #deeplearning #science #twominutepapers #neuralnetworks #machinelearning #gpt2 #gpt3 #gpt-3 #openai https://t.co/HQAfoAZsQR

3 replies, 104 likes


roadrunner01: GPT-3 is here 😮

3 replies, 103 likes


Robert (Munro) Monarch: Hey @OpenAI folk. I spent many hours working with you on GPT-2 to make sure you were #benderrule compliant and talked about language representation appropriately. You seem to have forgotten everything I taught you. Also, the internet is not "a natural distribution of languages"

2 replies, 87 likes


roadrunner01: Language Models are Few-Shot Learners pdf: https://arxiv.org/pdf/2005.14165.pdf abs: https://arxiv.org/abs/2005.14165 github: https://github.com/openai/gpt-3 https://t.co/Cvf2vjBEEJ

2 replies, 81 likes


Graham Neubig: Large model/hardware trivia: Google's new TPU supercomputer (https://cloud.google.com/blog/products/ai-machine-learning/google-breaks-ai-performance-records-in-mlperf-with-worlds-fastest-training-supercomputer) could potentially train GPT-3 (https://arxiv.org/pdf/2005.14165.pdf) in about 7.5 days. Actually a bit longer than I expected. (GPT-3 175B model requires 3.14E+23 flops, Google cluster does 480PFLOPs/s)

1 replies, 80 likes
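
The arithmetic behind that 7.5-day estimate, as a short sketch using the numbers quoted in the tweet (and assuming perfect utilization):

    # Training time = total FLOPs / sustained cluster throughput.
    total_flops = 3.14e23               # GPT-3 175B training compute, as quoted
    cluster_flops = 480e15              # 480 PFLOP/s, as quoted
    print(f"{total_flops / cluster_flops / 86400:.1f} days")   # prints ~7.6 days (the tweet rounds to 7.5)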


Gavin Baker: 1) GPT-3 and the higher semiconductor intensity of AI: This graph of the compute used to train different AI models looks like it is growing exponentially, but it is already scaled *logarithmically* https://arxiv.org/pdf/2005.14165.pdf https://t.co/IYlhd356Ke

2 replies, 76 likes


Gautam Kamath: Timely, given the discussion the other day about author order (@neu_rips). @OpenAI puts out a 31 author paper on GPT-3 (https://arxiv.org/abs/2005.14165). 1. Choosing author order in a group this large is something I want no part of; 2. They include a list of what every person contributed https://t.co/gIuQlJ3ODD

8 replies, 63 likes


Two Minute Papers 📜: OpenAI GPT-3 - Good At Almost Everything! 🤖 ▶️Full video (ours): https://youtu.be/_x9AwxfjxvE 📜Source paper: https://arxiv.org/abs/2005.14165 ❗Source tweet: https://twitter.com/sharifshameem/status/1283322990625607681 #ai #deeplearning #science #twominutepapers #neuralnetworks #machinelearning #gpt2 #gpt3 #gpt-3 #openai https://t.co/bxtjz8rTOA

5 replies, 62 likes


Richard Socher: Great new paper by @OpenAI on a massive Transformer Language Model for Controllable Generation and Multitask Learning https://arxiv.org/pdf/2005.14165.pdf There are 3 equivalent super tasks of NLP: Language models, dialogue systems and question answering. LMs have the most training data->win.

0 replies, 53 likes


Kirk Borne: The amazing @OpenAI GPT-3 #AI text-generation API has been in the news a lot lately: 1)https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/ 2)https://towardsdatascience.com/gpt-3-the-first-artificial-general-intelligence-b8d9b38557a1 3)https://arr.am/2020/07/09/gpt-3-an-ai-thats-eerily-good-at-writing-almost-anything/ 4)https://insiderpaper.com/ai-text-generator-gpt-3/ Research Paper: https://arxiv.org/abs/2005.14165 #BigData #DataScience #MachineLearning #NLG #AGI https://t.co/utT4aGvYUx

1 replies, 45 likes


Tom Brown: I encourage y’all to read (or at least skim) the paper. I’m really proud to have had a part in creating this work over the last 18 months and am glad to get to share it with you. Paper: http://arxiv.org/abs/2005.14165 Samples & Data: http://github.com/openai/gpt-3 (12/12)

2 replies, 44 likes


Sam Bowman: So, GPT-3 is out. From a first glance: The news generation and LAMBADA results are *really* impressive. I'm also a little disappointed not to see any fine-tuning experiments. Labeled data is pretty cheap! How much better would we do if we used it?

6 replies, 43 likes


John Shedletsky: Amazing AI-generated article from the GPT-3 paper (https://arxiv.org/pdf/2005.14165.pdf). #IAmAShapeshifter #YouCouldHaveWornTheTux https://t.co/5pEAf6gJF9

5 replies, 41 likes


Grady Booch: Deep fakes at scale. But with text, not images or videos.

3 replies, 30 likes


shanley: In bad news for the internet, "we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans." https://arxiv.org/abs/2005.14165

2 replies, 29 likes


Sam McCandlish: Proud to be a part of this exciting project led by Dario: https://arxiv.org/abs/2005.14165 We applied our scaling laws https://arxiv.org/abs/2001.08361 to train a highly adaptable model that can do Q&A, translation, and even poetry generation – all without any fine-tuning!

0 replies, 28 likes


Jon, from Videogames: The GPT-3 paper is out. https://arxiv.org/abs/2005.14165 https://t.co/J1ghXzPSqE

1 replies, 28 likes


Sushant Kumar: 4/n Also, GPT-3 is stochastic. So, that would mean every time it's given a word, it can come up with a different tweet. The stochasticity can be varied using the temperature parameter between 0 and 1. More on that in the official paper here: https://arxiv.org/abs/2005.14165

3 replies, 23 likes
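
For readers unfamiliar with the temperature parameter mentioned above, here is a generic sketch of temperature-scaled sampling from a model's next-token distribution (an illustration of the standard technique, not OpenAI's API):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        """Sample a token index from raw logits; lower temperature means less randomness."""
        rng = rng or np.random.default_rng()
        if temperature <= 0:                          # treat 0 as greedy (deterministic) decoding
            return int(np.argmax(logits))
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())         # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

Near temperature 0 the model almost always emits its single most likely continuation; near 1 it samples in proportion to its learned probabilities, which is why repeated runs on the same prompt produce different tweets.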


Xander Steenbrugge: 175 𝘽𝙞𝙡𝙡𝙞𝙤𝙣 parameters.. really? Look, I'm all down for using overparameterized neural nets to solve hard tasks, but this is starting to get very impractical to run.. (maybe that's the point.. 🤔) Someone please tame this beast by pruning it down to a usable size 😅

3 replies, 18 likes


Aza Raskin: OpenAI just released GPT-3, which "can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans". This is not going to just end poorly, but begin and middle poorly. https://arxiv.org/abs/2005.14165 https://t.co/o8adTstnoG

2 replies, 17 likes


Nick Diakopoulos: "Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans." -- It's a 10x larger model than GPT-2

0 replies, 17 likes


Sriram Krishnan: Great thread on GPT-3 strengths and weaknesses ( in case you haven’t seen any GPT-3 related tweets in your timeline already 😏)

0 replies, 17 likes


Apoorv Nandan: Turns out a model trained to predict the next word on billions of sentences learns to respond to instructions. E.g., input: "translate english to french, cheese toast" → output: "fromage au toast". Zero shot. No fine tuning needed. 🤯 #gpt3 https://arxiv.org/pdf/2005.14165.pdf

1 replies, 14 likes
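
As a rough illustration of what such prompts look like in plain text (the exact formatting below is a sketch; the sea otter example follows the translation illustration in the paper):

    # Zero-shot: a task description plus the input, no examples, no gradient updates.
    zero_shot_prompt = "Translate English to French:\ncheese toast =>"

    # Few-shot: the same, with a handful of worked examples prepended in-context.
    few_shot_prompt = (
        "Translate English to French:\n"
        "sea otter => loutre de mer\n"
        "cheese toast =>"
    )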


swapp 🥭: How to detect a fake bot, ask it to define a made up word and see if it is successful in defining it without any hesitation

2 replies, 14 likes


Jack Hessel: Gargantuan effort from OpenAI --- really cool findings re: what scale can bring! https://arxiv.org/pdf/2005.14165.pdf + an unforeseen solution for LM release ethics: It can't be used for bad if no one can load it into memory (GPT-3 weights are 270GB assuming half-precision floats) ;)

1 replies, 14 likes


Sam Finlayson: Has anyone run the numbers yet on the financial and carbon cost of training this big kahuna?

5 replies, 13 likes


ralph waldo cybersyn: using the world's most advanced computer systems and algorithms, top scientists have devised a way to remove borat voice from any english sentence https://twitter.com/desplesda/status/1266187984027545600 https://t.co/uVJyCL9v3V

0 replies, 13 likes


plotly: Unlike examples that involve HTML/JSX, it is unlikely that GPT-3 was pre-trained on many annotated PX code samples. For this reason, it's really interesting to see its few-shot learning capabilities in action, which is a substantial finding from the paper: https://arxiv.org/pdf/2005.14165.pdf

2 replies, 13 likes


AI 212: OpenAI GPT-3 with 175 billion parameters . Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 #GPT3 #Tensorflow #NLU #Pytorch #Python #AI #NLP #OpenAI

0 replies, 12 likes


no love deep learning: #gpt3 also has 30 authors, which implies that each author was responsible to collect ~10 billion tokens and personally train 5.84 billion parameters

1 replies, 10 likes


Brian Roemmele: On October 7th, 2005 I began using protocols in #TheIntelligenceAmplifier that is now captivating the Silicon Valley and VC world. Pre-trained language representations of NLP system called generative pre-training or GPT-3. You will hear a lot about it. https://arxiv.org/pdf/2005.14165.pdf https://t.co/cRJojppTlg

2 replies, 10 likes


brain mentality: this is kinda fucked https://arxiv.org/abs/2005.14165

4 replies, 10 likes


Christian Wolf: 175 Billion parameters, academia can't compete anymore with these insane compute requirements... Also, 50 petaflop/s-days is a strange unit. => 24*60*60*50 = 4320000 petaflop => 4320 exaflop => 4.3 zettaflop #GPT3 https://arxiv.org/abs/2005.14165 https://t.co/phKZLKYTZx

0 replies, 9 likes


Ste𝔣an 🖥️🎧⚡: GPT-3 😱 "Language Models are Few-Shot Learners" https://arxiv.org/abs/2005.14165

1 replies, 9 likes


arXiv CS-CL: Language Models are Few-Shot Learners http://arxiv.org/abs/2005.14165

0 replies, 9 likes


Sushant Kumar: @mnpinto_ @OpenAI @gdb It definitely was trained on crawl data of the web and books. So, it's quite possible that this could have come from the training data. Good find. https://arxiv.org/abs/2005.14165 https://t.co/hMxYXqYXZP

0 replies, 8 likes


Daniel Hoadley ⚫️: GPT-3 is impressive. Extraordinarily impressive in fact. But hyperbolic tweets like this really irritate me. And nowhere in this thread do I see mention of the original paper (https://arxiv.org/pdf/2005.14165.pdf) or even more specifically section 6 of the paper

3 replies, 8 likes


Hacker News: “GPT-3: Language Models Are Few-Shot Learners”, Brown et al. 2020 (OpenAI) https://arxiv.org/abs/2005.14165

0 replies, 8 likes


Edward Dixon: A behemoth of a model from a behemoth of a team. @OpenAI 's GPT-3: "For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model." @CShorten30 @seb_ruder, #NLP. Wow!

0 replies, 8 likes


Jonathan Oppenheim: Great thread on GPT-3 by @michael_nielsen without the hype.

0 replies, 7 likes


Dr. Eli David: AI models are becoming larger at a staggering rate, making computational requirements a huge bottleneck for real-world deployment. Our brain has 1000x more weights than GPT-3, but a power consumption of under 25 watts at peak performance, i.e., a small fraction of a single GPU.

1 replies, 7 likes


Jacob Buckman: Something I really like about this work is its implications for RL in POMDPs. This is evidence that we will get a lot of complex behaviors "for free" by just using a giant model that encodes the history.

2 replies, 7 likes


Rishabh @ Home 🎉: https://arxiv.org/pdf/2005.14165.pdf Damn GPT-3 just came out 😱😱😱

0 replies, 5 likes


Alejandro Piad-Morffis: Stay curious 🖖: - 📃 <https://en.wikipedia.org/wiki/Language_model> - 🗞️ <https://arxiv.org/abs/2005.14165> - 💻 <https://github.com/huggingface/transformers> - 🎥 <https://youtu.be/89A4jGvaaKk> - 🎥 <https://youtu.be/_x9AwxfjxvE>

0 replies, 5 likes


Prof. Anima Anandkumar: @CliffRayman @OpenAI @Microsoft @Twitter The GPT-3 paper itself admits to #AI #bias but does not recommend any mitigation strategies https://arxiv.org/abs/2005.14165

2 replies, 5 likes


Natesh Ganesh: Given these numbers, all this talk of AI & ML democratization sounds sillier with every new bigger model.

0 replies, 5 likes


Daniel Roy: Quite an extensive Broader Impact statement there. Haven't read it closely, but curious to hear what people think.

1 replies, 5 likes


Marco De Nadai: Deep models vs CO2

0 replies, 5 likes


Richard Minerich: It might be hard to overstate what a big deal this GPT-3 result is; few-shot learning changes everything. Being a "data company" is suddenly much less of a moat in many cases. This might be the beginning of a huge explosion in NLP. https://arxiv.org/abs/2005.14165

3 replies, 5 likes


Mark Sanderson: Section 6 of this GPT-3 paper discusses potential language model misuse, how gender, race, and religion are represented in the model, as well as the energy used to train it. Thanks to @hannahbast for the pointer to this welcome addition. https://arxiv.org/abs/2005.14165

0 replies, 4 likes


Pujaa Rajan | Black Lives Matter: 🤯 Technical Takeaways Zero-shot performance improves steadily with model size. Few-shot performance increases more rapidly. Larger models are better at in-context learning. Graph from paper: https://arxiv.org/pdf/2005.14165.pdf (9/13) https://t.co/9JBOjrPSys

1 replies, 4 likes


Convaise: With #GPT3, a few-shot learner and one of the largest language models ever trained, @OpenAI sets new standards in multiple #NLP tasks. We're excited to see how such extensive models can be used efficiently in production! http://arxiv.org/abs/2005.14165

0 replies, 4 likes


Katelyn Gadd: Machine Learning is truly a nightmare (from a GPT-3 paper, https://arxiv.org/abs/2005.14165) https://t.co/DsnssL2V5j

0 replies, 4 likes


Data & Cyber Trends: The amazing @OpenAI GPT-3 #AI text-generation API has been in the news a lot lately: 1)https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/ 2)https://towardsdatascience.com/gpt-3-the-first-artificial-general-intelligence-b8d9b38557a1 3)https://arr.am/2020/07/09/gpt-3-an-ai-thats-eerily-good-at-writing-almost-anything/ 4)https://insiderpaper.com/ai-text-generator-gpt-3/ Research Paper: https://arxiv.org/abs/2005.14165 #BigData #DataScience #MachineLearning #NLG #AGI

0 replies, 4 likes


DelocalizedDanny: #MachineLearning sanitycheck...time to improve AI such that we can do the same with less data? #smartAI

0 replies, 3 likes


Hani 🧢: GPT-3 is a new gigantic language model from @openai and it will blow your mind. Just a few examples written in plain English is enough for the model to learn a new task, without any special training for it first! (Model input grey, model output black) https://arxiv.org/pdf/2005.14165.pdf https://t.co/ASK8zSJadp

0 replies, 3 likes


미키베어: GPT-3: Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 "... we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model..." https://t.co/7kzGCo84LT

1 replies, 3 likes


Jaime Sevilla: https://t.co/WkgaYTlovF

0 replies, 3 likes


Bill Grosso: The GPT-3 paper is astonishingly readable. https://arxiv.org/abs/2005.14165

0 replies, 3 likes


Peter Burns: @genuine_doubt Oh, the paper answers the first: > [for GPT-3 175B] generating 100 pages of content from a trained model can cost on the order of 0.4 kW-hr https://arxiv.org/pdf/2005.14165.pdf So ~2,000 pages of output per dollar. Divide by 10 for capital, profit margin, etc, and ~200 pages per dollar

0 replies, 3 likes
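
Making the implicit assumption in that estimate explicit (the electricity price below is an assumption for illustration; only the 0.4 kWh per 100 pages comes from the paper):

    # Inference cost estimate from the paper's ~0.4 kWh per 100 generated pages.
    kwh_per_100_pages = 0.4
    price_per_kwh = 0.125                                    # assumed electricity price, USD/kWh
    pages_per_dollar = 100 / (kwh_per_100_pages * price_per_kwh)
    print(f"~{pages_per_dollar:.0f} pages per dollar of electricity")        # ~2000
    print(f"~{pages_per_dollar / 10:.0f} pages per dollar with 10x overhead")  # ~200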


Merzmensch Kosmopol: @MadBMan @OpenAI They don't tell the exact sources, but it's a huge amount of data from the Internet, 2016-2019 (paper: https://arxiv.org/pdf/2005.14165.pdf). In the end there are 570GB of text. Testing GPT-3 for knowledge, we can see it contains almost everything. It can even write letters in 18th-century Russian. https://t.co/cUoFZ9mNAK

0 replies, 3 likes


Alfredo Canziani: Full summary from first author @nottombrown follows. https://twitter.com/nottombrown/status/1266188687219384320

1 replies, 2 likes


Moiz Saifee: #DeepLearning models keep on getting bigger and better but 175B parameters is crazy even by Deep Learning's standards #NLP #DataScience

0 replies, 2 likes


Rodrigo Agerri: Every language is foreign to English, and the Internet is a messy "natural distribution of languages". Wow

1 replies, 2 likes


QC: in these troubled times please enjoy some screenshots of GPT-3 poetry; it was asked to write a poem called Shadows on the Way in the style of Wallace Stevens https://t.co/xcJmupUGkO

1 replies, 2 likes


Adi Fuchs: Bitcoin 2018: Our computation costs more than Austria’s electricity bill! NLP 2020: hold my beer. #gpt3 https://arxiv.org/pdf/2005.14165.pdf

0 replies, 2 likes


Daniel Hoadley ⚫️: @ines_curt @alexgsmith @mengwong @StewieKee @DohertyLawTeach @lawheroez @jbrowder1 @scarlettyard @sally_iaccm @tcummins @Akoneira If you’re interested in this, I’d really recommend taking a look at the GPT-3 paper. https://arxiv.org/pdf/2005.14165.pdf

0 replies, 2 likes


StructuredStories: Open AI just published a 72-page paper on GPT-3 - a 175 billion parameter language model. "for news articles that are around 500 words long, GPT-3 continues to produce articles that humans find difficult to distinguish from human written news articles" https://arxiv.org/abs/2005.14165

1 replies, 2 likes


Derek Chen: Want to go even bigger than the 175 billion parameters of GPT-3 https://arxiv.org/abs/2005.14165? Then you might be interested in the 600+ bil of GShard for NMT: https://arxiv.org/abs/2006.16668 Now it's a race to one trillion!

0 replies, 1 likes


Huaiyu Khaw: The GPT-3 paper just landed on ArXiv: https://arxiv.org/abs/2005.14165 🤯

0 replies, 1 likes


Convaise: With #GPT3, a few-shot learner and one of the largest language models ever trained, @OpenAI sets new standards in multiple #NLP tasks, while falling short on others. We're excited to see how such large models can be used efficiently in practice! https://arxiv.org/abs/2005.14165

0 replies, 1 likes


J. Harry Caufield: Finally going to try reading that GPT-3 paper http://arxiv.org/abs/2005.14165

1 replies, 1 likes


Sushant Kumar: @Travpreneur The large chunk of the training data was web corpus. https://twitter.com/sushant_kumar/status/1283806510586331136?s=21

0 replies, 1 likes


Balazs Tarjan: One of the most exciting results (and maybe the most terrifying) from the new GPT-3 paper (https://arxiv.org/abs/2005.14165) is that people's ability to identify whether news articles are model-generated decreases to the level of random guessing for the largest model (175B parameters!) https://t.co/EGOpeknaR8

0 replies, 1 likes


Sam Charrington: Language models getting better at writing academic papers

0 replies, 1 likes


Jorge Bravo: Truly impressed by this recent AI breakthrough: a 175-billion-parameter NLP model developed by @OpenAI. Huge potential also in the scientific domain! https://arxiv.org/pdf/2005.14165.pdf https://t.co/7x8HywcydG

0 replies, 1 likes


Atis Elsts: GPT-2 had a good run. Now GPT-3 is released. I look forward to being entertained, amazed, and baffled by even higher quality auto-generated writing! https://arxiv.org/abs/2005.14165

0 replies, 1 likes


David Doswell: @wesyang A natural language processing (NLP) neural network for generating text. It is not “intelligent,” but it can simulate intelligent responses—which is often indistinguishable in practice. Technical paper on the motivations and ideas https://arxiv.org/pdf/2005.14165.pdf

1 replies, 1 likes


rohan paul: GPT-3's model is made up of 175 billion parameters For comparison, GPT-2 was 1.5 billion and the pre-GPT-3 largest Transformer-based language model released by Microsoft (Turing NLG) one month earlier was 17 billion parameters https://arxiv.org/pdf/2005.14165.pdf #GPT3 #MachineLearning https://t.co/OZnOYAkv2R

0 replies, 1 likes


Thomas Miconi: 1- Few-shot learning with zero gradient update is really cool. 2- 175 billions. With a b.

1 replies, 1 likes


Timothy O'Hear: Deep learning models' inability to learn tasks without a large quantity of very specific data is a bit of a myth. But this takes it to a new level. On the graph below: 10^0 means "1" and 10^1 means "10" 😮

0 replies, 1 likes


Dawn Anderson: @bill_slawski @YuriyYarovoy @MordyOberstein And this is probably amongst the only things to read on that: https://arxiv.org/abs/2005.14165

0 replies, 1 likes


Rafael Cosman: For people that don't know what #GPT3 is, I highly recommend checking it out! https://arxiv.org/abs/2005.14165

0 replies, 1 likes


Daisuke Okanohara: GPT-3 is the largest non-sparse language model with 175 billion parameters. Without fine-tuning, GPT-3 can solve many NLP tasks to some extent just by adding a few (or zero) examples as additional input and predict the text following the question. https://arxiv.org/abs/2005.14165

0 replies, 1 likes


Mohamed Omran: In other words: Our 175B-parameter model, which basically memorises the English-speaking Internet, does well on natural language tasks with next to no extra training. Two things I find remarkable here:

1 replies, 0 likes


Vikas V Patil: @OpenAI research on GPT-3 language model is gigantic! It's a great step up in the field of #NLP. It has 175 billion parameters 🤓 - https://arxiv.org/abs/2005.14165

1 replies, 0 likes


Fabon Dzogang: One in two people will be convinced that #gpt-3 was actually human when reading its fake stories. This is the result of training 175 billion parameters on 300 billion token occurrences. #AI #MachineLearning https://arxiv.org/pdf/2005.14165.pdf https://t.co/kL3FmUQjr3

1 replies, 0 likes


Paul O: The gender bias of GPT-3 was analyzed. "He was very" / "She was very" was given as a seed & GPT-3 filled in the rest. The GPT-3 AI was trained on text scraped from the public internet. The result is a disturbing snapshot of the internet. https://arxiv.org/pdf/2005.14165.pdf https://t.co/pC5dY1iD1h

1 replies, 0 likes


Content

Found on May 31 2020 at https://arxiv.org/pdf/2005.14165.pdf

PDF content of a computer science paper: Language Models are Few-Shot Learners