Measuring Massive Multitask Language Understanding

Comments

Dan Hendrycks: How multipurpose is #GPT3? We gave it questions about elementary math, history, law, and more. We found that GPT-3 is now better than random chance across many tasks, but for all 57 tasks it still has wide room for improvement. https://arxiv.org/pdf/2009.03300 https://github.com/hendrycks/test https://t.co/jCqFvdPeSv

16 replies, 498 likes


AK: Measuring Massive Multitask Language Understanding pdf: https://arxiv.org/pdf/2009.03300.pdf abs: https://arxiv.org/abs/2009.03300 github: https://github.com/hendrycks/test https://t.co/qNtu54SFQs

1 reply, 48 likes


Natalie Wolchover: The hardest things are math, physics, moral quandaries, and... chemistry. Nothing is harder than chemistry.

5 replies, 32 likes


Naomi Saphra: Confirmed: moral disputes harder than compsci https://arxiv.org/abs/2009.03300 https://t.co/p2Y0ILTZ3b

2 replies, 14 likes


Thom Scott-Phillips: This preprint seems to be a more elaborated test of what @GaryMarcus was testing for in the article above. The results corroborate his main point about GPT-3, I think. https://arxiv.org/abs/2009.03300

1 reply, 11 likes


Sean Welleck: "Measuring Massive Multitask Language Understanding" Nice few-shot evaluation of GPT-3 on various tasks. https://arxiv.org/pdf/2009.03300.pdf by @DanHendrycks et al https://t.co/BucoHsoySh

0 replies, 8 likes


Toby Walsh (Hiring 4 PostDocs + 8 PhDs): For anyone thinking AGI is near: "on every 1 of the 57 tasks, the best models still need substantial improvements before they can reach human-level accuracy.. they still have near random accuracy on some socially important subjects such as morality and law" https://arxiv.org/pdf/2009.03300.pdf

2 replies, 8 likes


Cullen O’Keefe: Interesting paper including discussion on how GPT3 performs on legal questions https://arxiv.org/abs/2009.03300

0 replies, 5 likes


arXiv CS-CL: Measuring Massive Multitask Language Understanding http://arxiv.org/abs/2009.03300

0 replies, 3 likes


Ronen Tamari: "[GPT-3] has descriptive knowledge and knows about the order of operations, it does not know how to apply its knowledge" Interesting massive probing of GPT-3 & other transformers https://arxiv.org/abs/2009.03300 Thanks @ChenShani2 for heads up

0 replies, 1 like


Content

Found on Sep 08 2020 at https://arxiv.org/pdf/2009.03300.pdf

PDF content of a computer science paper: Measuring Massive Multitask Language Understanding