Dan Hendrycks: How multipurpose is #GPT3? We gave it questions about elementary math, history, law, and more. We found that GPT-3 is now better than random chance across many tasks, but for all 57 tasks it still has wide room for improvement.
16 replies, 498 likes
AK: Measuring Massive Multitask Language Understanding
github: https://github.com/hendrycks/test https://t.co/qNtu54SFQs
1 replies, 48 likes
Natalie Wolchover: The hardest things are math, physics, moral quandaries, and... chemistry. Nothing is harder than chemistry.
5 replies, 32 likes
Naomi Saphra: Confirmed: moral disputes harder than compsci https://arxiv.org/abs/2009.03300 https://t.co/p2Y0ILTZ3b
2 replies, 14 likes
Thom Scott-Phillips: This preprint seems to be a more elaborated test of what @GaryMarcus was testing for in the article above. The results corroborate his main point about GPT-3, I think.
1 replies, 11 likes
Sean Welleck: "Measuring Massive Multitask Language Understanding"
Nice few-shot evaluation of GPT-3 on various tasks.
by @DanHendrycks et al https://t.co/BucoHsoySh
0 replies, 8 likes
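For context on the few-shot evaluation mentioned above: the benchmark prompts the model with a handful of solved multiple-choice questions before the test question. A minimal sketch of that prompt construction (the header wording follows the paper's released evaluation code at github.com/hendrycks/test; the example questions here are hypothetical):

```python
# Sketch of MMLU-style few-shot prompt construction.
# The header string mirrors the format in the hendrycks/test repo;
# the dev/test questions below are made-up illustrations.

def format_example(question, choices, answer=None):
    """Render one question with lettered choices; append the answer if given."""
    letters = ["A", "B", "C", "D"]
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(letters, choices)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, dev_examples, test_question, test_choices):
    """Prepend k solved dev-set examples (k-shot) before the unsolved test question."""
    header = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n\n")
    shots = "\n\n".join(format_example(q, c, a) for q, c, a in dev_examples)
    return header + shots + "\n\n" + format_example(test_question, test_choices)

dev = [("What is 2 + 2?", ["3", "4", "5", "6"], "B")]
prompt = build_prompt("elementary mathematics", dev,
                      "What is 3 x 3?", ["6", "9", "12", "27"])
print(prompt)
```

The model's completion after the final "Answer:" is then compared against the correct letter to score the task.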
Toby Walsh (Hiring 4 PostDocs + 8 PhDs): For anyone thinking AGI is near: "on every 1 of the 57 tasks, the best models still need substantial improvements before they can reach human-level accuracy... they still have near random accuracy on some socially important subjects such as morality and law" https://arxiv.org/pdf/2009.03300.pdf
2 replies, 8 likes
Cullen O’Keefe: Interesting paper including discussion on how GPT3 performs on legal questions
0 replies, 5 likes
arXiv CS-CL: Measuring Massive Multitask Language Understanding http://arxiv.org/abs/2009.03300
0 replies, 3 likes
Ronen Tamari: "[GPT-3] has descriptive knowledge and knows about the order of operations, [but] it does not know how to apply..."
Interesting massive probing of GPT-3 & other transformers
https://arxiv.org/abs/2009.03300 Thanks @ChenShani2 for heads up
0 replies, 1 likes
Found on Sep 08 2020 at https://arxiv.org/pdf/2009.03300.pdf