Papers of the day

Measuring Massive Multitask Language Understanding


Dan Hendrycks: How multipurpose is #GPT3? We gave it questions about elementary math, history, law, and more. We found that GPT-3 is now better than random chance across many tasks, but for all 57 tasks it still has wide room for improvement.

16 replies, 498 likes

AK: Measuring Massive Multitask Language Understanding pdf: abs: github:

1 reply, 48 likes

Natalie Wolchover: The hardest things are math, physics, moral quandaries, and... chemistry. Nothing is harder than chemistry.

5 replies, 32 likes

Naomi Saphra: Confirmed: moral disputes harder than compsci

2 replies, 14 likes

Thom Scott-Phillips: This preprint seems to be a more elaborated test of what @GaryMarcus was testing for in the article above. The results corroborate his main point about GPT-3, I think.

1 reply, 11 likes

Sean Welleck: "Measuring Massive Multitask Language Understanding" Nice few-shot evaluation of GPT-3 on various tasks. by @DanHendrycks et al

0 replies, 8 likes

Toby Walsh (Hiring 4 PostDocs + 8 PhDs): For anyone thinking AGI is near: "on every 1 of the 57 tasks, the best models still need substantial improvements before they can reach human-level accuracy... they still have near random accuracy on some socially important subjects such as morality and law"

2 replies, 8 likes

Cullen O’Keefe: Interesting paper including discussion on how GPT3 performs on legal questions

0 replies, 5 likes

arXiv CS-CL: Measuring Massive Multitask Language Understanding

0 replies, 3 likes

Ronen Tamari: "[GPT-3] has descriptive knowledge and knows about the order of operations, it does not know how to apply its knowledge" Interesting massive probing of GPT-3 & other transformers Thanks @ChenShani2 for heads up

0 replies, 1 like


Found on Sep 08 2020 at

PDF content of a computer science paper: Measuring Massive Multitask Language Understanding