Papers of the day

Probing Neural Network Comprehension of Natural Language Arguments

Comments

Jul 22 2019 hardmaru

Contrary to popular belief, training a gigantic model on a humongous dataset of human text will not lead to AGI. 🙈🧠 Probing Neural Network Comprehension of Natural Language Arguments: https://arxiv.org/abs/1907.07355
18 replies, 559 likes


Jul 18 2019 Benjamin Heinzerling

BERT is very good at being right for the wrong reasons. Great analysis of BERT's ability to learn to exploit annotation artifacts better than other models: https://arxiv.org/abs/1907.07355 Performance drops from 77% to random-chance level when these artifacts are removed.
0 replies, 424 likes
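The paper quantifies annotation artifacts with per-cue statistics: a cue's applicability (how often it appears in exactly one answer option), productivity (how often it then points at the correct option), and coverage (applicability over the dataset). A minimal sketch of that idea, using simplified whitespace tokenization and hypothetical toy warrant pairs (not the actual ARCT data):

```python
def cue_stats(examples, cue):
    """Measure how predictive a surface cue (e.g. the token 'not') is.

    `examples` is a list of (warrant0, warrant1, label) triples, where
    label 0/1 marks the correct warrant. Returns (applicability,
    productivity, coverage) in the spirit of the paper's cue analysis.
    """
    applicable = 0   # pairs where the cue occurs in exactly one option
    productive = 0   # applicable pairs where the cue marks the correct option
    for warrant0, warrant1, label in examples:
        in0 = cue in warrant0.split()
        in1 = cue in warrant1.split()
        if in0 != in1:
            applicable += 1
            correct = warrant0 if label == 0 else warrant1
            if cue in correct.split():
                productive += 1
    productivity = productive / applicable if applicable else 0.0
    coverage = applicable / len(examples)
    return applicable, productivity, coverage


# Hypothetical toy data: "not" distinguishes the two warrants in every pair,
# and happens to mark the correct one in two of the three.
toy = [
    ("we should not trust it", "we should trust it", 0),
    ("they will not comply", "they will comply", 0),
    ("prices rise", "prices do not rise", 0),
]
applicability, productivity, coverage = cue_stats(toy, "not")
```

A productivity well above 0.5 with high coverage is exactly the kind of signal a model can exploit without any argument comprehension; balancing the dataset so such cues become uninformative is what drops accuracy back to chance.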


Jul 21 2019 Melanie Mitchell

I'm fascinated by transformer architectures in NLP & curious about what they actually learn. I just read https://arxiv.org/pdf/1907.07355.pdf, which shows, for one dataset on which a transformer is close to "human performance", spurious statistics completely account for the network's success.
7 replies, 181 likes


Jul 23 2019 John Regehr

hell of an abstract https://t.co/ixiQXrLQEm
2 replies, 167 likes


Jul 21 2019 /MachineLearning

BERT's success in some benchmark tests may be simply due to the exploitation of spurious statistical cues in the dataset. Without them it is no better than random. https://www.reddit.com/r/MachineLearning/comments/cfxpxy/berts_success_in_some_benchmarks_tests_may_be/
1 reply, 135 likes


Jul 23 2019 Emily M. Bender

Niven & Kao's upcoming #acl2019nlp paper "Probing Neural Network Comprehension of Natural Language Arguments" asks exactly the right question of unreasonable performance: "what has BERT learned about argument comprehension?" Preprint: https://arxiv.org/abs/1907.07355 /1
1 reply, 73 likes


Aug 16 2019 Leon Derczynski

Clever Hans: the horse we thought could do arithmetic, but really was relying on other signals. Really enjoyed this blog post on NLP, Clever Hans and what to do instead of leaderboards. https://bheinzerling.github.io/post/clever-hans/ #nlproc
1 reply, 60 likes


Jul 22 2019 Emmanuel Ameisen

If you look at the dataset to understand how your model performs, you'll often see that your model is actually struggling. Here, BERT's accuracy on the test set drops from 77% to 50% (random) after researchers identify and correct data leakage. https://arxiv.org/abs/1907.07355
1 reply, 47 likes


Jul 19 2019 Leon Derczynski

Huge if true - this work indicates that BERT exploits dataset artifacts that distort and inflate its apparent performance, and when these are cleaned up, the results are markedly less impressive
7 replies, 42 likes


Jul 23 2019 Jens Lehmann

https://arxiv.org/abs/1907.07355 is an interesting paper, which shows that the way neural networks solve tasks differs substantially from how humans do it. They often defeat benchmarks in ways they were not meant to be defeated, which can lead to an overestimation of their abilities.
0 replies, 30 likes


Jul 22 2019 Skynet Today 🤖

wow, AGI is maybe not around the corner, can you believe it.
0 replies, 25 likes


Jul 23 2019 always @ ( * )

More derp learning - https://arxiv.org/pdf/1907.07355.pdf
1 reply, 22 likes


Jul 23 2019 fastml extra

"BERT's peak performance of 77% on the Argument Reasoning Comprehension Task is entirely accounted for by exploitation of spurious statistical cues in the dataset." On a fixed dataset, "all models achieve random accuracy". https://arxiv.org/abs/1907.07355
1 reply, 18 likes


Jul 21 2019 Dean P

Probing Neural Network Comprehension of Natural Language Arguments by Timothy Niven and Hung-Yu Kao claims that #BERT’s performance on the argument reasoning tasks is due to a problem in the dataset. @GoogleAI https://arxiv.org/abs/1907.07355
1 reply, 14 likes


Jul 22 2019 Daniel Situnayake

I mean maybe that's how half of us reason, too 🤔 https://t.co/2VkTXeed3s
0 replies, 5 likes


Sep 16 2019 Michał Chromiak

This paper claims BERT’s impressive performance might be attributed to “exploitation of spurious statistical cues in the dataset” and that without them, BERT may be no better than random. #DeepLearning #MachineLearning #ML #DL https://arxiv.org/abs/1907.07355
0 replies, 4 likes


Jul 19 2019 Max Little

Simple dataset confounding strikes state of the art deep learning once again; the NLP edition.
1 reply, 4 likes


Jul 20 2019 Hugues de Mazancourt

@mlpowered A good illustration in this paper: researchers found why BERT reaches just three points below the average untrained human baseline. It's simply due to spurious statistical cues in the dataset. Always look at the data. https://arxiv.org/abs/1907.07355
0 replies, 3 likes


Sep 06 2019 cathal horan

Great paper to show the importance of #datasets in #DeepLearning #NLP. It shows that BERT performance in certain tasks is due to "exploiting" #statistical cues, eg negation. Remove negation from data and results are close to random. https://arxiv.org/abs/1907.07355 #MachineLearning
0 replies, 3 likes


Nov 07 2019 Jonathan Peck

@mtutek @GaryMarcus @tdietterich It is debatable whether GPT-2 represents actual progress. On some benchmarks at least, its high performance is an illusion. https://arxiv.org/abs/1907.07355
0 replies, 2 likes


Jul 19 2019 Ramakanth Kavuluru

We didn’t have much luck with BERT in a recent project. It is really troublesome if performance depends on weird annotation artifacts. Need to watch out for things like this in future.
0 replies, 2 likes


Jul 21 2019 Kristen Allen

@alexdaviscmu "We show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. We analyze the nature of these cues and demonstrate that a range of models all exploit them." https://arxiv.org/abs/1907.07355
1 reply, 2 likes


Jul 21 2019 Rohit Pgarg

Probing Neural Network Comprehension of Natural Language Arguments https://arxiv.org/abs/1907.07355 Found via @benbenhh PAPER THREAD 1/
1 reply, 1 like

