Evaluating NLP Models via Contrast Sets


Zachary Lipton: Before the media blitz & retweet party get out of control, this idea exists, has been published, has a name, and a clearer justification. It is called ***Counterfactually-Augmented Data*** and here's the published paper (spotlight at #ICLR2020).

Matt Gardner: Evaluating NLP Models via Contrast Sets. New work that is a collaboration between 26 people at 10 institutions (!). Trying to tag everyone at the top of the thread, here it goes:

Noah Smith: new work by @nlpmattg of @ai2_allennlp, with a cast of dozens: contrast sets

John Platt: Adding local perturbations to NLP test sets highlights fragility of some newer models.

lazary: @jxmorris12 Looks really interesting! It reminds me of the recent "minimal pair" literature, which aims to make minimal changes to examples that *do* change the meaning, followed by an evaluation: see the work by @dkaushik96 et al. and by @nlpmattg et al.

