Swabha Swayamdipta: As datasets have grown larger, data exploration has become increasingly challenging. Our new work on Dataset Cartography, at @emnlp2020 with @royschwartz02, @NickLourie, @yizhongwyz, @HannaHajishirzi, @nlpnoah, @YejinChoinka, offers a solution 🗺️
Paper: http://arxiv.org/abs/2009.10795 1/n https://t.co/1hItp5yOx2
2 replies, 256 likes
Noah A Smith: Dataset cartography: a new way to look at your training dataset, derived from model training dynamics with respect to each instance. Forthcoming EMNLP paper by @swabhz @royschwartz02 @NickLourie @yizhongwyz @HannaHajishirzi @nlpnoah @YejinChoinka https://arxiv.org/abs/2009.10795
4 replies, 185 likes
Roy Schwartz: Training dynamics help us visualize our data and divide it into clearly distinct areas: some instances are “easy-to-learn” (for the model), “hard-to-learn” instances contain many annotation errors, and “ambiguous” instances are the highest quality samples for training. 1/2 https://t.co/MQEmVJ6ANL
3 replies, 118 likes
Swabha Swayamdipta: Updated camera-ready version and code now available!
0 replies, 30 likes
Oren Etzioni: Thanks, John! Credit to @swabhz and her co-authors.
0 replies, 23 likes
John Bohannon: Beautiful work from @allen_ai -- @etzioni just keeps it coming.
The quality of data is proving to be far more important than quantity. We will definitely try out this technique at @primer_ai as we industrialize text classification. https://t.co/BTreT6zmxh
0 replies, 22 likes
Chenhao Tan: Only got to catch up with conferencing a little bit tonight, I liked this Data Cartography work by @swabhz and coauthors: https://twitter.com/swabhz/status/1309217889568854016
The issue of data quality is under-explored and deserves much more attention. This work investigates the angle of training signals.
1 reply, 9 likes
Swabha Swayamdipta: What a cool application of Data Maps!!
0 replies, 9 likes
Swabha Swayamdipta: Learn more about our Dataset Cartography📍work at #emnlp2020 in Q&A session 16, happening in 45 mins!
0 replies, 8 likes
Ravi Shekhar: @EmtiyazKhan looks very interesting. The recent "Dataset Cartography" paper by @swabhz et al. is a really simple and intuitive way to show the effect of confidence and variability on the overall training: https://arxiv.org/abs/2009.10795
1 reply, 5 likes
Oznur Tastan: For this we use the method presented in @swabhz et al. https://arxiv.org/abs/2009.10795; we inspect the mean and the variation of the true class probabilities across the epochs to generate a list of possibly misannotated lncRNAs. More is in the preprint. Feedback is appreciated.
0 replies, 4 likes
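The inspection described in the tweet above (the mean and the variation of the true class probabilities across epochs) can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names, the input layout (one row per epoch, one column per instance), the use of the population standard deviation, and the 0.3 cutoff are all assumptions made for the example.

```python
import numpy as np

def training_dynamics(gold_probs):
    """Summarize per-instance training dynamics.

    gold_probs: array of shape (n_epochs, n_instances); entry [e, i] is the
    probability the model assigned to instance i's annotated (gold) label
    after epoch e. This layout is an assumption of the sketch.
    """
    gold_probs = np.asarray(gold_probs, dtype=float)
    confidence = gold_probs.mean(axis=0)   # mean gold-label probability across epochs
    variability = gold_probs.std(axis=0)   # spread of that probability across epochs
    return confidence, variability

def flag_possible_mislabels(confidence, threshold=0.3):
    """Return indices of instances whose mean gold-label probability stays low.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return np.flatnonzero(confidence < threshold)

# Toy data: 3 epochs, 3 instances (made-up numbers).
gold_probs = np.array([
    [0.6, 0.2, 0.3],
    [0.8, 0.1, 0.7],
    [0.9, 0.1, 0.4],
])
confidence, variability = training_dynamics(gold_probs)
suspects = flag_possible_mislabels(confidence)
print(suspects.tolist())
```

In this toy data the second instance keeps a low probability on its annotated label in every epoch, so it is the one a reviewer would check first; the third instance fluctuates more, which shows up as higher variability rather than low confidence.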
HotComputerScience: Most popular computer science paper of the day:
"Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics"
0 replies, 3 likes
arXiv CS-CL: Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics http://arxiv.org/abs/2009.10795
0 replies, 1 like
Found on Sep 24 2020 at https://arxiv.org/pdf/2009.10795.pdf