
The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America

Comments

Ben Lee: 1/ With @LC_Labs, #NDNP, and @dsweld, I'm excited to announce the Newspaper Navigator dataset: extracted visual content & headlines from 16+ million historic newspaper pages in Chronicling America!  Paper: https://arxiv.org/abs/2005.01583 Code: https://github.com/LibraryOfCongress/newspaper-navigator https://t.co/CFLrlaEdrc

7 replies, 182 likes


Courtney Linder: The Library of Congress is using machine learning to make images searchable by keyword. It could be a great case study for discussions around algorithmic bias, says @lee_bcg. My latest for @PopMech: https://www.popularmechanics.com/technology/a32436235/library-of-congress-machine-learning-newspaper-images/

0 replies, 43 likes


Trevor Owens 💾🗄🕚: The paper on this is super epic. Seriously cool digital humanities/digital libraries work in this.

2 replies, 31 likes


Dr Mia Ridge: Using machine learning to identify images within digitised historical newspapers - great work by @Lee_BCG for @LC_Labs building on crowdsourced classification. Pre-print here: https://arxiv.org/abs/2005.01583

1 reply, 19 likes


Hacker News: The Newspaper Navigator Dataset: 16M Historic Newspaper Pages https://arxiv.org/abs/2005.01583

1 reply, 15 likes


Alberto Cairo: According to the thread, a UI is in the works. This looks quite impressive:

3 replies, 15 likes


Ryan Cordell: @LC_Labs I'm especially interested in @LC_Labs's #NewspaperNavigator as a model humanities machine learning project—all code & data published with OA licenses & the full pipeline described in an accompanying paper. https://news-navigator.labs.loc.gov/ https://github.com/LibraryOfCongress/newspaper-navigator https://arxiv.org/abs/2005.01583

1 reply, 13 likes


Pablo Aragón: Impressive work that reminds me of the @PageOneX project by @numeroteca

2 replies, 11 likes


James O'Malley: AI has just automated a job I used to have 11 years ago. When I was doing my MA, I worked in a news clippings department, which would analyse scans of local papers for mentions of companies etc. I’d spend 8 hours non-stop dragging boxes around different headlines and stories.

1 reply, 9 likes


Martijn Kleppe: Another great example of applying computer vision (amongst others) on digitised historical newspapers 👏

0 replies, 3 likes


Jer Thorp: This is really nifty. Sounds like there are going to be lots of interesting image sets coming out of it. Nice work, @lee_bcg!

0 replies, 2 likes


Victoria Van Hyning: So cool and useful for all sorts of research. Check it out #twitterstorians

1 reply, 1 like


Content

Found on May 05 2020 at https://arxiv.org/pdf/2005.01583.pdf

PDF content of a computer science paper: The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America