The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America


Ben Lee: 1/ With @LC_Labs, #NDNP, and @dsweld, I'm excited to announce the Newspaper Navigator dataset: extracted visual content & headlines from 16+ million historic newspaper pages in Chronicling America!  Paper: Code:

8 replies, 188 likes

Courtney Linder: The Library of Congress is using machine learning to make images searchable by keyword. It could be a great case study for discussions around algorithmic bias, says @lee_bcg. My latest for @PopMech:

0 replies, 43 likes

Trevor Owens 💾🗄🕚: The paper on this is super epic. Seriously cool digital humanities/digital libraries work in this.

2 replies, 31 likes

Dr Mia Ridge: Using machine learning to identify images within digitised historical newspapers - great work by @Lee_BCG for @LC_Labs building on crowdsourced classification. Pre-print here:

1 reply, 19 likes

Hacker News: The Newspaper Navigator Dataset: 16M Historic Newspaper Pages

1 reply, 15 likes

Alberto Cairo: According to the thread, a UI is in the works. This looks quite impressive:

3 replies, 15 likes

Ryan Cordell: @LC_Labs I'm especially interested in @LC_Labs's #NewspaperNavigator as a model humanities machine learning project—all code & data published with OA licenses & the full pipeline described in an accompanying paper.

1 reply, 13 likes

Pablo Aragón: Impressive work that reminds me of the @PageOneX project by @numeroteca

2 replies, 11 likes

James O'Malley: AI has just automated a job I used to have 11 years ago. When I was doing my MA, I worked in a news clippings department, which would analyse scans of local papers for mentions of companies etc. I’d spend 8 hours non-stop dragging boxes around different headlines and stories.

1 reply, 9 likes

Trevor Owens 💾🗄🕚: Had fun catching up with @lee_bcg this afternoon. If you haven’t checked out his newspaper navigator work you are missing out! Illustrates a lot of the potential for collaborations I tried to get at in my 2018 @JCDLConf keynote

1 reply, 8 likes

Ben Lee: 8/ If you’re interested in learning more about #NewspaperNavigator, #ChronAm, or @LC_Labs, here are some additional resources: NN press release: NN dataset paper: Chronicling America: @LC_Labs

1 reply, 7 likes

Martijn Kleppe: Another great example of applying computer vision (amongst others) on digitised historical newspapers 👏

0 replies, 3 likes

Jer Thorp: This is really nifty. Sounds like there are going to be lots of interesting image sets coming out of it. Nice work, @lee_bcg!

0 replies, 2 likes

Victoria Van Hyning: So cool and useful for all sorts of research. Check it out #twitterstorians

1 reply, 1 like

Unpublishing The News: Nothing beats discovering a new news archive! Check out this thread on Newspaper Navigator developed by @lee_bcg that uses AI to give you access to headlines and visual content in #chroniclingamerica. Think of the opportunities to research ads, comics, etc. Kudos!

1 reply, 0 likes


PDF content of a computer science paper: The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America