Web Scraping Primary Sources from the Internet Archive – LLILAS Benson Digital Scholarship

Wednesday, November 1, 12–2pm, PCL Data Lab

Collecting large numbers of digital or digitized primary sources by hand is tedious and time-consuming. If you are interested in automating the process, LLILAS graduate students Maria Victoria Fernandez and Mario Castro-Villarreal will demonstrate how to scrape collections and other data from the Internet Archive using the Python programming language. Participants will also learn about resources and tutorials for applying Python to other humanities and social science research needs.
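
For a rough sense of what scraping an Internet Archive collection with Python can look like, here is a minimal sketch. It is not the presenters' material; it assumes the Internet Archive's public advanced search endpoint and its usual JSON response layout, and the collection name and function names are hypothetical placeholders chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public search endpoint of the Internet Archive.
SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_search_url(collection, rows=50, page=1):
    """Build a search URL that lists items belonging to one collection."""
    params = [
        ("q", f"collection:{collection}"),
        ("fl[]", "identifier"),  # fields to return for each item
        ("fl[]", "title"),
        ("rows", rows),          # results per page
        ("page", page),
        ("output", "json"),
    ]
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_search_response(payload):
    """Extract (identifier, title) pairs from a decoded JSON response.

    Assumes the response nests results under payload["response"]["docs"].
    """
    docs = payload.get("response", {}).get("docs", [])
    return [(doc["identifier"], doc.get("title", "")) for doc in docs]


def list_collection_items(collection, rows=50, page=1):
    """Fetch one page of results for a collection (requires network access)."""
    url = build_search_url(collection, rows, page)
    with urlopen(url, timeout=30) as response:
        return parse_search_response(json.load(response))


# Example call; "example_collection" is a hypothetical collection name:
# for identifier, title in list_collection_items("example_collection", rows=10):
#     print(identifier, title)
```

Each identifier returned this way names one item in the archive, so a loop over the results is a natural starting point for gathering metadata or files. When requesting many pages, it is considerate to pause between requests.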