Reading the First Books

Report from the NEH Office of Digital Humanities


Thirty-three projects, including “Reading the First Books,” were represented at the Office of Digital Humanities Project Directors Meeting, held last week at the new National Endowment for the Humanities headquarters in Washington, D.C.

The annual meeting brings together recipients of ODH funding from around the country to share projects and learn more about the NEH. Attendees reflected the diversity of the office’s funding priorities, which include start-up projects, international collaborations, digital humanities institutes, and implementation grants.

The keynote speaker for this year’s event was Bethany Nowviskie, Director of the Digital Library Federation. Dr. Nowviskie’s talk, “On Capacity and Care,” proposed a turn towards the feminist concept of “care” in the digital humanities as a counterbalance to the currently predominant emphasis on big data. With the term “care,” which evokes pedagogy, human interaction, and a careful attentiveness to detail, Dr. Nowviskie reminded us that this kind of work is also central to digital scholarship, even when handling a massive amount of data.

In the “Reading the First Books” project, we find that the concept of care resonates with our approach to language. Ocular, the OCR tool that we are using, treats language as data; at the same time, to analyze and transcribe language properly we must be closely attuned to the fact that language is both social and cultural, even when it’s being internalized by a machine.

The afternoon featured a lightning round of presentations representing the full scope of ODH funding. One predominant theme was diversifying American history, including projects on slavery, Africana/Black Studies, immigration, and female writers. Innovative projects on the topics of medicine and music were also featured. The “Reading the First Books” project was represented by project Principal Investigator Dr. Sergio Romero and project coordinator Hannah Alpert-Abrams. [Slides]

In addition to “Reading the First Books,” two projects focused primarily on Latin American topics. Jonathan Amith from Gettysburg College presented a project on “Comparative Ethnobiology in Mesoamerica,” which seeks to develop an online database or “portal” that will bring together scholars working on ethnobiology across contexts and regions. And Steven Wernke of Vanderbilt University presented his project “Deep Mapping the Reducción,” which will use spatial representation tools to bring together archaeological, geological, and cartographical evidence of the General Resettlement of the Indians in the colonial Andes. Both projects use digital platforms as a way of bringing together fragmented information to improve opportunities for collaborative Latin American scholarship.

A full list of this year’s grant recipients can be found on the NEH website.




NEH Grant Will Transform Study of Early Books

The University of Texas at Austin is one of six recipients of a Digital Humanities Implementation Grant award from the National Endowment for the Humanities (NEH). The grant of $215,000 will fund “Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros,” a project to extend the capabilities of current open-source optical character recognition (OCR) technology for use in the transcription of sixteenth-century texts. LLILAS Benson Latin American Studies and Collections will administer the grant as part of its new Digital Scholarship program.

The tool developed under the project will be used to produce transcriptions of the digitized books in the Primeros Libros de las Américas collection, which currently includes over 330 copies of books printed in the Americas before 1601. Books in the collection include text in Spanish, Latin, and several indigenous Latin American languages, including Nahuatl, once spoken by the Aztecs and still spoken by some 1.5 million people. UT Libraries and the Benson Latin American Collection are founding members of the Primeros Libros consortium, along with Texas A&M University and the Biblioteca José María Lafragua at the Benemérita Universidad Autónoma de Puebla. The consortium currently has over 20 member libraries from throughout the Americas and Europe, including the John Carter Brown Library, Monterrey Institute of Technology and Higher Education (ITESM), and the Universidad Complutense in Madrid.

Sample output from the Ocular+ prototype

The ability of scholars and students to work with historical texts in digital form has been limited by the challenges of transcribing early modern books: printed centuries ago, they contain variable typefaces, typesetting, and spelling, as well as multilingual text, that conventional OCR software does not recognize. The goal of this project is to develop and implement groundbreaking methods in the automatic transcription of early modern printed books. This will help scholars to shine a light on a period of history that saw a transition away from oral culture, the rise of literacy, and the birth of the scientific method.

The two-year project, which begins Sept. 1, 2015, will be overseen by Sergio Romero, assistant professor at the Teresa Lozano Long Institute of Latin American Studies (LLILAS) and the Department of Spanish and Portuguese, and by Kent Norsworthy, LLILAS Benson digital scholarship coordinator. The project further develops a prototype of Ocular, a new OCR tool developed by Taylor Berg-Kirkpatrick, Greg Durrett, and Dan Klein at UC Berkeley and adapted for Primeros Libros by comparative literature PhD student Hannah Alpert-Abrams and computer scientist Dan Garrette (U. Washington). The tool will be integrated into the Early Modern OCR Project by a team at Texas A&M University, who are partners in the grant. UT Libraries will incorporate the transcriptions produced under the project into the existing Primeros Libros website.

Alpert-Abrams, who is also a LLILAS Benson Digital Scholarship graduate research assistant, stresses the importance of collaboration across disciplines and across universities, as well as the implications for broader use of the new technology: “The NEH grant is exciting because it gives us an opportunity to conduct research and build tools with scholars from multiple disciplines and universities. The ultimate goal is to produce a tool that will be useful for anyone interested in producing digital collections of historical documents, across regions and languages.”

Nahuatl scholar Kelly McDonough, assistant professor in the UT Department of Spanish and Portuguese, sees great promise in this technology for the classroom and beyond. She says that as a result of the successful extension of OCR technology, “scholars and students will be able to rapidly search multiple corpora of multilingual texts—a task that is extraordinarily, often prohibitively, time-consuming without this technology.” In her own work, which includes the study of female indigenous leaders in colonial Mexico, she will be able to search for rarely used terms and “variants of terminology utilized by indigenous scribes over a long period of time and a large geographic area. In short, we will be able to ask questions of massive amounts of data that we simply couldn’t ask before.”
