Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros

The Reading the First Books project was completed in December 2017. We are maintaining this website as a record of the process and project. Documentation, including a “People’s Paper” containing detailed reflections, is available on the Publications Page.

You can download or view transcriptions of 50 books in the Primeros Libros collection here:


Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros is a two-year, multi-university effort to develop tools for the automatic transcription of early modern printed books. It is a collaboration between students, faculty, and staff at the University of Texas at Austin and Texas A&M University.

The Reading the First Books project will:

  • Develop tools for the automatic transcription of books printed in multiple languages, using variable orthographies, during the first centuries of the printing press.
  • Make those tools accessible for institutions and individuals by incorporating them into the Early Modern OCR Project (eMOP) at Texas A&M University, an open-source OCR workflow.
  • Produce automatic transcriptions of the Primeros Libros de las Américas collection of books printed before 1601 in the Americas, written in Spanish, Latin, Nahuatl, Huastec, Mixtec, Otomi, Tarascan, and Zapotec.

This website provides information about the project along with news, updates, and reports about our progress.

Reading the First Books is funded by a National Endowment for the Humanities Digital Implementation Grant. Any views, findings, conclusions, or recommendations expressed on this website do not necessarily represent those of the National Endowment for the Humanities.