TARO will be featuring blog posts in the coming months highlighting the work of its member repositories as it relates to finding aids, EAD, arrangement and description, and other relevant topics. The following post comes to us from Natalie Idom and Georgie Gaines at the Southwest Collection/Special Collections Library at Texas Tech. If your repository is interested in sharing a post, please contact the TARO Steering Committee.
In October 2019, the Southwest Collection (SWC) unit of Texas Tech University’s Southwest Collection/Special Collections Library embarked on a monumental project. Over the course of several years, we diligently created and published digital finding aids for over two thousand collections, some of which were accrued in the many decades before the creation of Encoded Archival Description (EAD). In late 2023, we reached a significant milestone with the completion of 1,921 finding aids, all of which are now available on Texas Archival Resources Online (TARO). This comprehensive effort accounts for all “legacy” collections with a discoverable inventory, marking a significant step forward for our archive.
In the roughly four years since the inception of the legacy finding aid project (known locally as “Project Blade Runner” for the “2,049” collections in need of a finding aid), we have experimented with several methods for efficiently encoding inventories in XML. ArchivesSpace was not an option given the IT resources available, so we opted to test various XML-conversion scripts. Previously, we would type an inventory in Word and painstakingly copy and paste each line into XML code in Oxygen XML Author. With a script, we could go from an Excel spreadsheet to an encoded inventory in a matter of seconds, sometimes saving days’ worth of work on a single finding aid. After experiencing limited success using Microsoft Word’s mail merge function, Archival Associate Sarah Stephenson and an external volunteer developed a new script in PowerShell that significantly reduced the number of errors in the resultant code and allowed items and folders within the collection to be linked to their digitized counterparts. In the final months of the project, an updated script written in Python instead of PowerShell increased efficiency further, with a greater ability to identify XML errors produced by the code. Using this system, even large inventories could be encoded in XML with speed and accuracy, leaving us more time and energy to invest in the sections of the finding aid that require more research to complete thoroughly.
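For readers curious what a spreadsheet-to-XML conversion of this kind can look like, here is a minimal sketch. It is not the SWC script, which is not reproduced in this post. It assumes the Excel inventory has been exported to CSV, and the column names (`box`, `folder`, `title`, `date`, `link`), the function names, and the choice of EAD elements (`<dsc>`, `<c01>`, `<did>`, `<container>`, `<unittitle>`, `<unitdate>`, `<dao>`) are all illustrative assumptions. The sample rows are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet layout; the real column names are not given in the post.
SAMPLE_CSV = """box,folder,title,date,link
1,1,Correspondence,1921-1925,
1,2,Annual report,1930,https://example.org/item/2
"""


def cell(row, name):
    """Return a trimmed cell value, treating a missing cell as empty."""
    return (row.get(name) or "").strip()


def rows_to_inventory(rows):
    """Build a <dsc> element with one <c01> component per spreadsheet row."""
    dsc = ET.Element("dsc", {"type": "combined"})
    for row in rows:
        c01 = ET.SubElement(dsc, "c01", {"level": "file"})
        did = ET.SubElement(c01, "did")
        ET.SubElement(did, "container", {"type": "Box"}).text = cell(row, "box")
        ET.SubElement(did, "container", {"type": "Folder"}).text = cell(row, "folder")
        ET.SubElement(did, "unittitle").text = cell(row, "title")
        if cell(row, "date"):
            ET.SubElement(did, "unitdate").text = cell(row, "date")
        if cell(row, "link"):
            # Assumed way of pointing a folder at its digitized counterpart.
            ET.SubElement(did, "dao", {"href": cell(row, "link")})
    return dsc


def inventory_xml(csv_text):
    """Convert CSV text to an XML string and confirm it is well-formed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dsc = rows_to_inventory(rows)
    ET.indent(dsc)  # pretty-printing; requires Python 3.9 or later
    xml_text = ET.tostring(dsc, encoding="unicode")
    ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
    return xml_text


if __name__ == "__main__":
    print(inventory_xml(SAMPLE_CSV))
```

Building the elements with a library rather than pasting strings together means characters such as `&` in a folder title are escaped automatically, which removes one common source of XML errors. The final parse only checks well-formedness; validating against the EAD schema would be a separate step.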
Under the leadership of the SWC Archivist, our team successfully completed the bulk of the project, but certain collections remain undiscoverable online. Moving forward, Archival Associate Natalie Idom is working to author finding aids for the more complex collections, which often have no collection file or inventory. This has at times required reprocessing a collection according to current practices under DACS. For example, earlier SWC archivists sometimes aggregated collections into a collection titled as a series or topic, such as “Annual Reports,” encompassing multiple accessions and different donors. Given the age of these legacy collections, it was also not uncommon to identify conservation and preservation needs. We understand that sometimes we uncover more work than anticipated, but we see this as an opportunity to resolve these issues, ensuring a smoother process for future projects.
Our archive has thousands of collections arranged and described. As a result, this project has also allowed us to discover which collections may be out of location, since we conduct a partial physical inventory of our stacks while writing finding aids for the project’s more complex collections. Project Blade Runner also aided our processing staff in their understanding of the topics we collect, since some of us were newcomers to the archival field and to the SWC. The department’s veterans, however, possessed much of this knowledge, so this project bridged that gap. Now everyone is on the same page regarding the scope of what we house.
Something new came out of this project as well. A second Python script was written that focused on efficiently listing related collections, a section of the finding aid that often takes the most time to create. By creating Excel spreadsheets listing collections grouped around specific topics, we could quickly generate XML code properly formatted for inclusion under the <relatedcollections> element. We included manuscript collections, photograph collections, and oral histories. As an added benefit, we now have data about the topics most prevalent within the repository. Our related collection topics include Women’s History, Latino/Hispanic, Military Veterans (separated by war/conflict), Outer Space, Railroads, Ranching, and local groupings such as the Lubbock Women’s Club and other civic groups. We started this about halfway through the legacy project. In hindsight, the best time to group collections is while encoding, because each finding aid we complete then lowers the chance of missing a related collection. To anyone thinking about creating a similar project for their legacy backlog (no matter the size), we would strongly recommend working in sections and creating groupings for related collections as you or your department encode the backlog.
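The idea of generating a related collections list from a topic spreadsheet can be sketched as follows. Again, this is an illustration under stated assumptions and not the SWC script: it assumes a CSV export with hypothetical columns `topic`, `collection_title`, and `collection_type`, the function names are invented, and the inner `<head>`, `<list>`, and `<item>` structure is a guess. The wrapper element name is a parameter that defaults to the one named above, since local encoding practice may differ. The sample collection titles are made up.

```python
import csv
import io
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical topic spreadsheet; titles here are invented for illustration.
SAMPLE_TOPICS_CSV = """topic,collection_title,collection_type
Railroads,Example Railway Company Records,Manuscript
Ranching,Example Ranch Records,Manuscript
Railroads,Example Depot Photographs,Photograph
"""


def group_by_topic(csv_text):
    """Map each topic to the list of spreadsheet rows filed under it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["topic"].strip()].append(row)
    return groups


def related_collections_xml(groups, topic, exclude_title=None,
                            wrapper="relatedcollections"):
    """Return an XML list of the collections under a topic.

    exclude_title lets a finding aid leave itself out of its own list.
    """
    root = ET.Element(wrapper)
    ET.SubElement(root, "head").text = "Related Collections"
    listing = ET.SubElement(root, "list")
    rows = sorted(groups.get(topic, []), key=lambda r: r["collection_title"])
    for row in rows:
        title = row["collection_title"].strip()
        if title == exclude_title:
            continue
        kind = row["collection_type"].strip()
        ET.SubElement(listing, "item").text = f"{title} ({kind})"
    ET.indent(root)  # pretty-printing; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    groups = group_by_topic(SAMPLE_TOPICS_CSV)
    print(related_collections_xml(groups, "Railroads"))
    # Counting rows per topic gives a rough picture of topic prevalence.
    print({topic: len(rows) for topic, rows in groups.items()})
```

Keeping one spreadsheet as the single source for each topic means a newly encoded collection only has to be added once, and every finding aid regenerated from that topic picks it up.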
The project has culminated in benefits beyond the achievement of uploading more than 1,900 finding aids to TARO and the increased visibility of those collections. Since starting this project, we have developed a better knowledge of the content of our holdings to better assist patrons and the other internal and external archival and bibliographic units. We have developed techniques to assist us when we come across a collection with little to no paperwork. Even with the help of these methods, though, the legacy finding aid project was made possible by the diligence of students and staff over the course of years, who ground out the work one finding aid at a time.