Summer update on schema conversion progress

Fellow TARO participants,

Here’s an update on our schema conversion progress. Overall, the work is going well.

A big thank you to Minnie Rangel at UT Libraries for her work on this! And many thanks to the repositories going through this process with such good cheer. This is an important step forward for TARO.

We had hoped to finish Groups A, B, and C before the end of the calendar year. Groups A and B will meet that timeline.
It now looks like Group C will need to be converted in early 2017.
Which repositories are in which groups and how does this work?

All of our “Group A” repositories (those using software that exports XML, such as Archon, Archivists’ Toolkit, or ArchivesSpace) have had their existing files converted to schema format. Almost all of them have already corrected the very minor errors that surfaced.
These repositories are now refining their workflows for submitting schema-compliant, TARO-friendly files.
ArchivesSpace users are up and running, using the ArchivesSpace guidelines on the TARO Today blog.
We are working on similar how-to information for Archon, Archivists’ Toolkit, and CuadraStar users, which will also be published and announced.
(Note: CuadraStar turns out to export DTD-based XML rather than schema-based XML, so CuadraStar users will follow a slightly different process.)
All of these repositories will also be following the new TARO Standards / Best Practices Guidelines.

Our “Group B” repositories of hand-encoders are starting to be converted now.
These hand-encoders, who use XML editors such as Oxygen and XMetal or text editors such as Notepad++, will be making use of the new TARO Standards / Best Practices Guidelines (which include XML templates, very handy for hand-encoders).

The first to be converted in this group will be:

  • San Jacinto Museum of History – Oxygen users – July 12-14
  • Texas State Library and Archives Commission – Oxygen users – July 26-27
  • Texas Tech University Southwest Collection/Special Collections Library – Oxygen users – August 2-4
  • The University of Texas at Austin. Benson Latin American Collection – Oxygen users – August 16-18

The remaining Group B repositories are still being scheduled and will be contacted soon individually regarding their proposed dates.
Group C repositories will likely be converted in early 2017.

Stay tuned for updates on this conversion work as the summer goes along, as well as our NEH planning grant final reports coming out later this summer.

Schema conversion – ready for Group B

Fellow TARO participants,

It is now time for the “Group B” TARO repositories to be scheduled for conversion to schema compliance.

If any repositories in that group are interested in being scheduled for this work sooner rather than later, please reply to Amanda Focke (afocke@rice.edu) by the end of this week, July 1.

After hearing from repositories, we will post a specific schedule for conversion, and begin working with the first repositories.

Here is the blog post with the year’s schedule and basic info on how this will work.
**Please remember Minnie Rangel at TARO will do the conversion work and each repository will have help and personal attention along the way, ending with the repository having what they need to start submitting schema compliant finding aids.**

Here is the list of Group B repositories (from that blog post):

Group B: Roughly scheduled for Summer / early Fall

Austin History Center, Austin Public Library (NoteTab)
Austin Presbyterian Theological Seminary (Notepad++)
Daughters of the Republic of Texas Library at the Alamo (Oxygen)
Harry Ransom Humanities Research Center, University of Texas at Austin (Oxygen)
Houston Academy of Medicine-Texas Medical Center Library, John P. McGovern Historical Collections and Research Center (Oxygen)
Houston Public Library, Houston Metropolitan Research Center (limbo between AT/AS)
San Jacinto Museum of History (Oxygen)
Southern Methodist University (Oxygen)
Stark Center, University of Texas at Austin (Notepad++)
Stephen F. Austin University (limbo between Archon/AS)
Tarlton Law Library, University of Texas at Austin (Oxygen)
Texas State Library and Archives Commission (Oxygen)
Texas Tech University Southwest Collection/Special Collections Library (Oxygen)
Texas/Dallas History and Archives Division, Dallas Public Library (NoteTab)
The University of Texas at Austin. Alexander Architectural Archive (Oxygen) – CONVERTED FEB 2016 IN TARO PILOT WORK
The University of Texas at Austin. Benson Latin American Collection (Oxygen)
The University of Texas at Austin. Dolph Briscoe Center for American History (Oxygen)
Truman G. Blocker, Jr. History of Medicine Collections, Moody Medical Library, University of Texas Medical Branch (Oxygen)
Tyrrell Historical Library (Oxygen)
University Archives and Special Collections, The University of Texas at Tyler (limbo between Archon/AS)
University of Texas Arlington Library, Special Collections (XMetal)
University of Texas San Antonio (Oxygen)


Authority Control at TARO: Common Encoding Issues

By Tim Kindseth, MSIS candidate (May 2016), School of Information, The University of Texas at Austin

Last week I posted the first (summary) section of the report I wrote about the use of EAD <controlaccess> index terms by TARO’s forty-plus contributing repositories. The second section of the report, below, outlines some of the more frequent encoding inconsistencies and problems, issues that hinder the automated aggregation of terms that faceted browsing and navigation depend on. – Tim Kindseth



No values

In over 400 instances, an index term element within <controlaccess> was left empty. In other cases, the element holds placeholder text resembling encoder instructions, which is likely residue from an EAD template.

  • <persname></persname>
  • <persname>NAME (SPECIFY SOURCE, ADD MORE AS NEEDED)</persname>
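Both patterns can be caught automatically before a file is submitted. Below is a minimal Python sketch, not TARO’s own tooling: the function name `is_empty_or_placeholder` is hypothetical, and the rule for spotting template text (all capitals plus cue words such as SPECIFY or ADD MORE) is an assumption drawn only from the two examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical cue words; real templates may use different wording.
PLACEHOLDER_CUES = ("SPECIFY", "ADD MORE")

def is_empty_or_placeholder(term_xml: str) -> bool:
    """Flag an index term that is empty or still holds template instructions."""
    text = "".join(ET.fromstring(term_xml).itertext()).strip()
    if not text:
        return True
    # Assumed heuristic: template residue is all capitals and addresses the encoder.
    return text.isupper() and any(cue in text for cue in PLACEHOLDER_CUES)

for term in ["<persname></persname>",
             "<persname>NAME (SPECIFY SOURCE, ADD MORE AS NEEDED)</persname>",
             "<persname>Ferguson, Miriam Amanda, 1875-1961.</persname>"]:
    print(is_empty_or_placeholder(term), term)
```

The first two terms are flagged and the authorized heading is not.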


Syntax (of attributes)

EAD does not require the @encodinganalog and @source attributes to appear in any particular order. Inconsistent ordering, though, makes it extremely difficult to extract data for analysis and normalization, especially with tools that match the markup as plain text.

  • <persname encodinganalog="600" source="lcnaf">Ferguson, Miriam Amanda, 1875-1961.</persname>
  • <persname source="lcnaf" encodinganalog="600">Ferguson, Miriam Amanda, 1875-1961.</persname>
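On the extraction side, one way to reduce this problem is to read the files with an XML parser rather than as text, since a parser exposes attributes as an unordered mapping. The minimal Python sketch below (standard library; `canonicalize` needs Python 3.8 or later) compares the two examples above. It is an illustration of the idea, not the method used for this report.

```python
import xml.etree.ElementTree as ET

a = '<persname encodinganalog="600" source="lcnaf">Ferguson, Miriam Amanda, 1875-1961.</persname>'
b = '<persname source="lcnaf" encodinganalog="600">Ferguson, Miriam Amanda, 1875-1961.</persname>'

# Compared as raw strings, the two terms look different.
print(a == b)  # False

# Parsed, the attributes are dictionaries, and dictionary equality ignores order.
ea, eb = ET.fromstring(a), ET.fromstring(b)
print(ea.attrib == eb.attrib)  # True

# Canonical XML (C14N 2.0) sorts attributes, giving a single stable serialization.
ca, cb = ET.canonicalize(a), ET.canonicalize(b)
print(ca == cb)  # True
```

Canonicalizing is useful when terms must be compared or stored as strings; when only the values matter, reading `attrib` directly is simpler.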


Periods

LCSH and LCNAF values, when properly written, end in a period. Whether or not TARO retains this convention, terms should be written consistently: either always with a terminal period or always without one.

  • <persname encodinganalog="600" source="lcnaf">Ferguson, Miriam Amanda, 1875-1961.</persname>
  • <persname encodinganalog="600" source="lcnaf">Ferguson, Miriam Amanda, 1875-1961</persname>
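Until the encoding itself is consistent, an aggregator could match the two forms above with a comparison key that ignores a single terminal period. A minimal Python sketch; `period_key` is a hypothetical helper, and it is used only for matching so that the stored display value (and abbreviations such as Jr.) is left untouched.

```python
def period_key(term: str) -> str:
    """Matching key that ignores one terminal period; display values are not changed."""
    term = term.strip()
    return term[:-1] if term.endswith(".") else term

variants = ["Ferguson, Miriam Amanda, 1875-1961.",
            "Ferguson, Miriam Amanda, 1875-1961"]

# Group the variants under their shared key.
groups = {}
for v in variants:
    groups.setdefault(period_key(v), []).append(v)
print(len(groups))  # 1
```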


Dashes & spaces

Subdivisions within a value are separated inconsistently: sometimes by two dashes with no spaces around them, sometimes by two dashes with a space on either side, and at other times by an em dash, with or without surrounding spaces.

  • <subject>Mexican Americans––Civil rights––Texas.</subject>
  • <subject>Mexican Americans –– Civil rights –– Texas.</subject>
  • <subject>Mexican Americans—Civil rights—Texas.</subject>
  • <subject>Mexican Americans — Civil rights — Texas.</subject>
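Because the four variants above differ only in their delimiters, one regular expression can map them to a single form. A minimal Python sketch, assuming the target form is two hyphens with no spaces (a common display convention for LCSH headings); `normalize_subdivisions` is a hypothetical name. A single hyphen, as in a date range, is not matched.

```python
import re

# Two hyphens or two en dashes (U+2013), or one em dash (U+2014),
# with any surrounding whitespace. A lone hyphen ("1875-1961") is not matched.
SUBDIVISION_DELIM = re.compile(r"\s*(?:[-\u2013]{2}|\u2014)\s*")

def normalize_subdivisions(term: str) -> str:
    """Rewrite every subdivision delimiter as '--' with no spaces (assumed target form)."""
    return SUBDIVISION_DELIM.sub("--", term.strip())

variants = [
    "Mexican Americans\u2013\u2013Civil rights\u2013\u2013Texas.",
    "Mexican Americans \u2013\u2013 Civil rights \u2013\u2013 Texas.",
    "Mexican Americans\u2014Civil rights\u2014Texas.",
    "Mexican Americans \u2014 Civil rights \u2014 Texas.",
]
print({normalize_subdivisions(v) for v in variants})
# {'Mexican Americans--Civil rights--Texas.'}
```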


Element confusion

With place names in particular, Library of Congress subject headings are often incorrectly encoded as <geogname> terms. Many authorized Library of Congress subject headings are built by appending a time period or topical subdivision to a city or country name, which may explain why a heading that is technically a subject (Dallas (Tex.)––History.) so often ends up encoded as a geographic name. EAD3 (discussed later) allows encoded values to be parsed into their component parts and may help eliminate this confusion.

  • <geogname>Houston (Tex.)––History.</geogname>
  • SHOULD BE <subject>Houston (Tex.)––History.</subject>
  • OR <geogname>Houston (Tex.)</geogname>
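A rough automated check for this case is to flag any <geogname> value that contains a subdivision delimiter and to offer the two corrections listed above. A minimal Python sketch; `check_geogname` is hypothetical, and since the heuristic would also flag any legitimate geographic heading that contains a subdivision, its output is a list for a person to review.

```python
import re

# Two hyphens or two en dashes (U+2013), or one em dash (U+2014), with optional spaces.
SUBDIVISION_DELIM = re.compile(r"\s*(?:[-\u2013]{2}|\u2014)\s*")

def check_geogname(value: str):
    """Return None for a bare place name; otherwise suggest the two possible fixes."""
    value = value.strip()
    parts = SUBDIVISION_DELIM.split(value)
    if len(parts) == 1:
        return None
    return {
        "as_subject": ("subject", value),   # keep the whole heading, re-tag it
        "as_place": ("geogname", parts[0]), # keep only the place name
    }

print(check_geogname("Houston (Tex.)\u2013\u2013History."))
print(check_geogname("Houston (Tex.)"))  # None
```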


Contradictory/dissimilar values

A set of birth and death years might appear within one <persname> element while a different set (or none at all) appears in another, even though both occurrences refer to the same individual. This happens both across and within repositories.

  • <persname>Moore, Charles Willard, 1925-1993</persname>
  • <persname>Moore, Charles Willard, 1925-1992</persname>
  • <persname>Lipscomb, Mance</persname>
  • <persname>Lipscomb, Mance, 1895-1976</persname>
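Candidates for review can be found by stripping the dates from each name and grouping what remains. A minimal Python sketch; `split_name` and `find_conflicts` are hypothetical, the date pattern covers only the simple form in these examples, and two different people can share a name, so the result is a list to check rather than a list of errors.

```python
import re
from collections import defaultdict

# Assumed pattern: a trailing ", YYYY-YYYY" or open ", YYYY-"; other date forms are not handled.
DATES = re.compile(r",\s*(\d{4})-(\d{4})?$")

def split_name(value):
    """Split 'Moore, Charles Willard, 1925-1993' into ('Moore, Charles Willard', '1925-1993')."""
    value = value.strip().rstrip(".")
    m = DATES.search(value)
    if m is None:
        return value, None
    return value[:m.start()], m.group(1) + "-" + (m.group(2) or "")

def find_conflicts(values):
    """Return names seen with more than one date form (a missing date counts as a form)."""
    seen = defaultdict(set)
    for v in values:
        name, dates = split_name(v)
        seen[name].add(dates)
    return {name: d for name, d in seen.items() if len(d) > 1}

values = ["Moore, Charles Willard, 1925-1993",
          "Moore, Charles Willard, 1925-1992",
          "Lipscomb, Mance",
          "Lipscomb, Mance, 1895-1976"]
conflicts = find_conflicts(values)
print(sorted(conflicts))  # ['Lipscomb, Mance', 'Moore, Charles Willard']
```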


Encoding levels

EAD2002 allows <controlaccess> elements to be nested within a top-level <controlaccess> element. Repositories sometimes place index terms directly within this top level, sometimes one level down, and sometimes at both levels. When extracting TARO’s 153,000 index terms, BaseX queries therefore had to be performed at both levels. The same inconsistency could cause unnecessary problems for any script that attempts to gather all <controlaccess> terms for display during search and retrieval.

  • <controlaccess><head>Index Terms</head><corpname>Daughters of the Republic of Texas.</corpname></controlaccess>
  • <controlaccess><head>Index Terms</head><controlaccess><head>Organizations:</head><corpname>Daughters of the Republic of Texas.</corpname></controlaccess></controlaccess>
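To handle both structures without a separate query for each level, an extraction step can visit every <controlaccess> element wherever it is nested and take only its direct term children. A minimal Python sketch using the standard library, not the BaseX queries used for this report; the tag list and the name `collect_terms` are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed set of EAD2002 index-term elements; <head> is deliberately excluded.
TERM_TAGS = {"persname", "famname", "corpname", "geogname", "subject",
             "genreform", "occupation", "function", "title"}

def local_name(tag):
    """Drop a namespace prefix such as '{urn:isbn:1-931666-22-9}' from a tag."""
    return tag.rsplit("}", 1)[-1]

def collect_terms(ead_xml):
    """Return (tag, text) for every index term under any <controlaccess>, at any depth."""
    root = ET.fromstring(ead_xml)
    terms = []
    for el in root.iter():  # iter() includes root and all descendants
        if local_name(el.tag) != "controlaccess":
            continue
        for child in el:  # direct children only, so no term is counted twice
            tag = local_name(child.tag)
            if tag in TERM_TAGS:
                terms.append((tag, "".join(child.itertext()).strip()))
    return terms

flat = ("<controlaccess><head>Index Terms</head>"
        "<corpname>Daughters of the Republic of Texas.</corpname></controlaccess>")
nested = ("<controlaccess><head>Index Terms</head><controlaccess><head>Organizations:</head>"
          "<corpname>Daughters of the Republic of Texas.</corpname></controlaccess></controlaccess>")
print(collect_terms(flat) == collect_terms(nested))  # True
```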