Authority Control at TARO: Common Encoding Issues

By Tim Kindseth, MSIS candidate (May 2016), School of Information, The University of Texas at Austin

Last week I posted the first (summary) section of the report I wrote about the use of EAD <controlaccess> index terms by TARO’s forty-plus contributing repositories. The second section of the report, below, outlines some of the more frequent encoding inconsistencies and problems. These issues complicate the automated aggregation of terms needed for faceted browsing and navigation. —Tim Kindseth


 

No values

In over 400 instances, an element within <controlaccess> is encoded with no value at all. In other cases, the value is placeholder text resembling encoder comments, most likely residue from an EAD template.

  • <persname></persname>
  • <persname>NAME (SPECIFY SOURCE, ADD MORE AS NEEDED)</persname>
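
A check like this can be automated. The following is a minimal Python sketch, not the method used for the report. It assumes the finding aids parse as XML without a namespace, and that placeholder text resembles the template residue quoted above. The function name find_unusable_terms and the PLACEHOLDER_HINTS list are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder text looks like the template residue quoted above.
PLACEHOLDER_HINTS = ("SPECIFY SOURCE", "ADD MORE AS NEEDED")


def find_unusable_terms(xml_text):
    """Return (tag, reason) pairs for empty or placeholder index terms."""
    root = ET.fromstring(xml_text)
    problems = []
    # iter() visits <controlaccess> at every depth, including the root itself.
    for controlaccess in root.iter("controlaccess"):
        for term in controlaccess:
            # Headings and nested wrappers are not index terms themselves.
            if term.tag in ("head", "controlaccess"):
                continue
            value = "".join(term.itertext()).strip()
            if not value:
                problems.append((term.tag, "empty"))
            elif any(hint in value.upper() for hint in PLACEHOLDER_HINTS):
                problems.append((term.tag, "placeholder"))
    return problems


sample = """<controlaccess>
  <persname></persname>
  <persname>NAME (SPECIFY SOURCE, ADD MORE AS NEEDED)</persname>
  <persname>Ferguson, Miriam Amanda, 1875-1961.</persname>
</controlaccess>"""

print(find_unusable_terms(sample))
```

The hint list would need to be extended with whatever placeholder wording each repository's template actually uses.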

 

Syntax (of attributes)

EAD does not require the @encodinganalog and @source attributes to appear in any particular order. Inconsistent ordering, though, makes it extremely difficult to extract data for analysis and normalization when terms are matched as text strings.

  • <persname encodinganalog="600" source="lcnaf">Ferguson, Miriam Amanda, 1875-1961.</persname>
  • <persname source="lcnaf" encodinganalog="600">Ferguson, Miriam Amanda, 1875-1961.</persname>
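
One possible way around the ordering problem, assuming the files can be read with an XML parser rather than matched as text, is to look attributes up by name. The Python sketch below is only an illustration. The helper name term_record is hypothetical.

```python
import xml.etree.ElementTree as ET


def term_record(xml_fragment):
    """Read one index term into a dict, looking attributes up by name."""
    element = ET.fromstring(xml_fragment)
    return {
        "tag": element.tag,
        "source": element.get("source"),
        "encodinganalog": element.get("encodinganalog"),
        "value": (element.text or "").strip(),
    }


first = ('<persname encodinganalog="600" source="lcnaf">'
         'Ferguson, Miriam Amanda, 1875-1961.</persname>')
second = ('<persname source="lcnaf" encodinganalog="600">'
          'Ferguson, Miriam Amanda, 1875-1961.</persname>')

# Both encodings yield the same record, whatever the attribute order.
print(term_record(first) == term_record(second))
```

Because the parser exposes attributes as a name-to-value mapping, the two encodings become indistinguishable once parsed.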

 

Periods

LCSH and LCNAF values, when properly written, end in a period. Whether or not TARO wishes to retain this convention, terms should be constructed either with or without an ending period, not both ways.

  • <persname encodinganalog="600" source="lcnaf">Ferguson, Miriam Amanda, 1875-1961.</persname>
  • <persname encodinganalog="600" source="lcnaf">Ferguson, Miriam Amanda, 1875-1961</persname>
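
Until one convention is chosen, variants like these could at least be grouped for comparison. Here is a rough Python sketch that assumes a single final period is the only difference between two forms. The names comparison_key and group_by_key are invented for this example, and the key is meant for matching only, not as a replacement for the stored heading.

```python
def comparison_key(value):
    """Return the heading without surrounding whitespace or one final period."""
    value = value.strip()
    if value.endswith("."):
        value = value[:-1].rstrip()
    return value


def group_by_key(values):
    """Group raw headings that differ only by a final period."""
    groups = {}
    for value in values:
        groups.setdefault(comparison_key(value), []).append(value)
    return groups


variants = [
    "Ferguson, Miriam Amanda, 1875-1961.",
    "Ferguson, Miriam Amanda, 1875-1961",
]

for key, raw_forms in group_by_key(variants).items():
    print(key, "<-", raw_forms)
```

A heading that legitimately ends in an abbreviation or initial also loses its period in the key, which is one reason to use the key only for grouping.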

 

Dashes & spaces

Value subdivisions are sometimes separated by two dashes with no spaces between the dashes and values, or two dashes with a space between the dashes and values; at other times the subdivisions are delineated by an em dash with (or without) spaces between the dash and values.

  • <subject>Mexican Americans––Civil rights––Texas.</subject>
  • <subject>Mexican Americans –– Civil rights –– Texas.</subject>
  • <subject>Mexican Americans—Civil rights—Texas.</subject>
  • <subject>Mexican Americans — Civil rights — Texas.</subject>
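
The separator variants above could be collapsed with a regular expression. This Python sketch assumes that a run of two or more hyphens, a pair of en dashes, or a single em dash always marks a subdivision, and that "--" without spaces is the target form. That target is an assumption made for illustration, not a TARO decision.

```python
import re

# Two or more hyphens, two en dashes (U+2013), or one em dash (U+2014),
# with any surrounding whitespace. A single hyphen or en dash is left
# alone so that date ranges such as 1875-1961 are not touched.
SEPARATOR = re.compile(r"\s*(?:-{2,}|\u2013{2}|\u2014)\s*")


def normalize_subdivisions(value):
    """Rewrite every subdivision separator as '--' with no spaces."""
    return SEPARATOR.sub("--", value.strip())


variants = [
    "Mexican Americans\u2013\u2013Civil rights\u2013\u2013Texas.",
    "Mexican Americans \u2013\u2013 Civil rights \u2013\u2013 Texas.",
    "Mexican Americans\u2014Civil rights\u2014Texas.",
    "Mexican Americans \u2014 Civil rights \u2014 Texas.",
]

print({normalize_subdivisions(v) for v in variants})
```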

 

Element confusion

With place names in particular, Library of Congress subject headings are often encoded incorrectly as <geogname> terms. Many authorized Library of Congress subject headings are built by appending a time period or subject to a city or country name, which may explain why what is technically a subject (Dallas (Tex.)––History.) so often ends up being encoded as a geographic name. EAD3 (discussed later) allows for the parsing of encoded values and may help eliminate this confusion.

  • <geogname>Houston (Tex.)––History.</geogname>
  • SHOULD BE <subject>Houston (Tex.)––History.</subject>
  • OR <geogname>Houston (Tex.)</geogname>
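
Terms of this kind can be flagged for review. The Python sketch below assumes that any <geogname> value containing a subdivision separator is a candidate, and that the files parse without a namespace. The function name suspect_geognames is hypothetical. Whether a flagged term should become a <subject> or lose its subdivision remains an editorial decision.

```python
import re
import xml.etree.ElementTree as ET

# Two or more hyphens, two en dashes (U+2013), or one em dash (U+2014).
SUBDIVISION = re.compile(r"-{2,}|\u2013{2}|\u2014")


def suspect_geognames(xml_text):
    """Return <geogname> values that contain a subdivision separator."""
    root = ET.fromstring(xml_text)
    flagged = []
    for element in root.iter("geogname"):
        value = "".join(element.itertext()).strip()
        if SUBDIVISION.search(value):
            flagged.append(value)
    return flagged


sample = """<controlaccess>
  <geogname>Houston (Tex.)--History.</geogname>
  <geogname>Houston (Tex.)</geogname>
  <subject>Houston (Tex.)--History.</subject>
</controlaccess>"""

print(suspect_geognames(sample))
```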

 

Contradictory/dissimilar values

A set of birth and death years might appear within one <persname> element while a different set (or none at all) appears in another, even though both occurrences refer to the same individual. This happens both across and within repositories.

  • <persname>Moore, Charles Willard, 1925-1993</persname>
  • <persname>Moore, Charles Willard, 1925-1992</persname>
  • <persname>Lipscomb, Mance</persname>
  • <persname>Lipscomb, Mance, 1895-1976</persname>
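
Conflicts like these can be surfaced by splitting each name from its dates and grouping on the name. This is a minimal Python sketch, assuming dates follow the final comma in the form YYYY-YYYY or YYYY- and that identical name strings refer to the same person. That second assumption will not always hold, so the output is a review list rather than a set of corrections. The names split_name and conflicting_dates are invented for the example.

```python
import re
from collections import defaultdict

# A trailing ", YYYY-YYYY" or ", YYYY-" with an optional final period.
DATES = re.compile(r",\s*(\d{4})\s*[-\u2013]\s*(\d{4})?\.?\s*$")


def split_name(value):
    """Split a heading into (name, dates); dates is None when absent."""
    value = value.strip()
    match = DATES.search(value)
    if match:
        dates = f"{match.group(1)}-{match.group(2) or ''}"
        return value[:match.start()].strip(), dates
    return value.rstrip(".").strip(), None


def conflicting_dates(values):
    """Map each name to its date forms when more than one form occurs."""
    seen = defaultdict(set)
    for value in values:
        name, dates = split_name(value)
        seen[name].add(dates)
    return {
        name: sorted(forms, key=lambda d: d or "")
        for name, forms in seen.items()
        if len(forms) > 1
    }


headings = [
    "Moore, Charles Willard, 1925-1993",
    "Moore, Charles Willard, 1925-1992",
    "Lipscomb, Mance",
    "Lipscomb, Mance, 1895-1976",
]

for name, forms in conflicting_dates(headings).items():
    print(name, forms)
```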

 

Encoding levels

EAD2002 allows <controlaccess> terms to be nested within a main <controlaccess> heading. Repositories sometimes include <controlaccess> elements within this top level, sometimes one level down, and sometimes at both levels. When extracting TARO’s 153,000 index terms, BaseX queries thus had to be performed at two levels. This could cause unnecessary problems for a script that attempts to cull all <controlaccess> instances for display during search and retrieval.

  • <controlaccess><head>Index Terms</head><corpname>Daughters of the Republic of Texas.</corpname></controlaccess>
  • <controlaccess><head>Index Terms</head><controlaccess><head>Organizations:</head><corpname>Daughters of the Republic of Texas.</corpname></controlaccess></controlaccess>
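
A script can sidestep the two-level problem by visiting <controlaccess> at any depth. The Python sketch below is one way to do that and is not the BaseX approach used for the report. It assumes files without a namespace. TERM_ELEMENTS lists the EAD2002 access elements, and the function name index_terms is hypothetical.

```python
import xml.etree.ElementTree as ET

# EAD2002 elements that carry index terms inside <controlaccess>.
TERM_ELEMENTS = {
    "corpname", "famname", "function", "genreform", "geogname",
    "name", "occupation", "persname", "subject", "title",
}


def index_terms(xml_text):
    """Collect (tag, value) pairs from <controlaccess> at every depth."""
    root = ET.fromstring(xml_text)
    terms = []
    # iter() walks the whole tree, so nested wrappers are visited too.
    for controlaccess in root.iter("controlaccess"):
        for child in controlaccess:
            if child.tag in TERM_ELEMENTS:
                terms.append((child.tag, "".join(child.itertext()).strip()))
    return terms


flat = ("<controlaccess><head>Index Terms</head>"
        "<corpname>Daughters of the Republic of Texas.</corpname>"
        "</controlaccess>")
nested = ("<controlaccess><head>Index Terms</head>"
          "<controlaccess><head>Organizations:</head>"
          "<corpname>Daughters of the Republic of Texas.</corpname>"
          "</controlaccess></controlaccess>")

print(index_terms(flat) == index_terms(nested))
```

Each term is collected once, from its immediate <controlaccess> parent, so mixed one-level and two-level encodings in the same file do not produce duplicates.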
