Standards:Metadata



THO Standards for Cultural Heritage Digitization Projects

Metadata Standards

Introduction

Metadata consists of textual information about digitized resources. Metadata is most closely associated with bibliographic description in support of the search, retrieval, and identification of resources, including both physical and digital objects. Two excellent introductions to metadata are the Getty Research Institute's Introduction to Metadata: Pathways to Digital Information, by Tony Gill, Anne J. Gilliland, and Mary S. Woodley, edited by Murtha Baca, and the NISO publication Understanding Metadata.

Metadata is built from three components. The first is a content standard, which sets out rules or guidelines for cataloging. AACR2 is one such standard, most frequently used in libraries; Cataloging Cultural Objects (CCO) is a new standard for museums; and Describing Archives: A Content Standard (DACS) is a standard for archival description. The second component is a syntax, which establishes the specific metadata elements that are available; Dublin Core, MODS, and MARC are all syntaxes. The third component is the format in which the metadata is encoded, such as XML or plain ASCII text. Format matters because specific software may be required to read a given metadata file.
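To illustrate the difference between syntax and format, the following is a minimal sketch that takes one set of simple Dublin Core elements (the syntax) and writes it out in two formats, XML and plain text. The record values and the wrapper element name `record` are hypothetical; a real project would choose values under a content standard and wrap the elements according to its own schema.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set namespace; "dc" is the customary prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical values; in practice they would be recorded following a
# content standard such as AACR2, CCO, or DACS.
record = {
    "title": "View of Main Street, looking north",
    "creator": "Unknown photographer",
    "date": "1910",
    "type": "Image",
}

def to_dc_xml(fields: dict[str, str]) -> str:
    """Serialize the fields as Dublin Core elements inside a hypothetical <record>."""
    root = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

def to_plain_text(fields: dict[str, str]) -> str:
    """Serialize the same elements as 'element: value' lines of plain text."""
    return "\n".join(f"{name}: {value}" for name, value in fields.items())

print(to_dc_xml(record))
print(to_plain_text(record))
```

Both outputs carry the same Dublin Core elements; only the encoding differs, which is why the software needed to read the file depends on the format rather than on the syntax.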

Because metadata is often specific to the type of original resource being described, no single standard or syntax best describes every type of resource. Crosswalks have been developed to map metadata from one syntax to another; one such crosswalk is described in the Getty Research Institute's Introduction to Metadata. Participants are encouraged to identify the type of metadata best suited to their collections and to describe the objects in their collections as fully as possible using whatever metadata syntax they have selected.
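A crosswalk is essentially a lookup table from the elements of one syntax to the elements of another. The sketch below assumes a small, hypothetical subset of mappings from MARC fields to simple Dublin Core, of the kind found in published crosswalks; it is not a complete or authoritative mapping. It also reports fields that have no target, since crosswalks are often lossy.

```python
# Hypothetical, simplified subset of a MARC-to-Dublin Core crosswalk.
# Keys are (MARC tag, subfield code); values are Dublin Core element names.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
}

def crosswalk(marc_fields: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Map (tag, subfield, value) triples to Dublin Core elements.

    Fields with no entry in the mapping are collected under "_unmapped"
    instead of being silently dropped.
    """
    dc: dict[str, list[str]] = {}
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code), "_unmapped")
        dc.setdefault(element, []).append(value)
    return dc

# Hypothetical input record.
example = [
    ("245", "a", "Annual report of the county agricultural agent"),
    ("100", "a", "Smith, John"),
    ("260", "c", "1925"),
    ("650", "a", "Agriculture"),
    ("650", "a", "Cotton"),
    ("040", "a", "DLC"),
]
print(crosswalk(example))
```

Repeatable fields such as subjects map to lists, and the `_unmapped` bucket makes any information lost in translation visible for review.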

Metadata can be divided into several types: descriptive, used to provide information about the content, subject, or composition of the object, particularly to support resource identification and discovery; structural, used to describe how the parts of a complex object relate to each other; and administrative, such as information recorded during the digital object's lifecycle of creation, acquisition, use, preservation, and (perhaps ultimately) deletion. Some metadata specialists add further types, such as preservation metadata (see the Preservation section of these standards) or technical metadata, which is often created automatically by the device used to create a digital object.
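One way to picture these types is as separate groups of fields attached to a single digital object. The sketch below uses a hypothetical container class and invented values for a multi-page diary; real systems store these types in schema-specific sections rather than in a class like this.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObjectMetadata:
    """Hypothetical container that groups one object's metadata by type."""

    # Descriptive: content, subject, composition; supports discovery.
    descriptive: dict[str, str]
    # Structural: how the parts of a complex object relate (here, page order).
    structural: list[str] = field(default_factory=list)
    # Administrative: lifecycle information such as digitization date and rights.
    administrative: dict[str, str] = field(default_factory=dict)
    # Technical: often captured automatically by the scanner or camera.
    technical: dict[str, str] = field(default_factory=dict)
    # Preservation: e.g. fixity values used to detect file corruption.
    preservation: dict[str, str] = field(default_factory=dict)

diary = DigitalObjectMetadata(
    descriptive={"title": "Diary of a cattle drive", "date": "1871"},
    structural=["page-001.tif", "page-002.tif", "page-003.tif"],
    administrative={"digitized": "2009-03-14", "rights": "Public domain"},
    technical={"format": "image/tiff", "resolution": "600 ppi"},
)

# Structural metadata preserves sequence: the second page sits at index 1.
print(diary.structural.index("page-002.tif"))  # 1
```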

Metadata should be developed regardless of the search or browse mechanism planned to provide access to the digital objects in a collection. However, participants should be aware that certain types of metadata provide more search functionality than others and that the digital asset management system selected to store the metadata (and possibly the digital objects themselves) will also expand or restrict search functionality. For more information on search functionality, participants should read the "Interoperability" section of this document.

Levels of Metadata

THO recognizes three levels of metadata:

Minimal

Participants will provide access to metadata about the digital objects in their collection. The simplest form of metadata consists of plain text, sometimes in the form of "keywords," or terms chosen from an uncontrolled vocabulary to describe the resource. "Tags" and captions may also be considered metadata. This form of metadata may be visible to the user or may be embedded in an HTML or other file.

To be searchable, metadata of this type must be indexed, or "spidered," as is done, for example, by Google(TM) and other search engines. Participants are strongly encouraged to allow this type of indexing, although THO at present has no plans to implement an indexing strategy. For this reason, collections meeting only minimal metadata standards will by default be excluded from the THO search portal.
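The following is a minimal sketch of how a simple indexer might pull minimal metadata (keywords and captions) out of an HTML page, using Python's standard `html.parser`. The sample page, its keywords, and the choice to treat image `alt` text as a caption are all hypothetical; commercial search engines use their own, far more elaborate methods.

```python
from html.parser import HTMLParser

# Hypothetical page with keywords embedded in a <meta> tag and a caption
# supplied as image alt text.
PAGE = """<html><head>
<title>Courthouse photograph</title>
<meta name="keywords" content="courthouse, architecture, 1920s">
</head><body><img src="courthouse.jpg" alt="County courthouse, east facade"></body></html>"""

class KeywordExtractor(HTMLParser):
    """Collect meta keywords and image alt text, as a simple indexer might."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: list[str] = []
        self.captions: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None if no value is given
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "img" and attrs.get("alt"):
            self.captions.append(attrs["alt"])

parser = KeywordExtractor()
parser.feed(PAGE)
print(parser.keywords)  # ['courthouse', 'architecture', '1920s']
print(parser.captions)  # ['County courthouse, east facade']
```

Because these keywords come from an uncontrolled vocabulary, the indexer can only match the words as written; it cannot tell, for example, that "1920s" and "Nineteen-twenties" mean the same thing.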

Basic

Participants will provide descriptive metadata for the digital objects in their collection, using a metadata standard appropriate to their collection type, at a level of granularity sufficient to distinguish individual objects. Often the choice of metadata is also driven by the choice of a digital asset management system, such as a library catalog or database. Examples of appropriate metadata syntaxes include MARC, Dublin Core (including variants such as CDP, Western States, and UNTL, or more generally the DC-Library Application Profile), TEI or EAD headers, and the Content Standard for Digital Geospatial Metadata (CSDGM), but this list is not meant to be exhaustive.
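"Sufficient granularity to distinguish individual objects" can be checked mechanically at a basic level. The sketch below assumes that each record is a dictionary of element names to values and that every record should have a non-empty identifier and title, with no identifier shared between records. The required elements are a hypothetical choice for this example, not a THO requirement.

```python
from collections import Counter

# Hypothetical minimum for this sketch only; THO does not prescribe
# specific elements, and real projects follow their chosen syntax.
REQUIRED = ("identifier", "title")

def check_granularity(records: list[dict[str, str]]) -> list[str]:
    """Report records that are missing key elements or share an identifier."""
    problems: list[str] = []
    for i, rec in enumerate(records):
        for element in REQUIRED:
            if not rec.get(element, "").strip():
                problems.append(f"record {i}: missing {element}")
    # Every object should carry its own identifier.
    counts = Counter(rec["identifier"] for rec in records if rec.get("identifier"))
    problems += [f"duplicate identifier {ident}" for ident, n in counts.items() if n > 1]
    return problems

records = [
    {"identifier": "obj-0001", "title": "Letter, 1862 May 3"},
    {"identifier": "obj-0001", "title": "Letter, 1862 May 9"},
    {"identifier": "obj-0003", "title": ""},
]
print(check_granularity(records))
# ['record 2: missing title', 'duplicate identifier obj-0001']
```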

Enhanced

In addition to descriptive metadata, participants will provide administrative metadata for all of the digital objects in their collection. Structural, technical, and preservation metadata should also be included whenever possible. Certain metadata syntaxes, particularly METS but also to a lesser extent MODS, Qualified Dublin Core, TEI, and EAD, allow the provision of these additional metadata types. Participants should regularly maintain and update their metadata as new guidelines and standards are established.
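METS is one syntax that holds several metadata types in a single document. The sketch below builds a skeleton METS record with Python's `xml.etree.ElementTree`: a descriptive section wrapping a Dublin Core title, an administrative section with empty technical and provenance placeholders, a file inventory, and a structural map that orders the pages. The title, file names, and IDs are hypothetical, and the output is meant to show the layout of the sections, not to be a complete, schema-validated METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)

def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def build_mets(title: str, page_files: list[str]) -> ET.Element:
    root = ET.Element(m("mets"))

    # Descriptive metadata: a simple Dublin Core title wrapped inside METS.
    dmd = ET.SubElement(root, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # Administrative metadata: empty placeholders for technical and
    # digital-provenance sections, which a real record would fill in.
    amd = ET.SubElement(root, m("amdSec"), ID="AMD1")
    ET.SubElement(amd, m("techMD"), ID="TECH1")
    ET.SubElement(amd, m("digiprovMD"), ID="PROV1")

    # File inventory: one <file> per page image, linked to the admin sections.
    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), USE="master")
    for n, path in enumerate(page_files, start=1):
        f = ET.SubElement(group, m("file"), ID=f"FILE{n}", ADMID="TECH1 PROV1")
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path})

    # Structural metadata: the pages in reading order, each pointing to its file.
    struct = ET.SubElement(root, m("structMap"), TYPE="physical")
    book = ET.SubElement(struct, m("div"), TYPE="book", DMDID="DMD1")
    for n in range(1, len(page_files) + 1):
        page = ET.SubElement(book, m("div"), TYPE="page", ORDER=str(n))
        ET.SubElement(page, m("fptr"), FILEID=f"FILE{n}")
    return root

mets = build_mets("Diary of a cattle drive", ["page-001.tif", "page-002.tif"])
print(ET.tostring(mets, encoding="unicode"))
```

Keeping each type in its own section lets descriptive, administrative, and structural metadata be maintained and updated independently, which suits the regular maintenance this level calls for.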
