Standards:Metadata
THO Standards for Cultural Heritage Digitization Projects
Metadata Standards
Introduction
Metadata consists of textual information about digitized resources. Metadata is most closely associated with bibliographic description in support of the search, retrieval, and identification of resources, including both physical and digital objects. Two excellent introductions to metadata are the Getty Research Institute's Introduction to Metadata: Pathways to Digital Information, by Tony Gill, Anne J. Gilliland, and Mary S. Woodley, edited by Murtha Baca, and the NISO publication Understanding Metadata.
Metadata is composed of three elements. The first is a content standard, which sets out rules or guidelines for cataloging. AACR2 is one such standard, most frequently used in libraries; Cataloging Cultural Objects (CCO) is a new standard for museums; and Describing Archives: A Content Standard (DACS) is a standard for archival description. The second element is a syntax, which sets out the specific metadata elements that are available; Dublin Core, MODS, and MARC are all syntaxes. The third element is a format, such as XML or ASCII, in which the metadata is stored. Format is important because specific software may be required to read a given metadata file.
Because metadata is often specific to the type of original resource being described, there is no one standard or syntax that best describes every type of resource. Crosswalks have been developed to map metadata from one syntax to another; one such crosswalk is described in the Getty Research Institute's Introduction to Metadata. Participants are encouraged to identify the type of metadata best suited to their collections and to describe the objects in their collections as fully as possible using whatever metadata syntax they have selected.
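As a rough illustration of what a crosswalk does, the following Python sketch maps a handful of unqualified Dublin Core elements to MARC 21 field tags and subfield codes. The mapping table is an illustrative subset loosely modeled on the Library of Congress Dublin Core to MARC crosswalk, not the crosswalk described in Introduction to Metadata, and the tag assignments should be checked against a published crosswalk before use. The names DC_TO_MARC and crosswalk are hypothetical.

```python
# Illustrative subset of a Dublin Core -> MARC 21 crosswalk.
# Assumption: these (tag, subfield) pairs are for demonstration only and
# should be verified against a published crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "rights": ("540", "a"),
}


def crosswalk(dc_record):
    """Map a Dublin Core record to MARC-style (tag, subfield, value) triples.

    dc_record is a dict of element name -> list of values.
    Returns (mapped, unmapped), where unmapped lists the element names
    that have no entry in the mapping table.
    """
    mapped = []
    unmapped = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            # Report elements the table cannot translate instead of
            # dropping them silently.
            unmapped.append(element)
            continue
        tag, code = target
        for value in values:
            mapped.append((tag, code, value))
    return mapped, unmapped
```

Returning the unmapped elements separately makes the information lost in a crosswalk visible, which matters because no two syntaxes cover exactly the same elements.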
Metadata can be divided into various types: descriptive, used to provide information about the content, subject, or composition of the object, particularly in order to provide resource identification and discovery; structural, used to describe how parts of a complex object relate to each other; and administrative, such as that recorded as part of the digital object's lifecycle of creation, acquisition, use, preservation, and (perhaps ultimately) deletion. Some metadata specialists add further types, such as preservation metadata (see the section in these standards on Preservation) or technical metadata, which is often created automatically by the device used to create a digital object.
Metadata should be developed regardless of the search or browse mechanism planned to provide access to the digital objects in a collection. However, participants should be aware that certain types of metadata provide more search functionality than others and that the digital asset management system selected to store the metadata (and possibly the digital objects themselves) will also expand or restrict search functionality. For more information on search functionality, participants should read the "Interoperability" section of this document.
Levels of Metadata
THO recognizes three levels of metadata:
Minimal
Participants will provide access to metadata about digital objects in their collection. The simplest form of metadata consists of simple text, sometimes in the form of "keywords," or terms chosen from an uncontrolled vocabulary to describe the resource. "Tags" and captions may also be considered metadata. This form of metadata may be visible to the user or may be embedded in an HTML or other file.
To be searchable, metadata of this type must be indexed, or "spidered," as is done, for example, by Google(TM) and other search engines. Participants are strongly encouraged to allow this type of indexing, although THO at present has no plans to implement an indexing strategy of its own. For this reason, collections meeting only minimal metadata standards will by default be excluded from the THO search portal.
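The embedded form of minimal metadata described above can be sketched in Python as follows. The example builds the HTML title and meta elements that a search engine spider could read from a page. The function name meta_tags and its parameters are hypothetical, and the use of the "keywords" and "description" meta names is an assumption about how a participant might embed uncontrolled terms, not a THO requirement.

```python
from html import escape


def meta_tags(title, keywords, description=""):
    """Build HTML head elements carrying minimal, keyword-style metadata.

    Values are escaped so that quotes or angle brackets in a caption
    cannot break the surrounding markup.
    """
    lines = [f"<title>{escape(title)}</title>"]
    if keywords:
        joined = ", ".join(keywords)
        lines.append(f'<meta name="keywords" content="{escape(joined)}">')
    if description:
        lines.append(f'<meta name="description" content="{escape(description)}">')
    return "\n".join(lines)


if __name__ == "__main__":
    print(meta_tags("Alamo photograph", ["Alamo", "San Antonio"]))
```

Because the terms are uncontrolled, nothing here guarantees that two collections describe the same subject with the same keyword, which is one reason this level offers limited search functionality.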
Basic
Participants will provide descriptive metadata for the digital objects in their collection at a sufficient level of granularity to distinguish individual objects, using a metadata standard appropriate to their collection type. Often, the choice of metadata may also be driven by the choice of a digital asset management system such as a library catalog or database. Some examples of appropriate metadata syntaxes include MARC, Dublin Core (including variants such as CDP, Western States, and UNTL, or more generally the DC-Library Application Profile), TEI or EAD headers, and the Content Standard for Digital Geospatial Metadata (CSDGM), but this list is not meant to be exclusionary.
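As one possible illustration of basic descriptive metadata, the sketch below serializes a simple Dublin Core record as XML using Python's standard library. It assumes the unqualified Dublin Core element namespace and the oai_dc container element commonly used for exchanging such records; the function name dublin_core_xml and the sample identifier are hypothetical, and a real project would follow the profile of whichever variant it has selected.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for unqualified Dublin Core and the oai_dc container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Register prefixes so the output uses dc: and oai_dc: rather than ns0:, ns1:.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def dublin_core_xml(record):
    """Serialize a dict of element name -> list of values as Dublin Core XML."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        for value in values:
            # Repeatable elements become repeated child elements.
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_xml({
        "title": ["Map of Texas"],
        "subject": ["Maps", "Texas"],
        "identifier": ["example-0001"],  # hypothetical identifier
    }))
```

The identifier element is what lets a record distinguish one object from another, which is the level of granularity the basic level asks for.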
Enhanced
In addition to descriptive metadata, participants will provide administrative metadata for all of the digital objects in their collection. Structural, technical, and preservation metadata should also be included whenever possible. Certain metadata syntaxes, particularly METS but also to a lesser extent MODS, Qualified Dublin Core, TEI, and EAD, allow the provision of these additional metadata types. Participants should regularly maintain and update their metadata as new guidelines and standards are established.
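The three levels can be summarized in a short Python sketch that classifies a record by the types of metadata it carries. This is a deliberate simplification: the dictionary keys "keywords", "descriptive", and "administrative" are hypothetical, the check does not test whether the descriptive metadata follows an appropriate standard, and treating a collection as only as strong as its weakest record is an assumption drawn from the wording "all of the digital objects" above.

```python
LEVELS = ["none", "minimal", "basic", "enhanced"]


def metadata_level(record):
    """Return the level ('none', 'minimal', 'basic', 'enhanced') of one record."""
    has_keywords = bool(record.get("keywords"))
    has_descriptive = bool(record.get("descriptive"))
    has_administrative = bool(record.get("administrative"))
    if has_descriptive and has_administrative:
        return "enhanced"
    if has_descriptive:
        return "basic"
    if has_keywords:
        return "minimal"
    return "none"


def collection_level(records):
    """Return the lowest level reached by any record in the collection.

    Assumption: a collection meets a level only if every object does.
    """
    if not records:
        return "none"
    return min((metadata_level(r) for r in records), key=LEVELS.index)
```

For example, a collection in which one record has only keywords and the rest have full descriptive and administrative metadata would be reported as minimal under this sketch.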
References
- Baca, Murtha (ed.), Patricia Harpring, Elisa Lanzi, Linda McRae, and Ann Whiteside. 2006. Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images. Chicago: American Library Association.
- Baca, Murtha (ed.), Tony Gill, Anne J. Gilliland, and Mary S. Woodley. 2000. Introduction to Metadata: Pathways to Digital Information. Online Edition, Version 2.1. Available at http://www.getty.edu/research/conducting_research/standards/intrometadata/
- CDP Metadata Working Group. 2005 September. Dublin Core Metadata Best Practices, Version 2.1. Available at http://www.cdpheritage.org/cdp/documents/CDPDCMBP.pdf
- Dublin Core Metadata Initiative, National Information Standards Organization, and American National Standards Institute. 2001. The Dublin Core Metadata Element Set. Available at http://www.niso.org/standards/resources/Z39-85.pdf
- DCMI Usage Board. 2004. DCMI Metadata Terms. Available at http://www.dublincore.org/documents/dcmi-terms/
- DCMI-Libraries Working Group. 2004. DC-Library Application Profile (DC-Lib). Available at http://dublincore.org/documents/library-application-profile/
- Federal Geographic Data Committee. 1998. Content Standard for Digital Geospatial Metadata. Version 2.0. Available at http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/
- Library of Congress. 2005. Encoded Archival Description (EAD): Official EAD 2002 Web Site. Available at http://www.loc.gov/ead/
- Library of Congress. 1999. MARC 21 Format for Bibliographic Data: Including Guidelines for Content Designation. 2 vols. Washington, D.C.: Library of Congress, Cataloging Distribution Service.
- Library of Congress. 2004. METS: Metadata Encoding & Transmission Standard Official Web Site. Available at http://www.loc.gov/standards/mets/
- Library of Congress. 2004. MODS User Guidelines (Ver. 3.0). Available at http://www.loc.gov/standards/mods/v3/mods-userguide.html
- NISO. 2004. Understanding Metadata. Bethesda, MD: NISO Press. Available at http://www.niso.org/standards/resources/UnderstandingMetadata.pdf
- OCLC/RLG Working Group on Preservation Metadata. 2002. Preservation Metadata and the OAIS Information Model: A Metadata Framework to Support the Preservation of Digital Objects. Available at http://www.oclc.org/research/projects/pmwg/pm_framework.pdf
- Society of American Archivists. 2004. Describing Archives: A Content Standard. Chicago: Society of American Archivists.
- Text Encoding Initiative Consortium. 2004. TEI P4: Guidelines for Electronic Text Encoding and Interchange, XML-compatible edition. Available at http://www.tei-c.org/P4X/
- University of North Texas Libraries. 2005 November. The UNTL Metadata Guidelines. Available at http://www.library.unt.edu/digitalprojects/metadata/UNTL-Metadata-Guide.pdf