Committee on Cataloging: Description and Access
Task Force on Metadata and the Cataloging Rules


Dublin Core Metadata and the Cataloging Rules



Contents:
Metadata and Cataloging [for the Introduction]

Dublin Core Metadata
Background
Dublin Core Metadata’s Support of the Four User Tasks
Dublin Core Elements as Sources of Cataloging Data
Rules in Chapter 9 of AACR2


Metadata and Cataloging

[for the Introduction to the Task Force Report]

In assessing the relationship between metadata and cataloging data, we propose to examine the suitability of each to fulfilling various user tasks. These tasks have been formulated as part of an entity-relationship model by an IFLA Working Group and published as part of the document Functional Requirements for Bibliographic Records. [Note: The final text of this document is not yet available, and the following analysis is based on the draft version circulated for world-wide review in May 1996.]

The IFLA model proposes four basic user tasks: to find, identify, select, and obtain.

Both cataloging data (bibliographic records) and metadata support all of these user tasks to some extent. However, the Dublin Core in particular places emphasis on resource discovery (primarily the find task, although it has elements of the identify and select task) and retrieval (the obtain task). Its objective is explicitly not to create a complete surrogate for the entity, and therefore the ability to identify and select a particular manifestation through the Dublin Core metadata may be significantly limited.

Cataloging data seeks to support all four user tasks (although the obtain task is mostly supported by local information that is not part of international cataloging standards). Moreover, cataloging data is optimized to support each task and seeks to maximize users’ chances of success. Standard cataloging principles, rules, and practices have developed over the past century, and contemporary cataloging databases embody high standards of information quality. Catalogers throughout the world have expended considerable effort, working cooperatively, to promote this quality. In particular,



Dublin Core Metadata

Background

“The Dublin Core is a 15-element metadata element set intended to facilitate discovery of electronic resources. Originally conceived for author-generated description of Web resources, it has also attracted the attention of formal resource description communities such as museums and libraries.” [Dublin Core Metadata home page – http://purl.org/metadata/dublin_core/]

The Dublin Core (DC) is designed for maximum simplicity and flexibility. It is expected that DC metadata will be provided by the creators or distributors of the resources themselves, perhaps by filling in a form in their authoring program. On the other hand, Dublin Core can be qualified and extended to meet the requirements of a variety of users. It is theoretically possible to encode most of a fully standard AACR2 description in DC metadata elements, and it is anticipated that some metadata creators will do exactly that. However, the Dublin Core is directed at a broader and less exacting set of resource producers, and a typical set of DC metadata is likely to be less full and less rigorous.

The principles governing use of the Dublin Core elements are simple and straightforward: all elements are optional; all elements are repeatable; elements may appear in any order; and all elements can be qualified by language (language of the metadata) or scheme (authority or standard for the content).
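As a rough illustration only, these principles can be pictured as a very small data structure. The sketch below is not part of the Dublin Core definition; the class name DCElement, the field names, the helper function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DCElement:
    """One occurrence of a Dublin Core element (hypothetical representation)."""
    name: str                      # e.g. "TITLE", "CREATOR"
    value: str
    lang: Optional[str] = None     # qualifier: language of the metadata
    scheme: Optional[str] = None   # qualifier: authority or standard for the content


# A record is just a list of elements: nothing is required,
# any element may repeat, and the order carries no meaning.
record: List[DCElement] = [
    DCElement("CREATOR", "Smith, Jane"),
    DCElement("CREATOR", "Example Society"),
    DCElement("SUBJECT", "Cataloging", scheme="LCSH"),
    DCElement("TITLE", "An example resource", lang="en"),
]


def values(rec: List[DCElement], name: str) -> List[str]:
    """Return every value recorded for an element; the list may be empty."""
    return [e.value for e in rec if e.name == name]
```

Because every element is optional, code that consumes such a record has to expect an empty list for any element, which is one reason the content of DC metadata is harder to rely on than a cataloging record.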

Our object is to evaluate Dublin Core metadata as a source of cataloging data for records based on the Anglo-American Cataloguing Rules. The Task Force recognizes that metadata in general and the Dublin Core in particular have applications other than AACR2-based cataloging records. Indeed, it is arguable that Dublin Core metadata might be applied most effectively in a system designed specifically to support its use, rather than in library cataloging databases. This is one of the questions that this report will explore.

On the other hand, if library cataloging databases are to contain records for Web resources (as it is certain that they will), Dublin Core metadata contains a wealth of information that can be used in those records. We will evaluate the kinds of information that each Dublin Core element may contain and indicate how that information can be used in preparing an AACR2-based cataloging record.

Finally, we will discuss the rules in Chapter 9 of AACR2 and make recommendations about the need for changes to those rules to support the use of metadata as a source of cataloging data.

Note: Much of the argument here will make use of a distinction between metadata as a source of information and metadata as a source of cataloging data. “Source of information” is a technical term in AACR2, referring to a source from which information is transcribed in various elements of the cataloging record. In order to contrast with this technical terminology, we have used “source of cataloging data” to refer to factual data on which various elements in the cataloging record may be based; it is thus a much broader concept which includes not only exact transcription or quotation, but summarization or reformulation of the factual information by a cataloger.


Dublin Core Metadata’s Support of the Four User Tasks

Dublin Core metadata supports the four user tasks set forth in Functional Requirements for Bibliographic Records to varying degrees, but its lack of established rules and procedures governing the content of data elements makes Dublin Core elements less reliable than cataloging data. The explicit simplicity of the element set and the fact that all elements are optional also undermine the reliability of Dublin Core metadata. The following discussion notes the relevance of Dublin Core elements for each of the user tasks.

Both cataloging data and Dublin Core metadata support the four user tasks, although Dublin Core is only designed to support the finding and obtaining of electronic resources. On the other hand, cataloging data is optimized to support all four tasks in ways that cannot be expected of metadata. In particular, the use of controlled vocabularies and the practice of authority control enhance the ability to find, and the principle of transcription and concepts of versioning enhance the ability to identify and select desired resources. Cataloging practices add considerable value to the raw data provided by the resources described in bibliographic records, and this added value is intended to support the ability to find, identify, select, and obtain desired resources.

Our cataloging databases are high-quality tools for information retrieval, but they are only as good as the standards that apply. The integrity and consistency of these databases depend on applying more or less the same standards to all records in the database. If a significant portion of the database does not reflect the same level of consistency, the database becomes unreliable. It is therefore damaging to the quality of a cataloging database to include in it records based on Dublin Core metadata unless that metadata was formulated according to cataloging principles and practices. This may be possible for metadata coming from particular projects which have been able to adopt appropriate standards. However, it is not possible for a broad range of Internet resources containing metadata provided by authors or data producers. For such resources, it would be preferable to maintain the metadata-based records in a separate database. The metadata will provide a higher level of accessibility than the Internet itself, but its lack of consistency will not damage the even higher level of quality we have invested so much in providing in our cataloging databases.


Dublin Core Elements as Sources of Cataloging Data

The official definitions of the Dublin Core metadata element set are found in “Description of Dublin Core Elements” – [http://purl.org/metadata/dublin_core_elements/]. A mapping of DC elements to the USMARC fields is contained in “Dublin Core/MARC/GILS Crosswalk,” prepared by the Network Development and MARC Standards Office at the Library of Congress [http://lcweb.loc.gov/marc/dccross.html]. The following discussion is based on these sources and discusses the use of DC information in AACR2 cataloging records.

  1. TITLE

    The TITLE element corresponds to the Title Proper (AACR2 9.1B1, USMARC 245$a). The source of information for the Title Proper is the title screen or other eye-readable information. Only when there is no eye-readable information can a title be transcribed from other internal evidence such as metadata in the file header. Therefore, the metadata TITLE will usually need to be compared with the eye-readable title before it can be accepted as the Title Proper. If it is different from the eye-readable title, the metadata TITLE would be recorded as a Variant Title (USMARC 246, with a caption “$iTitle from metadata:”).
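The decision just described might be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the function name, the (tag, subfields) tuple shape, and the choice to compare titles after trimming surrounding whitespace are all hypothetical, and a cataloger would still review the result.

```python
from typing import List, Optional, Tuple

Field = Tuple[str, str]  # (USMARC tag, subfield string); hypothetical shape


def title_fields(metadata_title: str,
                 eye_readable_title: Optional[str]) -> List[Field]:
    """Sketch of the TITLE decision: eye-readable title takes precedence."""
    meta = metadata_title.strip()
    if not eye_readable_title or not eye_readable_title.strip():
        # No eye-readable title: the metadata may supply the Title Proper.
        return [("245", "$a" + meta)]
    shown = eye_readable_title.strip()
    fields = [("245", "$a" + shown)]
    if meta != shown:
        # A differing metadata title is kept as a variant title.
        fields.append(("246", "$iTitle from metadata:$a" + meta))
    return fields
```

For example, title_fields("Foo", "Foo: a guide") keeps the eye-readable title in 245 and records "Foo" in 246 with the caption given above.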

  2. CREATOR

    The CREATOR element corresponds to the main and/or added entries. The USMARC Crosswalk maps this element to field 720, field 700 (if a personal name is specified) or field 710 (if a corporate name is specified).

    For an AACR2-based description, the rules in Chapter 21 would need to be applied, and a main entry determined. If the CREATOR (or one of the CREATORS) is determined to be the main entry, a 1XX field would be used. If no name type is specified, the cataloger would have to determine whether the name was a personal or corporate name.

    The content of this element may or may not conform to the rules for form of name in Chapters 22-24, and the name may or may not be consistent with the official form in the national authority file. In order to conform to AACR2 practice, authority work would need to be done. Since the USMARC field by itself does not indicate whether the content of the field is an authorized heading, it is particularly important that authority control procedures be built into any use of CREATOR information in cataloging records.

    It should be noted that the CREATOR element also corresponds to other AACR2 elements, such as the statement of responsibility (AACR2 9.1F, USMARC 245$c), credits note, etc. Although the DC element is not intended as a descriptive (as opposed to an access) element, the data given in the DC CREATOR element may be very useful in describing the responsibility for creation of the resource. Although it is most likely to be formulated as an access point (e.g., an inverted personal name), it may be transcribed in brackets in the Statement of Responsibility area or in a note.
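The choice of USMARC field for a CREATOR value could be sketched as below. The function name and the name-type labels "personal" and "corporate" are hypothetical. The sketch assumes that 1XX means 100 for a personal name and 110 for a corporate name, and it performs no authority work, which the discussion above treats as essential.

```python
from typing import Optional


def creator_field(name_type: Optional[str],
                  main_entry: bool = False) -> Optional[str]:
    """Pick a USMARC tag for a CREATOR value.

    name_type is "personal", "corporate", or None when the metadata
    does not say. Returns None when a cataloger must decide first.
    """
    if name_type is None:
        # Unspecified type: 720 for an added entry; a main entry cannot
        # be tagged until the cataloger determines the kind of name.
        return None if main_entry else "720"
    tags = {"personal": ("100", "700"), "corporate": ("110", "710")}
    if name_type not in tags:
        raise ValueError("unknown name type: " + name_type)
    main_tag, added_tag = tags[name_type]
    return main_tag if main_entry else added_tag
```

Whether a given CREATOR is the main entry is itself a Chapter 21 decision, so main_entry is an input to the sketch rather than something it computes.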

  3. SUBJECT

    The SUBJECT element may contain various identifiers relating to the subject of the resource, such as keywords or classification notations. The default USMARC mapping is to field 653 (Uncontrolled subject access), although specific fields such as 650 for LC Subject Headings or 050 for LC Classification numbers may be used if the metadata include identification of such subject schemes.

    This element does not involve descriptive cataloging covered by AACR2, but it should be noted that this is not a transcribed element. Therefore, it may be used without further modification. Its usefulness will be determined by the specificity of the scheme identification. In a catalog that uses controlled subject headings and classification, uncontrolled keywords will be less useful than controlled headings and classification.
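A minimal sketch of the SUBJECT mapping follows. The field numbers are those named above; the scheme labels "LCSH" and "LCC" and the function name are assumptions made for illustration, since the form in which a scheme is identified is not specified here.

```python
from typing import Optional

# Scheme labels are hypothetical; the fields are those named in the text.
SUBJECT_FIELD_BY_SCHEME = {
    "LCSH": "650",  # LC Subject Headings
    "LCC": "050",   # LC Classification numbers
}
DEFAULT_SUBJECT_FIELD = "653"  # Uncontrolled subject access


def subject_field(scheme: Optional[str]) -> str:
    """Pick a USMARC field for a SUBJECT value from its scheme qualifier."""
    if scheme is None:
        return DEFAULT_SUBJECT_FIELD
    return SUBJECT_FIELD_BY_SCHEME.get(scheme.upper(), DEFAULT_SUBJECT_FIELD)
```

Any scheme the table does not recognize falls back to 653, which mirrors the point that the usefulness of the element depends on how specifically the scheme is identified.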

  4. DESCRIPTION

    The DESCRIPTION element corresponds to a Summary note (AACR2 9.7B17, USMARC 520). As with the SUBJECT element, this is not transcribed data and therefore can be used without modification in a catalog record. The usefulness of the result will depend only on the quality of the abstract or summary.

  5. PUBLISHER

    The PUBLISHER element corresponds to the Name of publisher, distributor, etc. (AACR2 9.4D, USMARC 260$b). The prescribed source for this element in AACR2, like that for the title, gives precedence to eye-readable information over information in the HTML source code. Therefore, the content of the PUBLISHER element would need to be verified; if there is no eye-readable publisher information, the metadata can be used.

  6. CONTRIBUTOR

    The OTHER CONTRIBUTORS element corresponds to the added entries. All of the points made under the CREATOR element above apply here, including the need for authority work and the use of this information as the basis for statements of responsibility and credits notes.

  7. DATE

    According to the Dublin Core definition, the DATE element contains “a date associated with the creation or availability of the resource. The recommended best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI X3.30-1985.” This corresponds in content (but not in form) to the Date of publication, distribution, etc. (AACR2 9.4F, USMARC 260$c). As with the PUBLISHER element, the prescribed source of information for this element gives priority to eye-readable information. The date would have to be verified and formatted according to 9.4F. It should also be noted that the DC DATE element is not necessarily a date of publication; “creation or availability” can cover a multitude of sins, particularly when applied by non-catalogers.

    Other dates may also be recorded in this DC element, such as the date of last update (which might need to be included in a “Description based on:” note). Since Dublin Core includes little information that explicitly addresses the distinction among versions of the same resource, this element may be the only source of such data, and the information may be decidedly inadequate for this purpose.
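The difference in form between the DC DATE and the AACR2 date can be illustrated with a short sketch. It assumes the value follows the recommended YYYYMMDD practice and that only the year is wanted as a candidate for 260$c; both function names are hypothetical, and the result still has to be verified against eye-readable information.

```python
from datetime import date, datetime
from typing import Optional


def parse_dc_date(value: str) -> Optional[date]:
    """Parse a DC DATE given in the recommended YYYYMMDD form.

    Returns None for anything else, so that the value is flagged
    for manual interpretation rather than guessed at.
    """
    text = value.strip()
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:  # e.g. month 13 or day 32
        return None


def candidate_publication_year(value: str) -> Optional[str]:
    """Year only, as a candidate for 260$c pending verification."""
    parsed = parse_dc_date(value)
    return str(parsed.year) if parsed else None
```

Note that the sketch cannot tell a date of publication from a date of creation or last update; that distinction remains a matter for the cataloger.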

  8. RESOURCE TYPE

    According to the Dublin Core definition, the TYPE element contains “the category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types.” The USMARC Crosswalk maps this element to a Form/Genre term (USMARC field 655).

    It should be noted that this data is relevant to several AACR2 elements. It is similar to the Designation element in the File characteristics area (AACR2 9.3B1, USMARC 256). If the list of designations in 9.3B1 is expanded as a result of the ISBD(ER) harmonization, it will be important that the DC and AACR2 lists not be in conflict. RESOURCE TYPE data may also be relevant to the note on Nature and scope (AACR2 9.7B1a, USMARC 500).

  9. FORMAT

    According to the Dublin Core definition, the FORMAT element contains “the data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principal, formats can include physical media such as books, serials, or other non-electronic media.” The USMARC Crosswalk maps this element to a subfield in field 856 (Electronic location and access).

    In terms of AACR2, this element may contain data that could be included in a note on Nature and scope (AACR2 9.7B1a, USMARC 516) or on System requirements (AACR2 9.7B1b, USMARC 538). The information in the metadata would have to be rephrased when used in a note.

  10. RESOURCE IDENTIFIER

    According to the Dublin Core definition, the RESOURCE IDENTIFIER element contains a “string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.” The USMARC Crosswalk maps this element to the URL (856$u), although other elements can be used if the appropriate scheme is identified (e.g., ISBN in field 020). Although this is vital information about any Web resource, it is not governed by any AACR2 rules (except 9.8B for the ISBN or ISSN).

  11. SOURCE

    The SOURCE element contains information about “the work, either print or electronic, from which this resource is derived.” Although the USMARC Crosswalk maps this to field 786 (Data source), the data is covered by the note on Edition and history (AACR2 9.7B7, USMARC 500 or 533) and, in the case of serial publications, the note on Other formats (AACR2 12.7B16, USMARC 776). The content of the element may need to be modified to comply with the relevant rules and to assure that the related resource is correctly identified.

  12. LANGUAGE

    The LANGUAGE element corresponds to the Language note (AACR2 9.7B2, USMARC 546), as well as to the coded language element in USMARC. The default mapping is to field 546, on the grounds that free-text information is most likely, but, if the USMARC coded scheme is identified, it can be mapped to the coded element in field 008. The content in the Language note may have to be modified to conform to the rules.

  13. RELATION

    The RELATION element contains data about the relation of the resource to other resources. This is a more general version of the SOURCE relationship and, like SOURCE, corresponds to the Edition and history and the Relationships with other serials notes. Again, the content of the element may need to be modified to comply with the relevant rules and to assure that the related resource is correctly identified.

  14. COVERAGE

    The COVERAGE element contains data about the chronological or geographic coverage of the resource. The default USMARC mapping is to field 500, but the data may be appropriate in a variety of notes. It may also serve as the basis for subject descriptors. Certain data producers and archivists have developed fairly detailed standards for defining the coverage, particularly of geo-spatial data, and there are specific USMARC fields for such data. Only general coverage notes are specified in AACR2, under the rule for notes on Nature and scope.

  15. RIGHTS MANAGEMENT

    According to the Dublin Core definition, “the content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources.” The USMARC Crosswalk maps this element to field 540 (Terms governing use and reproduction note) for which there is no corresponding rule in AACR2.
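The default mappings discussed for the fifteen elements can be gathered into a single lookup table. The sketch below restates only what is said above; the dictionary and function names are hypothetical, None marks elements for which no single default field is named in this discussion, and the fallback label "cataloger judgment" is an assumption.

```python
from typing import Dict, Optional

# Default USMARC targets as stated in the discussion above.
# None means the text names no single default field.
DEFAULT_CROSSWALK: Dict[str, Optional[str]] = {
    "TITLE": "245$a",
    "CREATOR": "720",
    "SUBJECT": "653",
    "DESCRIPTION": "520",
    "PUBLISHER": "260$b",
    "CONTRIBUTOR": None,            # added entries; field not specified
    "DATE": "260$c",
    "RESOURCE TYPE": "655",
    "FORMAT": "856",                # "a subfield in field 856"
    "RESOURCE IDENTIFIER": "856$u",
    "SOURCE": "786",
    "LANGUAGE": "546",
    "RELATION": None,               # notes; field not specified
    "COVERAGE": "500",
    "RIGHTS MANAGEMENT": "540",
}

VERIFY_AGAINST_EYE_READABLE = {"TITLE", "PUBLISHER", "DATE"}
NEEDS_AUTHORITY_WORK = {"CREATOR", "CONTRIBUTOR"}
USABLE_WITHOUT_MODIFICATION = {"SUBJECT", "DESCRIPTION"}


def review_needed(element: str) -> str:
    """Summarize the kind of review the discussion calls for."""
    if element in VERIFY_AGAINST_EYE_READABLE:
        return "verify against eye-readable information"
    if element in NEEDS_AUTHORITY_WORK:
        return "authority work"
    if element in USABLE_WITHOUT_MODIFICATION:
        return "none"
    return "cataloger judgment"
```

The table makes visible how few elements can pass into a cataloging record unreviewed: only SUBJECT and DESCRIPTION are described above as usable without modification.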


Rules in Chapter 9 of AACR2

Rule 9.0B1. The rule currently reads:

9.0B1. Chief source of information. The chief source of information for computer files is the title screen(s).
      If there is no title screen, take the information from other formally presented internal evidence (e.g., main menus, program statements, first display of information, the header to the file including “Subject” lines, information at the end of the file). In case of variation in fullness of information found in these sources, prefer the source with the most complete information.
      If the computer file is unreadable without processing (e.g., compressed file, printer-formatted file), take the information from the file after it has been uncompressed, printed out, or otherwise processed for use.
      If the information required is not available from internal sources, take it from the following sources (in this order of preference)
the physical carrier or its labels
information issued by the publisher, creator, etc., with the file (sometimes called “documentation”)
information printed on the container issued by the publisher, distributor, etc.
      If the item being described consists of two or more separate physical parts, treat a container or its permanently affixed label that is the unifying element as the chief source of information if it furnishes a collective title and the formally presented information in, or the labels on, the parts themselves do not.
      If the information required is not available from the chief source or the sources listed above, take it from the following sources (in this order of preference)
other published descriptions of the file
other sources

It is probably true that metadata falls under “other formally presented internal evidence.” This would mean that metadata could be used as a substitute for the title screen (eye-readable information).

Although it might be worth considering whether metadata might be given preference over the eye-readable information, it is probably unwise to compromise the principle of transcription and to prefer a hidden source to the public source provided by the eye-readable content of the file. Therefore, we do not recommend changing the first paragraph of 9.0B1.

On the other hand, the significance of metadata in the cataloging world probably warrants adding “metadata” to the list of “other formally presented internal evidence” in the second paragraph.
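The order of preference in 9.0B1, with metadata treated as internal evidence, might be sketched as follows. This is a simplification under stated assumptions: the source labels and function names are hypothetical, the listing of "metadata" among internal evidence reflects the suggestion made here rather than the current text of the rule, and neither the preference for the most complete internal source nor the provision for a unifying container is modeled.

```python
from typing import Iterable, Optional

# Order of preference paraphrased from rule 9.0B1.
SOURCE_PREFERENCE = [
    "title screen",
    "other internal evidence",
    "physical carrier or labels",
    "documentation",
    "container",
    "other published descriptions",
    "other sources",
]

# Including "metadata" here is the suggested change, not the current rule.
INTERNAL_EVIDENCE = {"main menu", "program statements", "file header", "metadata"}


def source_class(source: str) -> str:
    """Map a specific internal source to its class in the preference list."""
    return "other internal evidence" if source in INTERNAL_EVIDENCE else source


def preferred_source(available: Iterable[str]) -> Optional[str]:
    """Return the available source that ranks highest, or None."""
    ranked = [s for s in available if source_class(s) in SOURCE_PREFERENCE]
    if not ranked:
        return None
    return min(ranked, key=lambda s: SOURCE_PREFERENCE.index(source_class(s)))
```

Under this ordering metadata outranks the container or documentation but never the title screen, which is the outcome argued for above.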

Other rules: {We might want to propose adding some examples, particularly in 9.7B. Are there any other rules that we ought to look at?}


