Committee on Cataloging: Description and Access Task Force on Metadata and the Cataloging Rules
In assessing the relationship between metadata and cataloging data, we propose to examine the suitability of each to fulfilling various user tasks. These tasks have been formulated as part of an entity-relationship model by an IFLA Working Group and published as part of the document Functional Requirements for Bibliographic Records. [Note: The final text of this document is not yet available, and the following analysis is based on the draft version circulated for world-wide review in May 1996.]
The IFLA model proposes four basic user tasks: find, identify, select, and obtain.
Both cataloging data (bibliographic records) and metadata support all of these user tasks to some extent. However, the Dublin Core in particular places emphasis on resource discovery (primarily the find task, although it has elements of the identify and select task) and retrieval (the obtain task). Its objective is explicitly not to create a complete surrogate for the entity, and therefore the ability to identify and select a particular manifestation through the Dublin Core metadata may be significantly limited.
Cataloging data not only seeks to support all four user tasks (although the obtain task is mostly supported by local information that is not part of international cataloging standards); it is also optimized to support each task and seeks to maximize users' chances of success in their efforts. Standard cataloging principles, rules, and practices have developed over the past century, and contemporary cataloging databases embody high standards of information quality. Considerable effort has been expended by catalogers throughout the world, working cooperatively to promote this quality. In particular,
The Dublin Core is a 15-element metadata element set intended to facilitate discovery of electronic resources. Originally conceived for author-generated description of Web resources, it has also attracted the attention of formal resource description communities such as museums and libraries. [Dublin Core Metadata home page http://purl.org/metadata/dublin_core/]
The Dublin Core (DC) is designed for maximum simplicity and flexibility. It is expected that DC metadata will be provided by the creators or distributors of the resources themselves, perhaps by filling in a form in their authoring program. On the other hand, Dublin Core can be qualified and extended to meet the requirements of a variety of users. It is theoretically possible to encode most of a fully standard AACR2 description in DC metadata elements, and it is anticipated that some metadata creators will do exactly that. However, the Dublin Core is directed at a broader and less exacting set of resource producers, and the content of a typical set of DC metadata is likely to be less full and less rigorous in its content.
The principles governing use of the Dublin Core elements are simple and straightforward. All elements are optional; all elements are repeatable; the order of elements is optional; and all elements can be qualified by language (language of the metadata) or scheme (authority or standard for the content).
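The four principles above can be pictured as a very small data model. The following is only an illustrative sketch, not part of the Dublin Core specification: the class names DCElement and DCRecord, the attribute names, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DCElement:
    """One Dublin Core statement; both qualifiers are optional."""
    name: str                      # e.g. "TITLE", "CREATOR"
    value: str
    lang: Optional[str] = None     # language of the metadata value
    scheme: Optional[str] = None   # authority or standard for the content


@dataclass
class DCRecord:
    """A bag of elements: none is required and any may repeat."""
    elements: List[DCElement] = field(default_factory=list)

    def add(self, name, value, lang=None, scheme=None):
        self.elements.append(DCElement(name.upper(), value, lang, scheme))

    def values(self, name):
        """Return every value recorded for an element (possibly none)."""
        return [e.value for e in self.elements if e.name == name.upper()]


# Hypothetical sample data: a repeated element and a scheme qualifier.
record = DCRecord()
record.add("creator", "Smith, Jane")
record.add("creator", "Example Society")
record.add("subject", "Cataloging", scheme="LCSH")
```

Because nothing is mandatory, a consumer of such a record has to be prepared for any element to be absent, which is the reliability concern taken up in the discussion that follows.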
Our object is to evaluate Dublin Core metadata as a source of cataloging data for records based on the Anglo-American Cataloguing Rules. The Task Force recognizes that metadata in general and the Dublin Core in particular have applications other than AACR2-based cataloging records. Indeed, it is arguable that Dublin Core metadata might be applied most effectively in a system designed specifically to support its use, rather than in library cataloging databases. This is one of the questions that this report will explore.
On the other hand, if library cataloging databases are to contain records for Web resources (as it is certain that they will), Dublin Core metadata contains a wealth of information that can be used in those records. We will evaluate the kinds of information that each Dublin Core element may contain and indicate how that information can be used in preparing an AACR2-based cataloging record.
Finally, we will discuss the rules in Chapter 9 of AACR2 and make recommendations about the need for changes to those rules to support the use of metadata as a source of cataloging data.
Note: Much of the argument here will make use of a distinction between metadata as a source of information and metadata as a source of cataloging data. Source of information is a technical term in AACR2, referring to a source from which information is transcribed in various elements of the cataloging record. In order to contrast with this technical terminology, we have used source of cataloging data to refer to factual data on which various elements in the cataloging record may be based; it is thus a much broader concept which includes not only exact transcription or quotation, but summarization or reformulation of the factual information by a cataloger.
Dublin Core Metadata's Support of the Four User Tasks
Dublin Core metadata supports the four user tasks set forth in Functional Requirements for Bibliographic Records to varying degrees, but its lack of established rules and procedures governing the content of data elements makes Dublin Core elements less reliable than cataloging data. The explicit simplicity of the element set and the fact that all elements are optional also undermine the reliability of Dublin Core metadata. The following discussion notes the relevance of Dublin Core elements for each of the user tasks.
Although there are only limited requirements about the content of these elements, the content may be optimized in the same manner as cataloging data. For example, the content of the metadata elements may be literally identical with the same information shown in eye-readable form on the resource. Controlled vocabularies and authority control practices may also be applied to the content of name (CREATOR, CONTRIBUTOR) and SUBJECT elements. According to the Dublin Core element description, "To promote global interoperability, a number of the element descriptions suggest a controlled vocabulary for the respective element values." However, this is not a requirement, and the original intent of the Dublin Core to capture information supplied by the authors or distributors of electronic resources will probably apply to some extent to most random collections of metadata. Unless the metadata is created as part of a project that is able to impose its own rules for content, only minimal assumptions about the reliability of the data can be made.
Both cataloging data and Dublin Core metadata support the four user tasks, although Dublin Core is only designed to support the finding and obtaining of electronic resources. On the other hand, cataloging data is optimized to support all four tasks in ways that cannot be expected of metadata. In particular, the use of controlled vocabularies and the practice of authority control enhance the ability to find, and the principle of transcription and concepts of versioning enhance the ability to identify and select desired resources. Cataloging practices add considerable value to the raw data provided by the resources described in bibliographic records, and this added value is intended to support the ability to find, identify, select and obtain desired resources.
Our cataloging databases are high-quality tools for information retrieval, but they are only as good as the standards that apply. The integrity and consistency of these databases depend on applying more or less the same standards to all records in the database. If a significant portion of the database does not reflect the same level of consistency, the database becomes unreliable. It is therefore damaging to the quality of a cataloging database to include in it records based on Dublin Core metadata unless that metadata was formulated according to cataloging principles and practices. This may be possible for metadata coming from particular projects which have been able to adopt appropriate standards. However, it is not possible for a broad range of Internet resources containing metadata provided by authors or data producers. For such resources, it would be preferable to maintain the metadata-based records in a separate database. The metadata will provide a higher level of accessibility than the Internet itself, but its lack of consistency will not damage the even higher level of quality we have invested so much in providing in our cataloging databases.
Dublin Core Elements as Sources of Cataloging Data
The official definitions of the Dublin Core metadata element set are found in Description of Dublin Core Elements [http://purl.org/metadata/dublin_core_elements/]. A mapping of DC elements to the USMARC fields is contained in Dublin Core/MARC/GILS Crosswalk, prepared by the Network Development and MARC Standards Office at the Library of Congress [http://lcweb.loc.gov/marc/dccross.html]. The following discussion is based on these sources and discusses the use of DC information in AACR2 cataloging records.
The TITLE element corresponds to the Title Proper (AACR2 9.1B1, USMARC 245$a). The source of information for the Title Proper is the title screen or other eye-readable information. Only when there is no eye-readable information can a title be transcribed from other internal evidence such as metadata in the file header. Therefore, the metadata TITLE will usually need to be compared with the eye-readable title before it can be accepted as the Title Proper. If it is different from the eye-readable title, the metadata TITLE would be recorded as a Variant Title (USMARC 246, with a caption $iTitle from metadata:).
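The decision described for the TITLE element can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name map_title and the (tag, subfields) tuple layout are invented for the example, and the exact string comparison stands in for what would in practice be a cataloger's judgment about whether two titles differ.

```python
def map_title(metadata_title, eye_readable_title=None):
    """Return USMARC fields, as (tag, [(subfield, value), ...]), for a DC TITLE."""
    if eye_readable_title is None:
        # No eye-readable title: the metadata may serve as the source.
        return [("245", [("a", metadata_title)])]

    # The eye-readable title always takes precedence as Title Proper.
    fields = [("245", [("a", eye_readable_title)])]
    if metadata_title.strip() != eye_readable_title.strip():
        # Record the differing metadata title as a captioned variant title.
        fields.append(
            ("246", [("i", "Title from metadata:"), ("a", metadata_title)])
        )
    return fields
```

For instance, map_title("Home", "My Page") yields a 245 for "My Page" plus a 246 variant for "Home", whereas identical titles produce the 245 alone.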
The CREATOR element corresponds to the main and/or added entries. The USMARC Crosswalk maps this element to field 720, field 700 (if a personal name is specified) or field 710 (if a corporate name is specified).
For an AACR2-based description, the rules in Chapter 21 would need to be applied, and a main entry determined. If the CREATOR (or one of the CREATORS) is determined to be the main entry, a 1XX field would be used. If no name type is specified, the cataloger would have to determine whether the name was a personal or corporate name.
The content of this element may or may not conform to the rules for form of name in Chapters 22-24, and the name may or may not be consistent with the official form in the national authority file. In order to conform to AACR2 practice, authority work would need to be done. Since the USMARC field by itself does not indicate whether the content of the field is an authorized heading, it is particularly important that authority control procedures be built into any use of CREATOR information in cataloging records.
It should also be noted that the CREATOR element corresponds to other AACR2 elements as well, such as the statement of responsibility (AACR2 9.1F, USMARC 245$c), credits note, etc. Although the DC element is not intended as a descriptive (as opposed to an access) element, the data given in the DC CREATOR element may be very useful in describing the responsibility for creation of the resource. Although it is most likely to be formulated as an access point (e.g., an inverted personal name), it may be transcribed in brackets in the Statement of Responsibility area or in a note.
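The field selection for CREATOR described above (720 by default, 700 or 710 when the name type is given, and a 1XX field for a main entry) might be sketched as follows. The function name creator_tag and the name-type labels "personal" and "corporate" are assumptions made for the example; the specific main-entry tags 100 and 110 are the standard USMARC counterparts of 700 and 710, which the text above refers to only as 1XX. The sketch does not attempt the authority work that the discussion identifies as essential.

```python
def creator_tag(name_type=None, main_entry=False):
    """Pick a USMARC tag for a DC CREATOR value."""
    added_entry = {"personal": "700", "corporate": "710"}

    if name_type not in added_entry:
        if main_entry:
            # A 1XX field requires knowing the kind of name.
            raise ValueError("cataloger must decide personal vs. corporate")
        return "720"  # name type not specified

    tag = added_entry[name_type]
    # Main entries use the 1XX counterpart of the added-entry tag.
    return "1" + tag[1:] if main_entry else tag
```

Raising an error for an untyped main entry reflects the point made above: where no name type is specified, the choice cannot be automated and falls to the cataloger.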
The SUBJECT element may contain various identifiers relating to the subject of the resource, such as keywords or classification notations. The default USMARC mapping is to field 653 (Uncontrolled subject access), although specific fields such as 650 for LC Subject Headings or 050 for LC Classification numbers may be used if the metadata include identification of such subject schemes.
This element does not involve descriptive cataloging covered by AACR2, but it should be noted that this is not a transcribed element. Therefore, it may be used without further modification. Its usefulness will be determined by the specificity of the scheme identification. In a catalog that uses controlled subject headings and classification, uncontrolled keywords will be less useful than controlled headings and classification.
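The dependence of the SUBJECT mapping on scheme identification can be shown with a simple lookup. This is a sketch only: the scheme labels "LCSH" and "LCC" are assumed spellings of whatever qualifier a given metadata record actually carries, and only the three fields named above are covered.

```python
# Scheme label -> USMARC field, limited to the fields named in the text.
SUBJECT_FIELDS = {
    "LCSH": "650",  # LC Subject Headings
    "LCC": "050",   # LC Classification number
}


def subject_tag(scheme=None):
    """Return the field for a SUBJECT value; 653 when no known scheme is given."""
    return SUBJECT_FIELDS.get(scheme, "653")
```

Any value without a recognized scheme falls through to 653, which is why unqualified keywords end up as uncontrolled subject access.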
The DESCRIPTION element corresponds to a Summary note (AACR2 9.7B17, USMARC 520). As with the SUBJECT element, this is not transcribed data and therefore can be used without modification in a catalog record. The usefulness of the result will depend only on the quality of the abstract or summary.
The PUBLISHER element corresponds to the Name of publisher, distributor, etc. (AACR2 9.4D, USMARC 260$b). The prescribed source for this element in AACR2, like that for the title, gives precedence to eye-readable information over information in the HTML source code. Therefore, the content of the PUBLISHER element would need to be verified; if there is no eye-readable publisher information, the metadata can be used.
The OTHER CONTRIBUTORS element corresponds to the added entries. All of the points made under the CREATOR element above apply here, including the need for authority work and the use of this information as the basis for statements of responsibility and credits notes.
According to the Dublin Core definition, the DATE element contains a date associated with the creation or availability of the resource. The recommended best practice is an 8-digit number in the form YYYYMMDD as defined by ANSI X3.30-1985. This corresponds in content (but not in form) to the Date of publication, distribution, etc. (AACR2 9.4F, USMARC 260$c). As with the PUBLISHER element, the prescribed source of information for this element gives priority to eye-readable information. The date would have to be verified and formatted according to 9.4F. It should also be noted that the DC DATE element is not necessarily a date of publication; "creation or availability" can cover a multitude of sins, particularly when applied by non-catalogers.
Other dates may also be recorded in this DC element, such as the date of last update (which might need to be included in a Description based on: note). Since Dublin Core includes little information that explicitly addresses the distinction among versions of the same resource, this element may be the only source of such data, and the information may be decidedly inadequate for this purpose.
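Because the recommended YYYYMMDD form is machine-checkable, a first screening of DATE values can be automated before a cataloger verifies them. The following sketch assumes a hypothetical helper named parse_dc_date; it checks only that the string is a well-formed calendar date and says nothing about whether the date is a date of publication.

```python
from datetime import date, datetime
from typing import Optional


def parse_dc_date(value: str) -> Optional[date]:
    """Parse an 8-digit YYYYMMDD string; return None if it does not conform."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        # Eight digits, but not a real calendar date (e.g. February 30).
        return None
```

A successfully parsed value supplies at most a candidate year (parsed.year) for the cataloger to verify against eye-readable information and to format according to 9.4F.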
According to the Dublin Core definition, the TYPE element contains the category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. The USMARC Crosswalk maps this element to a Form/Genre term (USMARC field 655).
It should be noted that this data is relevant to several AACR2 elements. It is similar to the Designation element in the File characteristics area (AACR2 9.3B1, USMARC 256). If the list of designations in 9.3B1 is expanded as a result of the ISBD(ER) harmonization, it will be important that the DC and AACR2 lists not be in conflict. RESOURCE TYPE data may also be relevant to the note on Nature and scope (AACR2 9.7B1a, USMARC 500).
According to the Dublin Core definition, the FORMAT element contains the data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media. The USMARC Crosswalk maps this element to a subfield in field 856 (Electronic location and access).
In terms of AACR2, this element may contain data that could be included in a note on Nature and scope (AACR2 9.7B1a, USMARC 516) or on System requirements (AACR2 9.7B1b, USMARC 538). The information in the metadata would have to be rephrased when used in a note.
According to the Dublin Core definition, the RESOURCE IDENTIFIER element contains a string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element. The USMARC Crosswalk maps this element to the URL (856$u), although other elements can be used if the appropriate scheme is identified (e.g., ISBN in field 020). Although this is vital information about any Web resource, it is not governed by any AACR2 rules (except 9.8B for the ISBN or ISSN).
The SOURCE element contains information about the work, either print or electronic, from which this resource is derived. Although the USMARC Crosswalk maps this to field 786 (Data source), the data is covered by the note on Edition and history (AACR2 9.7B7, USMARC 500 or 533) and, in the case of serial publications, the note on Other formats (AACR2 12.7B16, USMARC 776). The content of the element may need to be modified to comply with the relevant rules and to assure that the related resource is correctly identified.
The LANGUAGE element corresponds to the Language note (AACR2 9.7B2, USMARC 546), as well as to the coded language element in USMARC. The default mapping is to field 546, on the grounds that free-text information is most likely, but, if the USMARC coded scheme is identified, it can be mapped to the coded element in field 008. The content in the Language note may have to be modified to conform to the rules.
The RELATION element contains data about the relation of the resource to other resources. This is a more general version of the SOURCE relationship and, like SOURCE, corresponds to the Edition and history and the Relationships with other serials notes. Again, the content of the element may need to be modified to comply with the relevant rules and to assure that the related resource is correctly identified.
The COVERAGE element contains data about the chronological or geographic coverage of the resource. The default USMARC mapping is to field 500, but the data may be appropriate in a variety of notes. It may also serve as the basis for subject descriptors. Certain data producers and archivists have developed fairly detailed standards for defining the coverage, particularly of geo-spatial data, and there are specific USMARC fields for such data. Only general coverage notes are specified in AACR2, under the rule for notes on Nature and scope.
According to the Dublin Core definition, the content of the RIGHTS element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. The USMARC Crosswalk maps this element to field 540 (Terms governing use and reproduction note) for which there is no corresponding rule in AACR2.
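The default correspondences mentioned element by element above can be gathered into a single lookup table. This is an illustrative sketch and not a reproduction of the Library of Congress crosswalk: the short element labels (including RIGHTS for the rights-management element and IDENTIFIER for RESOURCE IDENTIFIER), the table name DEFAULT_CROSSWALK, and the function crosswalk are assumptions for the example. Elements for which the discussion names no specific field or subfield (FORMAT, RELATION, OTHER CONTRIBUTORS) are deliberately left out, and no scheme-dependent alternatives are handled.

```python
# Default USMARC targets as stated in the element-by-element discussion.
DEFAULT_CROSSWALK = {
    "TITLE": "245$a",
    "CREATOR": "720",
    "SUBJECT": "653",
    "DESCRIPTION": "520",
    "PUBLISHER": "260$b",
    "DATE": "260$c",
    "TYPE": "655",
    "IDENTIFIER": "856$u",
    "SOURCE": "786",
    "LANGUAGE": "546",
    "COVERAGE": "500",
    "RIGHTS": "540",
}


def crosswalk(elements):
    """Group (name, value) pairs by default field; collect the rest separately."""
    mapped, unmapped = {}, []
    for name, value in elements:
        target = DEFAULT_CROSSWALK.get(name.upper())
        if target is None:
            unmapped.append((name, value))
        else:
            # Elements are repeatable, so each target holds a list of values.
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped
```

A mechanical pass of this kind produces only raw material. As noted throughout, transcribed elements still need to be verified against eye-readable sources, and names still require authority work.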
Rule 9.0B1. The rule currently reads:
9.0B1. Chief source of information. The chief source of information for computer files is the title screen(s).
If there is no title screen, take the information from other formally presented internal evidence (e.g., main menus, program statements, first display of information, the header to the file including Subject lines, information at the end of the file). In case of variation in fullness of information found in these sources, prefer the source with the most complete information.
If the computer file is unreadable without processing (e.g., compressed file, printer-formatted file), take the information from the file after it has been uncompressed, printed out, or otherwise processed for use.
If the information required is not available from internal sources, take it from the following sources (in this order of preference)
the physical carrier or its labels
If the item being described consists of two or more separate physical parts, treat a container or its permanently affixed label that is the unifying element as the chief source of information if it furnishes a collective title and the formally presented information in, or the labels on, the parts themselves do not.
information issued by the publisher, creator, etc., with the file (sometimes called documentation)
information printed on the container issued by the publisher, distributor, etc.
If the information required is not available from the chief source or the sources listed above, take it from the following sources (in this order of preference)
other published descriptions of the file
other sources
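The order of preference set out in 9.0B1 amounts to taking the first source in a fixed list that actually supplies the information. The sketch below is a simplification under stated assumptions: the source labels and the function pick_source are invented for the example, and two provisions of the rule are not modeled (preferring the most complete of several internal sources, and treating a unifying container as chief source for multi-part items).

```python
# Sources in the rule's order of preference (labels are illustrative).
SOURCE_PREFERENCE = [
    "title screen",
    "other internal evidence",
    "physical carrier or labels",
    "documentation issued with the file",
    "container",
    "other published descriptions",
    "other sources",
]


def pick_source(available):
    """Return (source, value) for the most preferred source that has data."""
    for source in SOURCE_PREFERENCE:
        value = available.get(source)
        if value:  # skip sources that are missing or empty
            return source, value
    return None
```

For example, when a title is available both on the container and in the file header, the header is selected because internal evidence ranks ahead of the container.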
It is probably true that metadata falls under other formally presented internal evidence. This would mean that metadata could be used as a substitute for the title screen (eye-readable information).
Although it might be worth considering whether metadata might be given preference over the eye-readable information, it is probably unwise to compromise the principle of transcription and to prefer a hidden source to the public source provided by the eye-readable content of the file. Therefore, we do not recommend changing the first paragraph of 9.0B1.
On the other hand, the significance of metadata in the cataloging world probably warrants adding metadata to the list of other formally presented internal evidence in the 2nd paragraph.
Other rules: {We might want to propose adding some examples, particularly in 9.7B. Are there any other rules that we ought to look at?}