Metadata in Bibliographic Databases

Bibliographic databases, whose print equivalents are often called indexes and abstracts, are widely used in all types of libraries and by the research community. A record in such a database usually includes the following descriptive elements: title and statement of responsibility (author, editor, composer, etc.), edition, type of material, publisher/distributor, publication date, place of publication, physical description, series, notes, standard number (ISBN, ISSN, etc.), and terms of availability (price).

Even though the Library of Congress provides a standard cataloging format, MARC, there is a variety of bibliographic formats in the indexing and abstracting industry. For example, some publishers establish controlled vocabularies for subject access, while others do not; some distinguish content contributors, such as editors, translators, and compilers, while others put them together with the author information. The same concept is often expressed in different terms: what one database calls a "journal-article" document type, another may refer to as a "serial publication." The syntax and sequence of fields also vary dramatically from one source to another. When users search differently formatted databases together, the issue of cross-functionality becomes extremely important. Successful searches are impossible without an effort to bring the various formats together. This is where metadata can play a significant role.
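The terminology problem can be illustrated with a minimal sketch in Python. It assumes a small hand-built synonym table; the table, the function name, and the choice of "journal article" as the shared label are hypothetical and do not come from any particular database or standard.

```python
# Hypothetical synonym table: source-specific document-type labels
# mapped to one shared label. A real table would be far larger.
DOCUMENT_TYPE_SYNONYMS = {
    "journal-article": "journal article",
    "serial publication": "journal article",
}


def normalize_document_type(label: str) -> str:
    """Return the shared label for a source-specific document type.

    Labels that are not in the table are returned trimmed and
    lower-cased, so unknown values pass through unchanged in meaning.
    """
    key = label.strip().lower()
    return DOCUMENT_TYPE_SYNONYMS.get(key, key)


# Two records from differently formatted sources now compare as equal.
same_type = (
    normalize_document_type("Journal-Article")
    == normalize_document_type("serial publication")
)
```

Without such a table, a cross-database search limited to one document type would miss every record whose source uses a different label for the same concept.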

The descriptive nature of a bibliographic record makes it metadata in itself. And yet, as mentioned above, the variety of existing formats calls for another layer of description: a defined list of categories into which each piece of information fits in a standard way. If the same term, such as SUBJECTS, is used to describe a certain type of data across different databases, the retrieval software's task of identifying related objects becomes drastically easier. The list of elements needs to address the most important needs of users and, first of all, to include items that are likely to be used for linking, look-up index searches, and lateral searches. Examples of such elements are journal names, citation information (volume, issue, pages, date), and author last names.
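One way to picture this extra layer is as a crosswalk from each source's field names to a shared element list. The sketch below is illustrative only: the source names, field tags, record values, and every element name other than SUBJECTS are assumptions made for the example.

```python
from typing import Dict

# Hypothetical crosswalks: each source's own field names mapped to
# common element names. Only SUBJECTS is taken from the text above.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "database_a": {"DE": "SUBJECTS", "JN": "JOURNAL", "AU": "AUTHORS"},
    "database_b": {"keywords": "SUBJECTS", "source": "JOURNAL", "creator": "AUTHORS"},
}


def to_common_record(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Relabel a source record with the common element names.

    Fields that the crosswalk does not cover are dropped.
    """
    crosswalk = CROSSWALKS[source]
    common: Dict[str, str] = {}
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is not None:
            common[element] = value
    return common


def same_journal(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Compare two common-format records on the JOURNAL element."""
    journal_a = a.get("JOURNAL", "").casefold()
    journal_b = b.get("JOURNAL", "").casefold()
    return journal_a != "" and journal_a == journal_b


# Made-up records from two differently formatted sources.
rec_a = to_common_record(
    "database_a", {"JN": "Journal of Examples", "DE": "metadata", "XX": "ignored"}
)
rec_b = to_common_record(
    "database_b", {"source": "journal of examples", "keywords": "cataloging"}
)
linked = same_journal(rec_a, rec_b)
```

Once both records carry the same element names, the retrieval software can link or search them on JOURNAL or SUBJECTS without knowing anything about the original formats.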

Almost all cataloging standards pre-date the World Wide Web. MARC was invented in the 1960s and for years was used through library terminals. Since 1995 the Library of Congress has been interested in applying SGML to MARC data. The alpha version of the MARC DTD was first made available in 1996, after an LC project that lasted more than a year and involved many outside vendors and consulting agencies. The working group made every possible effort to bridge the gap between MARC and SGML structures, and, as a result, only two DTDs correspond to the five MARC formats, as shown below:

MAIN DTD                                   AUTHORITY DTD
USMARC Format for Bibliographic Data       USMARC Format for Authority Data
USMARC Format for Community Information    USMARC Format for Classification Data
USMARC Format for Holdings Data

It took the Library of Congress another year to start working on the MARC-to-SGML conversion utilities, and this is still an ongoing project.