Step One: Understanding the Document Type Definition

The DTD is the foundation of valid XML documents: it defines a document type that member documents must follow. It contains the information needed both to write valid XML documents and to process them. Without this information, document readers might not know how to process links, images, or entities, and document authors would lack a template for development.

Before designing a DTD, you must have a clear understanding of the type of document that you are creating. You must choose a document type and name it, such as novel, memo, webpage, letter or report. Once you have chosen your document type, you can start building your DTD, expanding it to fit the requirements of the document.
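As a rough sketch of how the chosen name is used, a document announces its type in a document type declaration, and that name must match the document's root element. The file name novel.dtd is a hypothetical location for the DTD, assumed here for illustration only.

```xml
<?xml version="1.0"?>
<!-- The name after DOCTYPE is the document type; it must match the root element. -->
<!-- "novel.dtd" is a hypothetical file name for the external DTD. -->
<!DOCTYPE novel SYSTEM "novel.dtd">
<novel>
  <!-- The content required by the DTD goes here. -->
</novel>
```

The comment inside the root element stands in for real content, so this skeleton is well-formed but would not yet validate against a DTD that requires child elements.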

DTDs use a small syntax that can be mastered quickly. The syntax has several components, but it comes down to two essential structures: the element and the attribute. Both are used in documents to describe content, and both must be declared in a DTD so that documents conforming to it are valid. The root element is the most important because it contains all other elements. Start with it: declare its content, then declare each element that appears in that content, and continue until you reach text-level elements. Declaring attributes is not as recursive; it merely requires declaring the attributes of each element that uses them. Element declarations generally take the form: <!ELEMENT NAME CONTENT>. Attribute declarations often take the form: <!ATTLIST ELEMENT-NAME NAME CDATA #IMPLIED>.
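To make the two declaration forms concrete, here is a minimal sketch for a hypothetical memo document type. The element names (memo, to, from, body) and the priority attribute are assumptions chosen for illustration. The declarations start at the root element and work down to the text-level elements, then add one attribute.

```xml
<!-- Hypothetical memo document type, declared from the root element down. -->
<!ELEMENT memo (to,from,body)>   <!-- root element: contains only child elements -->
<!ELEMENT to (#PCDATA)>          <!-- text-level element: contains character data -->
<!ELEMENT from (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!-- An optional attribute named priority on memo, holding character data. -->
<!ATTLIST memo priority CDATA #IMPLIED>
```

Because the attribute is declared #IMPLIED, both <memo priority="high"> and a plain <memo> would be accepted as start tags under this sketch.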

Example DTD for the Novel Document Type

This DTD describes the structure of a simple novel. Its document type is novel, as you can see from the root element, which is declared in the first line. The DTD declares each element's name and content, then declares each element of that content in turn, until every element is declared. Many elements in this DTD may contain only other elements and cannot contain text directly. The #PCDATA content of other elements means that text (parsed character data) is allowed.

<!ELEMENT novel (preface,chapter+,biography?,criticalessay*)>
<!ELEMENT preface (paragraph+)>
<!ELEMENT chapter (title,paragraph+,section+)>
<!ELEMENT section (title,paragraph+)>
<!ELEMENT biography (title,paragraph+)>
<!ELEMENT criticalessay (title,section+)>
<!ELEMENT paragraph (#PCDATA|keyword)*>
<!ELEMENT title (#PCDATA|keyword)*>
<!ELEMENT keyword (#PCDATA)>
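
For comparison, here is a small document that should conform to the DTD above. It is only a sketch: the text content is invented, and the file name novel.dtd is a hypothetical location where the declarations are assumed to be saved. The optional biography and criticalessay elements are left out.

```xml
<?xml version="1.0"?>
<!-- "novel.dtd" is a hypothetical file assumed to hold the declarations above. -->
<!DOCTYPE novel SYSTEM "novel.dtd">
<novel>
  <preface>
    <paragraph>A short note before the story begins.</paragraph>
  </preface>
  <chapter>
    <title>Chapter One</title>
    <!-- paragraph has mixed content: text and keyword elements may alternate -->
    <paragraph>The story opens in a <keyword>lighthouse</keyword>.</paragraph>
    <section>
      <title>Arrival</title>
      <paragraph>A visitor arrives at dusk.</paragraph>
    </section>
  </chapter>
</novel>
```

Each chapter needs a title, at least one paragraph, and at least one section, in that order. Removing the section element, or placing text directly inside chapter, would make the document invalid.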