Use DTDs!

Hi Folks,

Heavy-duty validation is not always needed. Sometimes all that is needed is to verify that XML instances are using the right set of tags.

Let me state it more strongly: I have observed that “verifying that XML instances are using the right set of tags” is how most developers view XML validation. Their XML Schemas are merely thinly veiled versions of DTDs. Developers opt to perform the heavy-duty data checking in Java code and/or in a database.

For many (most?) situations, use DTDs, not XML Schemas. Here’s why:

1. Fewer tools needed: Only one tool is needed: a validating XML processor. By contrast, if you use XML Schemas for validation, you need two tools: an XML processor plus an XML Schema validator. The fewer tools needed, the better.

2. Less to read: If you stick to DTDs, you only have to read the XML specification, which is about 36 pages long. If you use XML Schemas, you have to read XML Schema Part 1, which is around 350 pages, and XML Schema Part 2, which is around 100 pages.

3. Less complexity: DTDs are several orders of magnitude simpler than XML Schemas.

4. Less verbosity: The DTD syntax is streamlined and efficient (kind of analogous to XPath in terms of being streamlined and efficient). The XML Schema syntax, on the other hand, is bloated and inefficient.

5. Robust validating tools: The capability of validating against a DTD has been around a long time, so the tools are rock-solid. By comparison, the capability of validating against an XML Schema has been around a much shorter time, and the tools are less mature.

6. Inexpensive: Validating XML processors are either free or inexpensive. True, there are some free XML Schema validators, but some of the most popular XML Schema validators are quite pricey.

7. Suited to Architectural Forms: [Norman Gray wrote:] a zeroth-order description of Architectural Forms is that they were a transformation of a document, specified within one or other DTD rather than in a separate transformation language. This did not make for luminous syntax. No. But it was I think the conceptually Right Place for this transformation, it meant that that transformation could be implemented efficiently within the parser, and it meant that it was natural to conceive of the transformation as 'pulling' the AF instance document out of the DTD instance document, which is a Really Useful Idea.

8. Infoset happiness: [John Cowan wrote:] There are a number of XML DTD features which affect the infoset returned by a compliant parser. If they are in the internal subset, the parser MUST respect them; if they are in the external subset, then any parser that reads the external subset likewise MUST respect them.
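
To make point 8 concrete, here is a minimal sketch of one DTD feature that changes what the parser hands back: an attribute default. It is only an illustration under stated assumptions. The document is a hypothetical variant of the aircraft example in which the DTD sits in the internal subset and units has a default value of "feet" instead of being #REQUIRED. The helper name altitude_units is made up for this sketch. The parser behind Python's xml.etree.ElementTree (expat) is non-validating, so this shows the infoset effect only, not validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the aircraft example: the DTD is in the internal
# subset, and units has a default value ("feet") rather than #REQUIRED.
DOC_WITH_DTD = """<!DOCTYPE aircraft [
<!ELEMENT aircraft (model, altitude)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT altitude (#PCDATA)>
<!ATTLIST altitude
  units (feet|meters) "feet">
]>
<aircraft>
<model>Boeing 747</model>
<altitude>12000</altitude>
</aircraft>"""

# The same instance with no DTD at all.
DOC_WITHOUT_DTD = """<aircraft>
<model>Boeing 747</model>
<altitude>12000</altitude>
</aircraft>"""


def altitude_units(xml_text):
    """Return the units attribute the parser reports for <altitude>, or None."""
    root = ET.fromstring(xml_text)
    return root.find("altitude").get("units")


if __name__ == "__main__":
    # Neither instance writes units="..." on <altitude>; only the first
    # declares a default for it in the internal subset.
    print("with DTD:   ", altitude_units(DOC_WITH_DTD))
    print("without DTD:", altitude_units(DOC_WITHOUT_DTD))
```

The two instances are identical between the aircraft tags, so any difference in what the parser reports for units comes from the declaration in the internal subset.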

Example: Suppose you want to validate that an XML instance document uses this set of tags: aircraft is the root element; within it are model (which contains data) and altitude (which contains data and has an attribute, units, whose value is feet or meters). The instance document (aircraft.xml) and the DTD (aircraft.dtd) appear below my signature. The DTD is all that is needed for this job.

DTDs – live long and prosper!

/Roger

aircraft.xml

-----------------------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE aircraft PUBLIC "-//example//aircraft//EN" "aircraft.dtd" >
<aircraft>
<model>Boeing 747</model>
<altitude units="feet">12000</altitude>
</aircraft>

-----------------------------------------------------------------

aircraft.dtd

-----------------------------------------------------------------

<!ELEMENT aircraft (model, altitude)>

<!ELEMENT model (#PCDATA)>

<!ELEMENT altitude (#PCDATA)>

<!ATTLIST altitude
    units (feet|meters) #REQUIRED>