Re: [xml-dev] SGML default attributes.

I think you're missing my point: it's not that DTDs aren't useful for
authoring and other things, of course they are, and if you use a DTD to
guide your authoring then everything is good (as long as you're using the
correct set of declarations).

What I'm talking about is the processing context where you get documents
from somewhere and need to determine:

1. What is the (abstract) document type this document claims to conform to?
2. Is the document valid against that document type?

It is for this use case that DTDs, by themselves, are not sufficient, for
the simple reason that the reference in the DOCTYPE declaration to some
external DTD subset is not a reliable indicator, for all the reasons I've
given. You can impose rules in your systems, authoring practices, and
intake Q/A processes so that the DTD subset serves as the abstract
document type identifier, but you have to impose that--it is not an
inherent property of external subset references.
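
A minimal sketch of why the DOCTYPE reference cannot be trusted on intake,
assuming Python's standard xml.dom.minidom parser. It reuses the
"notdocbook" example quoted later in this message. The system identifier
"docbookx.dtd" is an assumption added only because XML, unlike SGML,
requires one after a public ID, and the helper name doctype_claim is
hypothetical. Nothing in this sketch reads the external subset; it only
compares what the DOCTYPE declaration says with what the instance contains.

```python
from xml.dom import minidom

# The example from the quoted message, with an assumed system identifier
# ("docbookx.dtd") so that it is well-formed XML rather than SGML.
DOC = """<!DOCTYPE notdocbook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "docbookx.dtd" [
  <!ELEMENT notdocbook ANY>
  <!ELEMENT bogus ANY>
]>
<notdocbook>
  <bogus><para>This is not a DocBook document</para></bogus>
</notdocbook>"""


def doctype_claim(xml_text):
    """Return (public ID from the DOCTYPE, actual root element name).

    Hypothetical helper: the public ID is only what the document *says*
    about itself; the root element name is what the document actually has.
    """
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype
    public_id = doctype.publicId if doctype is not None else None
    return public_id, doc.documentElement.tagName


public_id, root = doctype_claim(DOC)
print("claimed public ID:", public_id)
print("actual root element:", root)
```

Here the document reports the DocBook public ID while its root element is
notdocbook, so the two pieces of information have to be treated as
independent claims.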

It is absolutely a fact today that many groups will use a public ID for
some standard DTD, e.g., DocBook, and then map that public ID to some set
of declarations that they have modified in all kinds of ways. This happens
*all the time* and it is *wrong*. It is a lie.

My point is that the reference to the external DTD subset is not itself
sufficiently reliable, in the general case, to tell you what the document
type is.

We have all built, and continue to build, systems that depend on the
DOCTYPE reference (or schema reference or RELAX NG grammar reference) to
know what the abstract document type of a given document is, and all those
systems are broken. Every last one.

DITA addresses the breakage by saying "I don't care what actual DTD file
you use (or don't use), I care about what you claim about the vocabulary
used by the document."
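
One way to picture "what you claim about the vocabulary" is a check that
ignores the DOCTYPE entirely and reads the claim from the instance. The
sketch below assumes DITA's class attribute convention (a leading "-" or
"+" followed by module/type tokens) and assumes the attribute is literally
present on the root element rather than defaulted from a DTD. The sample
topic and the function name claimed_types are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA-style topic with no DOCTYPE at all; the class
# attributes are written out explicitly for the sake of the sketch.
TOPIC = (
    '<concept id="c1" class="- topic/topic concept/concept ">'
    '<title class="- topic/title ">Example</title>'
    "</concept>"
)


def claimed_types(xml_text):
    """Return the (module, type) pairs the root element claims via @class."""
    root = ET.fromstring(xml_text)
    tokens = root.get("class", "").split()
    # The first token is the "-" or "+" marker; the rest are module/type.
    return [tuple(token.split("/", 1)) for token in tokens[1:]]


def claims_type(xml_text, module, type_name):
    """True if the root element claims to be (a specialization of) the type."""
    return (module, type_name) in claimed_types(xml_text)


print(claimed_types(TOPIC))
print(claims_type(TOPIC, "topic", "topic"))
```

The decision about how to process the document rests on the claim carried
in the instance, not on which declaration file, if any, was referenced.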

One of the reasons that a lot of older systems find it difficult to
support DITA is because those systems were built on the assumption that
there was a useful and necessary relationship between DTD external subsets
and document types (meaning the body of configuration used to manage
documents of that type in whatever the system was). That assumption *WAS

Because DITA rejects that assumption these legacy systems have a hard time
working with DITA documents.

It's not the fault of DTDs, per se, it's the fault of a body of practice
that overemphasized the value of the DTD *reference*.

As we've all said--at the time (1986) it was the best we had and it's
understandable that the conceptual mistake was made. But the problem was
(or should have been) glaringly obvious by the time we did XML--after all
we recognized in XML that documents can be perfectly useful with no
explicit grammar association whatsoever. We also struggled with the issue
in the context of making sense of namespaces.

And let me be clear: I'm not saying DTDs are not useful, especially for
guiding authoring and doing basic document validation. I'm just saying
that DTDs do not define, cannot define, should not be taken to define
document types and DOCTYPE declarations *do not name* document types.

DITA is the first XML standard I know of that tries to provide a
well-defined way to identify the abstract document type. We, as a
community, could have a more general way of doing it if we took the time
to define it. 

But my point is that there are alternatives and DITA stands as an
existence proof of one such.



Eliot Kimber, Owner
Contrext, LLC

On 5/4/16, 4:14 PM, "Peter Flynn" <peter@silmaril.ie> wrote:

>On 05/04/2016 04:23 PM, Eliot Kimber wrote:
>> The conceptual mistake that SGML made was conflating the notion of an
>> abstract "document type" with the grammar definition for (partially)
>> validating documents against that document type. That is, SGML saw the
>> as being equal to the definition of the "document type" as an
>> But of course that is nonsense.
>I think you're being unnecessarily harsh here. A DTD only declares that
>the abstraction "document type X" shall consist of {some specified
>intermixture of element types with their attributes etc}.
>If we (those of us doing this stuff at the time) saw fit to assume for
>practical purposes that a given DTD *was the* definition of "document
>type X" then that was sufficient to get our software cranked into action.
>To say that SGML was in some way to blame for this may be mathematically
>correct, but fails to take into account that a lot of this was
>relatively new at the time (not that markup was anything new, but having
>an ISO standard and a shedload of software *was*).
>> There was (remains today) the misguided
>> notion that a reference to an external DTD subset somehow told you
>> something actionable about the document you had.
>This is correct post hoc (modulo default attributes :-) But reference to
>an external DTD subset does tell me lots of actionable things about the
>document that I am sitting down to create: the names and valid placement
>of element types being not the least of these.
>> Consider this SGML document:
>> <!DOCTYPE notdocbook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" [
>>   <!ELEMENT notdocbook ANY >
>>   <!ELEMENT bogus ANY >
>> ]>
>> <notdocbook>
>>   <bogus><para>This is not a DocBook document</para></bogus>
>> </notdocbook>
>> This document will be taken as a DocBook document by any tool that
>> the public ID means something. But obviously it is not a DocBook
>> It is, however, 100% DTD valid. QED DTDs are useless as tools of
>> type definition.
>Please don't fall into this trap as I don't think it advances the
>argument. All elephants are grey. Mice are grey. Therefore all mice are
>elephants. DTDs are only useless as tools of document type definition if
>you insist on deliberately misleading the parser (and the user).
>> The only reason the SGML (and now XML world) didn't
>> collapse under this fact is that the vast majority of SGML and XML
>> authoring and management tools simply refused to preserve internal
>This was at the core of our decision to use Emacs and psgml-mode. It has
>many flaws, but What You Write Is What You Get.
