OASIS Mailing List Archives





   Re: [xml-dev] entity definitions


On Tue, 27 Apr 2004 06:36:20 -0400
Elliotte Rusty Harold <elharo@metalab.unc.edu> wrote:

> At 11:55 PM -0400 4/26/04, Amelia A Lewis wrote:
> >Problem: schema languages other than DTD are increasingly popular, for
> >very good reasons, offering facilities not available or difficult of
> >attainment in DTDs.  However, without a DTD (if only an internal
> >subset), entities may not be used.
> There's a faulty assumption here that a document can have only one 
> schema or it can use only one schema language. That is not true. It is 

It also is not the assumption in the above paragraph, thanks.  The para
says "without a DTD".  It says that validation is possible without a DTD,
but that entity definition is not.

> to validate. In fact, I wish entity definition and validation had not 
> been conflated in XML 1.0. They are two very different tasks. 

I strongly agree.

> However, now that people are moving to RELAX NG, it seems possible 
> that much simpler DTDs that merely serve to define entities will 
> become prevalent, and we will effectively have two nice, cleanly 
> separated mechanisms, one for entity definition and one for 
> validation, each of which performs its specified task well.

Oh, barf.  Sorry.  I don't like DTD syntax.  At all.

> If you wish to argue that a new entity definition mechanism is 
> necessary, you're going to need to do better than noting that a DTD 
> is not preferred for validation. You need to demonstrate that it is 
> flawed from the perspective of entity definition, irrespective of and 
> orthogonal to any issues involved in validation.

I don't know that I can prove that it is "flawed," but I think it is not
too difficult to show that, for some types of entity, XML syntax has
notable ease of use advantages.

Named entities fall into three general categories:

1) character entities (mnemonics for numeric entities)

2) text entities (several examples in the XML 1.0 spec, and in the
Annotated version; one of my favorites is the ...)

3) markup entities (for instance, a disclaimer, a copyright statement, or
otherwise "boilerplate")
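
To make the three categories concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which (via expat) expands general entities declared in a document's internal subset. The document, the entity names (copy, org, disclaimer) and their text are all hypothetical. Note that the internal subset declares only entities and no elements, so it defines entities without validating anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal subset declares one entity per category
# and no elements, so it defines entities without validating anything.
DOC = """<!DOCTYPE doc [
  <!ENTITY copy "&#169;">
  <!ENTITY org "Example Organization">
  <!ENTITY disclaimer "<note>No warranty is expressed or implied.</note>">
]>
<doc><p>&copy; &org;</p>&disclaimer;</doc>"""

root = ET.fromstring(DOC)

# 1) character and 2) text entities expand to character data inside <p>.
print(ascii(root[0].text))  # '\xa9 Example Organization'
# 3) the markup entity's replacement text is parsed as markup: a child element.
print(root[1].tag, root[1].text)  # note No warranty is expressed or implied.
```

The third declaration shows why markup entities are the awkward case: the markup has to live inside a quoted literal in DTD syntax.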

Although it is useful and conceptually cleaner to separate the pre-defined
entities of a particular vocabulary from its markup definitions, it is
also useful (to the point that new versions are likely to have this as a
requirement) to be able to package them together.  DocBook, XHTML, and
MathML all need to be able to present a set of entities, usually character
entities, as part of the standard vocabulary.  It might be possible to
require users to enter two pieces of information where they have entered
one before (both a DTD and a schema indicator, that is), but it isn't
likely to be popular.

For this application, there's no particular gain in using an XML format,
except that it becomes possible to associate entity definitions with
schemas (that is, with schemas not written via DTD), without requiring the
end user to make both associations.

For the text entity application, there's no particularly good reason to
use a different format than DTD.  Whatever format is used, some form of
import will be necessary (an external entity in a DTD versus an external
entity in a processing instruction in the prologue, same difference).  It
is possible, but not particularly easy to argue, that using a consistent
XML syntax for all entity definitions is a better thing than requiring a
different syntax, which is then easily confused linguistically with
parameter entities.

For the markup entity application, there is a much stronger case to be
made; it is awkward and ugly to define markup entities as internal
entities.  As external entities, they can be wrapped in an <entity>
element, enforcing well-balanced-ness by simply parsing the file (using
EDML, that is; presumably an alternative XML syntax would provide similar
capability).  Internal entities (and external entities declared with an
entity wrapper) easily guarantee well-formedness.  The "no recursion" rule
becomes a simple consequence of the Entity Declared rule (although this
makes it a validity error, not a well-formedness error).
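
A rough sketch of the two properties claimed in the paragraph above, assuming a hypothetical wrapper element named "entity" and entity bodies held as (name, body) strings; this is an illustration only, not EDML's actual syntax. Parsing each body inside the wrapper checks well-balancedness, and declaring entities strictly in order, with a body allowed to reference only entities already declared, rules out recursion (including self-reference) as a declaration error.

```python
import re
import xml.etree.ElementTree as ET

# General entity references such as &org; (names restricted to ASCII for brevity).
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def declare_in_order(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Declare (name, body) pairs in order; reject undeclared refs and unbalanced bodies."""
    declared: dict[str, str] = {}
    for name, body in definitions:
        if name in declared:
            continue  # XML 1.0: the first declaration of an entity is binding
        # "Entity Declared" check: 'name' is not declared yet, so self-reference fails here.
        for ref in ENTITY_REF.findall(body):
            if ref not in PREDEFINED and ref not in declared:
                raise ValueError(f"{name!r} references undeclared entity {ref!r}")
        # Well-balancedness: each referenced entity was already checked when it was
        # declared, so drop the references and parse the rest inside the wrapper.
        try:
            ET.fromstring(f"<entity>{ENTITY_REF.sub('', body)}</entity>")
        except ET.ParseError as err:
            raise ValueError(f"{name!r} is not well-balanced: {err}") from None
        declared[name] = body
    return declared


defs = declare_in_order([
    ("org", "Example Organization"),
    ("notice", "<note>&org; offers no warranty.</note>"),
])
print(list(defs))  # ['org', 'notice']
# declare_in_order([("loop", "see &loop;")]) raises ValueError: undeclared 'loop'
```

Mutual recursion is caught the same way: whichever entity of the pair is declared first refers to one that does not exist yet.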

On the other hand, this application is the domain of XInclude.  I've
rather fallen out of love with XInclude, as I think the entity mechanism
is more flexible, powerful, and useful (and I think XPointer has become so
hideously burdensome to use that it drags any spec that requires it to the
bottom of the sea in concrete overshoes, but that's perhaps another case
of gratuitous offensiveness, like that for which I called the Core WG
'stupid').  Since, unlike text content, markup content cannot be used in
attributes, it might make sense to use this XML-format inclusion mechanism
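
For comparison, a minimal XInclude sketch using Python's xml.etree.ElementInclude, whose documentation describes only limited XInclude support. The in-memory loader and the resource name disclaimer.xml are hypothetical, used so the example is self-contained; the sketch includes a whole resource by href and uses no XPointer at all.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical boilerplate resources, kept in memory so the sketch is self-contained.
BOILERPLATE = {
    "disclaimer.xml": "<note>No warranty is expressed or implied.</note>",
}


def memory_loader(href, parse, encoding=None):
    """Stand-in for ElementInclude.default_loader: an Element for parse="xml", else text."""
    data = BOILERPLATE[href]
    return ET.fromstring(data) if parse == "xml" else data


DOC = """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
<p>Body text.</p><xi:include href="disclaimer.xml"/></doc>"""

root = ET.fromstring(DOC)
ElementInclude.include(root, loader=memory_loader)  # replaces xi:include in place
print([child.tag for child in root])  # ['p', 'note']
```

Because xi:include is itself an element, it can only stand where an element may appear, which matches the observation that markup content is never usable in attributes anyway.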

It's certainly possible to use DTDs to define entities; it's been done for
years.  I *can't* show that it's flawed; it's known-working (however, you
might want to speak with David Carlisle on the topic ... *laugh*).

I can, perhaps, show that it is more comfortable and convenient for users
who are *not* using DTDs for validation to also *not* use them for entity
declaration (and *that's* why I tie them together, not because I think
validation and entity declaration are ineffably and ethereally tied).  *I*
don't want to use DTD for entity declaration (or I wouldn't've written up
the EDML nonsense, would I?).  But perhaps, like the suitcase that my
mother gave me going off to college (it's now fifty years old, I think),
DTD (for entity declaration) is just baggage I'm stuck with ....

Amelia A. Lewis                    amyzing {at} talsever.com
Did you exchange a walk-on part in the war for the lead role in a cage?
                -- Pink Floyd



Copyright 2001 XML.org. This site is hosted by OASIS