   Re: external parsed entities (was: A unique ID question ?)

  • From: "W. Eliot Kimber" <eliot@isogen.com>
  • To: xml-dev@ic.ac.uk
  • Date: Mon, 08 Nov 1999 08:47:57 -0600

"Steven R. Newcomb" wrote:
> [Eliot Kimber:]
> > [...snip...] external parsed entities ("external text entities" in
> > SGML parlance) are evil and should never have been included in XML.


> The trick is for people to realize that the use of parsed external
> entities should be avoided unless their use is purely for storage
> convenience.  There should never be a semantic load on the fact that
> some data is stored in a particular entity.  Regrettably, most of the
> times when parsed external entities have been used, they have been
> used for impure, semantically loaded purposes (such as the semantic of
> re-usability).  So maybe as a matter of good public education policy,
> Eliot's position that "external parsed entities are evil" is the right
> approach, as long as we're not going to get rid of XML's ability to
> support them.

I think external parsed entities are evil precisely because they
encourage people to do this stupid thing, *which is almost never the
right thing to do*. People innocently and understandably use them to
solve immediate practical problems, encouraged by tools that support
only this form of data organization (at least out of the box; most, if
not all, can be taught to do the right thing), only to wake up one day
and discover that they have an unworkable system that will require a
huge effort to rework to meet their requirements. To my mind, this is
the same sort of evil as Microsoft Word: an immediately convenient and
apparently useful facility, available everywhere, that you can use
without thinking, only to realize later that you're screwed eight ways
from Sunday for having used it without thinking about it carefully
first.

Or maybe it needs to be a social pressure thing: "friends don't let
friends use external parsed entities for re-use."

The *ONLY TIME* external parsed entities are the right thing to do is
when exactly one actor is working on exactly one document and needs to
partition it into separate files for *their own* convenience. *AT ALL
OTHER TIMES* external parsed entities are not the right thing to use.
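For readers who haven't seen the mechanism, here is a minimal
illustration of that one legitimate case: a single author splitting a
single document into files purely for storage convenience. The element
name "manual" and the file names chapter1.xml and chapter2.xml are
hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE manual [
  <!-- Storage-only split (hypothetical names): each chapter lives in
       its own file, but nothing depends on which file holds which text. -->
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
  <!ENTITY chapter2 SYSTEM "chapter2.xml">
]>
<manual>
  &chapter1;
  &chapter2;
</manual>
```

The moment chapter1.xml is also referenced from some second document
for re-use, the split has acquired exactly the semantic load that
should be avoided.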

I think it would have been much better to force all non-trivial XML
applications to step up to the reality that almost all documents are
compound documents semantically (but not syntactically) composed of
separate XML documents. If SGML or XML had done this, we wouldn't be
having this discussion now because the problem wouldn't arise.

Note that there is the inverse requirement as well: the need to
*syntactically combine* structures that are *semantically* separate
documents. This is the problem that name spaces try to solve (but fail
to solve, for reasons that should be obvious and that I will not repeat,
having expounded on them at length in the past). The solution has often
been characterized as "inline subdocs". I am convinced that without this
feature, XML and SGML are inherently limited in a serious way (as
evidenced by the sadly misguided excitement over name spaces, which
demonstrates a serious unmet requirement). [Fortunately, the
infrastructure you need to have in place to do re-use can also support
this form of document, so we can survive without this feature although
the solution is suboptimal and not very satisfying.]

Essentially, we need to have a markup and processing infrastructure that
decouples the storage structure of documents from their logical
structure so that we can have it both ways: one storage object with
multiple documents or one document distributed across multiple storage
objects, as well as true reusability.

Lest people think that I'm just talking: I have contributed to the
development of a very large scale SGML/XML-based system now in
production that is entirely based on the non-use of external parsed
entities. I am currently helping to implement another such system for
the State of Texas. Everything you need can be done with existing tools
and technology with very little extra effort *if you do it up front*.

My take is that there is no excuse for not doing it right. It's not as if
the tool developers don't understand the issue or are unaware of
the existing standards and implementing technology (I know because I've
talked to all the editor and database vendors personally at some length
about this).



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS