OASIS Mailing List Archives





   external parsed entities (was: A unique ID question ?)

  • From: "Steven R. Newcomb" <srn@techno.com>
  • To: xml-dev@ic.ac.uk
  • Date: Sun, 7 Nov 1999 12:21:37 -0600

[Eliot Kimber:]

> [...snip...] external parsed entities ("external text entities" in
> SGML parlance) are evil and should never have been included in XML.

> The only complete and manageable solution to this problem is to
> manage each component as a complete and independent document and
> combine them together *semantically* using something like XLink or
> HyTime's value reference facility (explicit use-by-reference as
> distinct from hyperlinking).

Eliot is right, but there is more subtlety to this story than merely
that external parsed entities are evil.  It's true that they've been
used so evilly for so long that they are hard to distinguish from the
problems that their evil uses have caused.  However, the external
parsed entity feature does have a good and beneficial purpose, which
is to make the storage structure (the way the data have been allocated
to physical containers such as computer files) of a whole document
(the bunch of stuff to be concatenated as input to the parsing
process) independent of its logical structure (as a tree of elements).
The theory is still valid: there is an element (logical) tree and an
entity (physical storage) tree, and in SGML or XML the two tree
structures are not constrained by one another, which is good because
they are subject to entirely different sets of environmental
constraints.  It must be possible to represent and handle the two
structures (physical and logical) in different ways simultaneously,
from a single SGML and/or XML representation.  And external parsed
entities are a key feature for maintaining that independence.
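To make the storage/logical distinction concrete, here is a minimal Python sketch (the file names and content are invented for illustration) that stores the same small document two ways, once split across external parsed entities and once as a single file, and shows that a SAX parser reports an identical element tree for both. It uses the standard library's `xml.sax` with the `feature_external_ges` feature switched on. One caveat: XML 1.0 does require the logical and physical structures to nest properly, so each entity file here holds complete elements.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Storage layout 1 (illustrative): the document split across three files.
MAIN_SPLIT = """<?xml version="1.0"?>
<!DOCTYPE manual [
  <!ENTITY ch1 SYSTEM "ch1.xml">
  <!ENTITY ch2 SYSTEM "ch2.xml">
]>
<manual>&ch1;&ch2;</manual>
"""
CH1 = "<chapter><title>One</title></chapter>"
CH2 = "<chapter><title>Two</title></chapter>"

# Storage layout 2 (illustrative): the same content in a single file.
MAIN_SINGLE = """<?xml version="1.0"?>
<manual><chapter><title>One</title></chapter><chapter><title>Two</title></chapter></manual>
"""


class ElementOutline(ContentHandler):
    """Records only the logical structure: element names indented by depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def startElement(self, name, attrs):
        self.lines.append("  " * self.depth + name)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1


def outline(path):
    parser = xml.sax.make_parser()
    # External general entities are off by default in recent Pythons;
    # enabling them is only appropriate for trusted input.
    parser.setFeature(feature_external_ges, True)
    handler = ElementOutline()
    parser.setContentHandler(handler)
    parser.parse(path)  # relative SYSTEM ids resolve against this path
    return handler.lines


def compare_layouts():
    """Write both storage layouts to a temp dir and outline each one."""
    with tempfile.TemporaryDirectory() as d:
        files = {"split.xml": MAIN_SPLIT, "ch1.xml": CH1,
                 "ch2.xml": CH2, "single.xml": MAIN_SINGLE}
        for name, text in files.items():
            with open(os.path.join(d, name), "w", encoding="utf-8") as f:
                f.write(text)
        return (outline(os.path.join(d, "split.xml")),
                outline(os.path.join(d, "single.xml")))


if __name__ == "__main__":
    split, single = compare_layouts()
    print(split == single)
    print("\n".join(split))
```

Because the application only ever sees the element tree, moving a chapter into or out of its own file changes nothing downstream, which is the storage-only role for entities that this message argues for.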

Some people would assert that this capability is no longer needed, and
that there's no longer any such thing as a document that's large
enough to require that its source be distributed among multiple
containers (entities).  These people would assert that, given modern
computing resources, if a document is so large that it still needs to
be distributed among multiple containers, it is too damn big and
should be split up.  I suspect that they may be right, simply because
I know of no counter-example wherein a grotesquely large document is
also optimally architected.  If there's no such thing (any more) as a
non-evil document that is so large that it needs to be stored in more
than one parsed entity, then, theoretically at least, Eliot is exactly
right: external parsed entities are evil.

However, reality intrudes.  Most information is architected
suboptimally.  I think it would be a mistake to discard external
parsed entities.  I think XML needs to have features that let people
do silly things (like evilly putting all the documentation of a
weapons system into a single 190,000-page document) without causing
every XML application to blow a gasket.  We need the means to permit
evil to exist in order to avoid creating a worse evil: the perception
that XML is insufficiently robust to handle real situations, no matter
how evil they are.

The trick is for people to realize that the use of parsed external
entities should be avoided unless their use is purely for storage
convenience.  There should never be a semantic load on the fact that
some data is stored in a particular entity.  Regrettably, in most cases
where parsed external entities have been used, they have been used for
impure, semantically loaded purposes (such as the semantic of
re-usability).  So maybe as a matter of good public education policy,
Eliot's position that "external parsed entities are evil" is the right
approach, as long as we're not going to get rid of XML's ability to
support them.
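For contrast, the alternative Eliot describes keeps every component a complete, independent document and makes reuse an explicit reference that the application resolves. The sketch below assumes a hypothetical `use-ref` element with an `href` attribute, resolved with Python's `xml.etree.ElementTree`; it is not XLink or HyTime syntax, only an illustration of semantic use-by-reference.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical element name for an explicit "use that document here" link.
USE_REF = "use-ref"


def resolve_refs(path, chain=frozenset()):
    """Parse `path` and replace each <use-ref href="..."/> with the root
    element of the referenced document (href resolved relative to `path`)."""
    path = os.path.abspath(path)
    if path in chain:
        raise ValueError(f"circular reference: {path}")
    chain = chain | {path}  # documents on the current resolution path
    root = ET.parse(path).getroot()
    base = os.path.dirname(path)
    # Snapshot the tree first so replacements don't disturb the walk.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == USE_REF:
                used = resolve_refs(os.path.join(base, child.get("href")), chain)
                used.tail = child.tail  # keep any following text intact
                parent[i] = used
    return root


def demo():
    """Build a manual that reuses one complete document twice."""
    with tempfile.TemporaryDirectory() as d:
        docs = {
            # A complete, independently parseable document.
            "warning.xml": "<warning>Disconnect power first.</warning>",
            "manual.xml": ("<manual>"
                           "<section><use-ref href='warning.xml'/></section>"
                           "<section><use-ref href='warning.xml'/></section>"
                           "</manual>"),
        }
        for name, text in docs.items():
            with open(os.path.join(d, name), "w", encoding="utf-8") as f:
                f.write(text)
        return ET.tostring(resolve_refs(os.path.join(d, "manual.xml")),
                           encoding="unicode")


if __name__ == "__main__":
    print(demo())
```

Here the reuse of `warning.xml` is visible in the markup itself rather than hidden in entity declarations, and the referenced file remains parseable on its own.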


Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com

voice: +1 972 517 7954  <<-- new phone number
fax:   +1 972 517 4571  <<-- new fax number
pager (150 characters max): srn-page@techno.com

Suite 211               <<-- new address
7101 Chase Oaks Boulevard 
Plano, Texas 75025 USA

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS