OASIS Mailing List Archives


Re: [xml-dev] Use DTDs!

I have said on many occasions that general entities are evil and I stand
by that. I still regret that we did not eliminate general entities from
XML (although we effectively did by not requiring the use of DTDs or
DOCTYPE declarations). I'm pretty sure I was one of the people who argued
strongly to preserve them--if so I regret that decision now. I was still
suffering from SGML brain damage then.

I'm not aware of *any* XML-aware content management systems in general use
today that preserve general entities. Systems based on XML databases like
MarkLogic and eXist certainly do not. And I would say that that is good.
Because general entities are evil.

The reason general entities are evil is that they give the *appearance* of
reuse while in fact not providing reuse. Because they are resolved by the
parser there is no opportunity to deal with things like IDs and references
to ensure that the post-parse result is sensible. For example, if you have
an entity that includes an element with an @id attribute of type ID and
you include it twice in the document, the document cannot be valid. Game
over, man. Game over. XInclude has the same problem and is thus no better
than general entities.
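To make the duplicate-ID problem concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree and xml.etree.ElementInclude (a tool choice for illustration, not anything named in this thread). The element names, the id value "w1", and the in-memory "warning.xml" fragment are hypothetical. ElementTree does not validate, so the check simply counts repeated id attribute values; in the entity case, that repeated value is exactly what the ID uniqueness validity constraint forbids.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude
from collections import Counter

# Reuse via an internal general entity: the parser expands &warning; in place.
ENTITY_DOC = """<!DOCTYPE doc [
  <!ATTLIST note id ID #IMPLIED>
  <!ENTITY warning "<note id='w1'>Disconnect power first.</note>">
]>
<doc>
  <step>&warning;Remove the cover.</step>
  <step>&warning;Replace the fuse.</step>
</doc>"""

# Reuse via XInclude: the same (hypothetical) fragment pulled in twice by href.
XINCLUDE_DOC = """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <step><xi:include href="warning.xml"/>Remove the cover.</step>
  <step><xi:include href="warning.xml"/>Replace the fuse.</step>
</doc>"""

# Hypothetical in-memory stand-in for the included file.
FRAGMENTS = {"warning.xml": "<note id='w1'>Disconnect power first.</note>"}

def fragment_loader(href, parse, encoding=None):
    """Loader hook for ElementInclude: serve fragments from FRAGMENTS."""
    if parse == "xml":
        return ET.fromstring(FRAGMENTS[href])  # a fresh Element per include
    return FRAGMENTS[href]  # parse="text" expects a string

def duplicate_ids(root):
    """Return the id values that occur more than once in the parsed tree."""
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)

entity_root = ET.fromstring(ENTITY_DOC)
xinclude_root = ET.fromstring(XINCLUDE_DOC)
ElementInclude.include(xinclude_root, loader=fragment_loader)

print(duplicate_ids(entity_root))    # ['w1']
print(duplicate_ids(xinclude_root))  # ['w1']
```

Either way the post-parse tree holds two elements claiming the same ID, and nothing in the parse step gave anyone a chance to rename them.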

Internal text entities are only slightly less evil, but they are still
problematic. Fortunately, we rarely see them any more.

I will observe that the DITA standard explicitly avoids the use of general
entities, both as a matter of practice and as a consequence of not
requiring the use of *any* grammar in conforming DITA documents. All
modularity and re-use in DITA is done at the application level through
DITA-defined hyperlinks, with no reliance on any parse-level features.
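By contrast, reuse that the application resolves after parsing gets a chance to deal with IDs. The sketch below is not DITA's conref or keyref mechanism (those are defined by the DITA specification and differ in detail); it is a simplified stand-in in Python with a hypothetical @ref attribute and a hypothetical renaming policy (appending a use counter to the id), only meant to show where the application gains control that parse-level expansion never offers.

```python
import copy
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical library of reusable components, addressed by id.
LIBRARY = ET.fromstring(
    "<library><note id='w1'>Disconnect power first.</note></library>"
)

# Hypothetical topic that points at the component instead of embedding it.
TOPIC = """<topic>
  <step><note ref='w1'/>Remove the cover.</step>
  <step><note ref='w1'/>Replace the fuse.</step>
</topic>"""

def resolve_refs(root, library):
    """Replace each element carrying @ref with a copy of its target, renaming
    the copy's id so every id in the result stays unique."""
    targets = {el.get("id"): el for el in library.iter() if el.get("id")}
    uses = Counter()
    # Snapshot the parents first so replacing children doesn't disturb iteration.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            ref = child.get("ref")
            if ref is None:
                continue
            uses[ref] += 1
            clone = copy.deepcopy(targets[ref])
            clone.set("id", f"{ref}-{uses[ref]}")  # hypothetical renaming policy
            clone.tail = child.tail  # keep the text that followed the reference
            parent[i] = clone
    return root

resolved = resolve_refs(ET.fromstring(TOPIC), LIBRARY)
print([el.get("id") for el in resolved.iter("note")])  # ['w1-1', 'w1-2']
```

The renaming rule is arbitrary; the point is that an application-level resolver can have a rule at all.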

DITA's DTD grammars do, of course, use parameter entities, but that's
unavoidable. And as of DITA 1.3 the DTDs are completely generated from
RELAX NG, where the reuse model is much more sensible while being
semantically and syntactically similar to DTDs.

If the inability to support general entities killed DynaBase that's
unfortunate--in that regard DynaBase was ahead of its time and the
community at the time (probably myself included) did not understand that
trying to use general entities for reuse was simply the wrong thing to do.

SGML reflected a time (the late '70s and early '80s) when lexical
processing was about all computers could handle. It was only in the later
'80s and early '90s that we started to see more clearly the distinction
between the lexical processing and the abstraction that the SGML documents
were simply serializations of. That conflation of the syntax and the
abstraction was a problem in SGML and continued into XML. We're still
struggling with the distinction today.
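A small illustration of the lexical/abstract split, and of why a system that stores only the parsed result cannot preserve entity references (the DynaBase problem): two serializations, one using a general entity reference and one not, parse to the same tree. This uses Python's xml.etree.ElementTree purely as an example parser; the entity name "product" is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same content serialized two ways: with and without an entity reference.
WITH_ENTITY = """<!DOCTYPE p [<!ENTITY product "DynaBase">]>
<p>Built with &product;.</p>"""
WITHOUT_ENTITY = "<p>Built with DynaBase.</p>"

with_tree = ET.tostring(ET.fromstring(WITH_ENTITY))
without_tree = ET.tostring(ET.fromstring(WITHOUT_ENTITY))

print(with_tree)                  # b'<p>Built with DynaBase.</p>'
print(with_tree == without_tree)  # True: the reference left no trace
```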

One of the goals of HyTime's grove mechanism was to provide a sufficiently
complete and flexible abstraction such that you could provide different,
but correct, abstract views of the same SGML document so that different
tools could get what they need within the context of a common view of the
data. As Steve says, it was a mechanism too heavy for the emerging
distributed computing environment of the Web. But we still see this
problem today in the various ways of expressing the abstraction of XML
documents: XDM, the Infoset, JSON serializations of XML, the DOM, etc. The
requirement groves were trying to address is real; it's just that trying
to solve it with a monolithic solution was simply a non-starter.
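As a small example of different abstract views of one document, here are two models that ship with Python's standard library (neither is XDM or the Infoset; they are just convenient stand-ins). Asked how many children `<p>` has, ElementTree counts only elements and hangs text off .text and .tail, while the DOM counts text runs as sibling nodes.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SOURCE = "<p>Hello <b>world</b>!</p>"

# ElementTree view: children are elements only; text lives in .text and .tail.
et_root = ET.fromstring(SOURCE)
print(len(et_root))                                # 1
print(repr(et_root.text), repr(et_root[0].tail))  # 'Hello ' '!'

# DOM view: the two text runs are sibling nodes alongside the <b> element.
dom_root = minidom.parseString(SOURCE).documentElement
print(len(dom_root.childNodes))                                     # 3
print([n.nodeType == n.TEXT_NODE for n in dom_root.childNodes])     # [True, False, True]
```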



Eliot Kimber, Owner
Contrext, LLC

On 5/2/16, 9:09 PM, "Christopher R. Maden" <crism@maden.org> wrote:

>On 05/02/2016 08:56 PM, Rick Jelliffe wrote:
>> I think one of the limitations of the idea of grammar neutrality (ie
>> freely translate the schema into the particular grammar available for
>> each tool) is this lack of entity maintenance by some converters.
>> (Not a good term) Possibly it is a bigger problem than the different
>> power of the different grammars.
>This is what killed EBT.
>This is second-hand hearsay, through a filter of 20 years, but the
>DynaBase project had multiple big-ticket pre-orders before someone
>noticed that, in parsing the SGML, it normalized away all the entity
>references, which made it useless for actual ongoing document
>management, which was its primary selling point.
>Fixing that would have involved changes going down to the deepest layers
>of the parser... which was just not feasible on the announced schedule.
>So instead of an IPO making me rich, EBT was sold to Inso (which made a
>few people rich, entirely deservedly), who proceeded to run everything
>into the ground.
><paul-harvey>Now you know... the rest of the story.</paul-harvey>
>Chris Maden, text nerd  <URL: http://crism.maden.org/ >
>“If you’ve been a man o’ action, though you’re lying there in traction,
>  You will gain some satisfaction thinkin’, ‘Jesus, at least I tried.’”
>   -- Andy M. Stewart (1952-2015), “Ramblin’ Rover”
>XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>to support XML implementation and development. To minimize
>spam in the archives, you must subscribe before posting.



Copyright 1993-2007 XML.org. This site is hosted by OASIS