   Re: [xml-dev] entity definitions

Dear David,

On Mon, 26 Apr 2004 23:42:13 +0100
David Carlisle <davidc@nag.co.uk> wrote:
> I wish that were true (would make MathML fragments a lot easier) but
> I have not seen any XML parser that would support that reading and as I

May I suggest that that problem is solvable?

> say a request that that be the case in XML 1.1 was explicitly rejected,
> so I'd assume that the XML core group would not agree with you that that
> is a possible reading for 1.0 (or 1.1)

Phooey on them, then.  Time for a different tack, I think.

> More than that, not only did they decline to fix the problem, they
> asserted that there was no problem to fix (using language I found rather
> offensive, but that's not the issue here)
> http://www.w3.org/XML/Core/2002/10/charents-20021023

Right.  That's pretty cold.  Not to mention stupid.

Problem: schema languages other than DTD are increasingly popular, for
very good reasons: they offer facilities that are unavailable or hard to
achieve in DTDs.  However, without a DTD (if only an internal subset),
entities may not be used.
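
A minimal sketch of the problem, using Python's standard-library
ElementTree purely for illustration (no particular parser is named in
this thread; the sample documents and the helper name try_parse are made
up for the example).  The same reference is rejected when no DTD is
present and accepted once an internal subset declares the entity:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same content, with and without a DTD.
NO_DTD = "<p>a&nbsp;b</p>"
INTERNAL_SUBSET = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'


def try_parse(text):
    """Return the root element's text, or None if the parser rejects it."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        # An undeclared entity reference is reported as a parse error.
        return None


if __name__ == "__main__":
    print(repr(try_parse(NO_DTD)))           # rejected: nbsp is undeclared
    print(repr(try_parse(INTERNAL_SUBSET)))  # accepted: declared in the subset
```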

Secondary Problem: the XML Core WG has already taken the position that no
other mechanism for entity declaration is needed (at least for character
entities).  No help for an alternate entity declaration mechanism can be
expected from this direction.

Context: at least two major document types that I know of (XHTML 2.0 and
DocBook next) are using or plan to use a non-DTD schema language for their
primary definitions (is MathML next also potentially in this group?). 
These particular dialects are likely to have particular issues with the
lack of support for entity declaration outside of DTDs, as the "solution"
proposed by the Core WG appears to require either that the dialects define
canonical DTDs, or force users who wish to make use of entities to include
significant blocks of external entity declarations in an internal subset. 
As these dialects have traditionally made very extensive use of large
entity collections, and the users of the specifications are likely to be
unwilling to transition to a new version that lacks this support or in
which the support is lessened, the groups or individuals developing these
specifications potentially have a vested interest in the creation of an
alternate mechanism that does not rely on (but does not conflict with)
the existing DTD mechanism.

Agenda:

1) develop a mechanism that can be shown to provide the functionality
needed by these groups
2) develop a justification for this mechanism
3) enlist the groups and individuals with an interest in the mechanism to
form a group that can express its needs and can show a body of users as
4) approach developers and maintainers of parsers and validators with the
feature request to support the new mechanism


Commentary:

1) I believe that the EDML mechanisms would be effective.  They integrate
*very* painlessly into schema languages, as I think the XHTML 2.0 example
in the spec shows.  I'm not wedded to it.  (It's just my ego at stake,
that's all!  No problem!  *laugh*)  In any event, it might serve as a
starting point for discussion of an appropriate mechanism.

2) See section 2.9 of XML 1.0, VC: Standalone Document Declaration.  A
valid document MUST have standalone="no" if "external markup declarations"
contain "entities (other than amp, lt, gt, apos, quot), if references to
those entities appear in the document".  See section 4.1 of XML 1.0, VC:
Entity Declared.  If standalone="no", then the lack of a declaration is a
validation error, not a well-formedness error.  In WFC: Entity Declared,
first sentence, first clause, understand that "DTD" at the time of
definition was a complete synonym for "schema", and interpret the clause
as "in a document without any schema," or alternatively, "if not
validating or if validation fails because no schema is available".

Note: I am not arguing that any current processors interpret the
specification in this way.  I am arguing that the specification is
amenable to this interpretation, and that it would then be possible to
approach parser developers and maintainers, and ask for an option or
property setting to be added which permits it.
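
The distinction argued in 2) can be sketched as a small decision
function.  This is an illustration only: the function and parameter names
are invented here, and the second function is one possible encoding of
the reading proposed above (treating "DTD" as "schema"), not something
any specification or parser defines.

```python
def undeclared_entity_error_xml10(has_dtd, has_external_decls, standalone_yes):
    """Classify a reference to an undeclared entity per XML 1.0 section 4.1.

    has_external_decls: the document has an external subset or parameter
    entity references.  Returns "WFC" (fatal well-formedness error) or
    "VC" (validity error, reported only when validating).
    """
    if not has_dtd or not has_external_decls or standalone_yes:
        return "WFC"
    return "VC"


def undeclared_entity_error_proposed(has_schema, standalone_yes):
    """Assumed encoding of the proposed reading: any available schema,
    DTD or otherwise, counts as a possible source of external
    declarations."""
    if not has_schema or standalone_yes:
        return "WFC"
    return "VC"


if __name__ == "__main__":
    # A document validated against a non-DTD schema, with no DOCTYPE at all:
    print(undeclared_entity_error_xml10(False, False, False))  # WFC
    print(undeclared_entity_error_proposed(True, False))       # VC
```

Under the first function a schema-only document can never reach the "VC"
branch, which is exactly the gap described in the Problem statement.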

3) *sigh*  I'm prolly not very good at that part.

4) if the justification presented in 2 is accepted, or accepted as a
viable interpretation worthy of support (more likely if 3 is successful
and important schema specifiers express an interest), then the following
features/properties could be requested of parser developers and maintainers:
   a) VC Entity Declared (not WFC) if standalone="no" (simplest case)
   b) VC Entity Declared (not WFC) if validating (the apparent intent of
the original spec, despite the Core WG's current position)
   c) entity declaration via EDML or an equivalent (non-DTD, probably
XML-based) mechanism easily included in schemas (needs b, although it
could work with a; users wouldn't like that)
   d) support for processing-instruction style entity declaration (needs
a, since the document might be merely well-formed otherwise)
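
Something close to feature a) can be approximated with at least one
existing toolkit.  The sketch below assumes Python's ElementTree on top
of expat, where a reference to an undeclared entity in a document that
names an external subset (and is not standalone="yes") is handed to the
application instead of being treated as fatal, and where the parser's
entity dictionary supplies the replacement text.  The file name p.dtd,
the sample documents, and the helper name are hypothetical; the DTD is
never fetched.  Treat the behaviour as something to verify on your own
parser version rather than a guarantee.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: p.dtd does not need to exist and is never loaded.
EXTERNAL = '<!DOCTYPE p SYSTEM "p.dtd"><p>a&nbsp;b</p>'
STANDALONE = '<?xml version="1.0" standalone="yes"?>' + EXTERNAL


def parse_with_entity_table(text, table):
    """Parse text, resolving otherwise undeclared entities from table."""
    parser = ET.XMLParser()
    # Entities supplied out of band, e.g. read from a schema-side collection.
    parser.entity.update(table)
    return ET.fromstring(text, parser=parser).text


if __name__ == "__main__":
    table = {"nbsp": "\u00a0"}
    print(repr(parse_with_entity_table(EXTERNAL, table)))
    try:
        parse_with_entity_table(STANDALONE, table)
    except ET.ParseError as exc:
        # With standalone="yes" the missing declaration is a WFC violation.
        print("rejected:", exc)
```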

I realize that the problem of acceptance of an alternative entity
mechanism is an important topic, but I'm not confident that EDML is the
correct solution technically, and would prefer to explore that, then
recruit supporters to get implementations.

If commentary 2 above is accepted as a reasonable interpretation, then a
mechanism meeting agenda item 1 can be used as grounds to generate a group
(item 3) to lobby for item 4 (or perhaps some variant of the
features/properties there).  This need not conflict with the ability to
support a parser that conforms to the Core WG interpretation; the
features/properties may default to Core WG interpretation and provide "XML
+ Namespaces + Entities" support with the appropriate properties/features

EDML provides these benefits, which I think are valuable:
1) WFC: Parsed Entity is impossible to violate (unparsed entities cannot
be declared)
2) WFC: No Recursion is impossible to violate by the design of the
specification (an entity is not declared until its closing tag is
encountered, and if it contains an entity reference that is undefined,
that entity has no replacement content).
3) Sections 4.2.1, 4.3.2, and 4.5 are simplified (no parameter entities)
and automatically fulfilled (no need to check for unbalanced tags or
pointy brackets) by standard XML parsing.[*]
4) Very easy integration into XML-based schema languages (W3C XML Schema
and Relax NG) without the need to modify those languages (it's a mix-in).

* If using external entities defined as fragments and named on import,
this is marginally less true; entities wrapped in <entity></entity> are
necessarily well-formed.
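
Benefit 2 can be illustrated with a toy model.  This is not EDML syntax
or an EDML implementation: declarations are modelled as plain (name,
raw text) pairs in document order, and the names below are invented for
the sketch.  Because each entity is resolved only against entities whose
declarations have already been completed, and an undeclared reference
contributes no replacement content, a cycle cannot be constructed.

```python
import re

# Matches general entity references such as &name; (not &#160; char refs).
REF = re.compile(r"&([A-Za-z_][\w.-]*);")
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


def declare_in_order(declarations):
    """Build an entity table from (name, raw_text) pairs in document order."""
    table = {}

    def expand(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)    # leave the five built-ins untouched
        return table.get(name, "")   # undeclared so far: no replacement content

    for name, raw in declarations:
        resolved = REF.sub(expand, raw)
        # Assumption borrowed from XML 1.0 DTDs: the first declaration binds.
        table.setdefault(name, resolved)
    return table


if __name__ == "__main__":
    # "a" refers to "b" before "b" exists; "c" refers to itself.
    print(declare_in_order([("a", "x&b;y"), ("b", "&a;!"), ("c", "&c;")]))
```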

And it does not preclude the use of DTDs for those who prefer them.  It
simply provides an alternative mechanism, one which can be used in
combination with DTD entity declaration.  If the Core WG, as seems likely
from the cited Note, is uninterested, then it behooves those interested to
follow an alternate path (or be stuck using DTDs to define entities).

Amelia A. Lewis                    amyzing {at} talsever.com
I don't know that I ever wanted greatness, on its own.  It seems rather
like wanting to be an engineer, rather than wanting to design something,
or wanting to be a writer, rather than wanting to write.  It should be a
by-product, not a thing in itself.  Otherwise, it's just an ego trip.
              -- Merlin, son of Corwin, Prince of Chaos (Roger Zelazny)

