OASIS Mailing List Archives





   Re: Entity support in XML parsers (was: UTF-8+names)


From: "John Cowan" <cowan@mercury.ccil.org>
> Bob Foster scripsit:
> > XML does not require parsers to support entities apart from validation
> ObMemeStomping:  This is not true.  XML processors, validating or not,
> must support internal entities that are defined in the internal subset.

That's swell if you suppose that users require only internal entities with
simple definitions. External parsed entities are required to be included
only if validating, even if they are defined in the internal subset. So
entities cannot be used as includes, an important use case. Parameter
entities are often used to define the content of general entities, but this
usage is not allowed in the internal subset; in the external subset,
parameter entities need only be included if validating. But of course, the
external subset itself need only be read if validating.
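To make the distinction concrete, here is a minimal sketch using Python's bundled non-validating expat parser. The document, the entity names (greeting, chapter) and the file name chapter1.xml are invented for illustration. The internal entity is expanded by the parser itself, while the external parsed entity is only reported to the application, which must decide whether to fetch and include it.

```python
import xml.parsers.expat

# Hypothetical document: one internal and one external general entity,
# both declared in the internal subset.
DOC = """<!DOCTYPE doc [
  <!ENTITY greeting "hello">
  <!ENTITY chapter SYSTEM "chapter1.xml">
]>
<doc>&greeting; &chapter;</doc>"""


def parse(document):
    """Return (character data seen, system ids of external entity references)."""
    text = []
    external = []

    def on_external(context, base, system_id, public_id):
        # The parser does not fetch the entity; it only tells us about it.
        external.append(system_id)
        return 1  # nonzero: continue parsing without including anything

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append
    parser.ExternalEntityRefHandler = on_external
    parser.Parse(document, True)
    return "".join(text), external


if __name__ == "__main__":
    content, skipped = parse(DOC)
    print(repr(content), skipped)
```

Here the replacement text of &greeting; arrives as ordinary character data, whereas &chapter; contributes nothing to the content unless the application does the inclusion work itself.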

Then, for this point to be relevant to the subject at hand, users would have
to be satisfied with a feature that requires them to include every entity
declaration used in a document in its internal subset. I don't think so.

(If I did think so, I would simply add a feature to the editor that, every
time a user wrote a reference to an entity in a document, automatically
reached out to a user-defined set of DTDs and added the definition to the
internal subset, analogous to the way many editors handle Java import
declarations. Problem solved.)

No comment on the remainder, because it doesn't address the needs of users
who want validation by a non-DTD schema language in combination with entity

Bob Foster

> Something I've long wanted to see is a tool that takes a full DTD
> and squashes it down to the bare minimum required for use as an internal
> subset to preserve all DTD infoset effects but *not* validity.  In
> particular:
> 1) Expand all parameter entities and eliminate all parameter entity
>    declarations
> 2) Eliminate all attribute declarations that are CDATA and either #IMPLIED
>    or #REQUIRED
> 3) Eliminate all element declarations that are ANY, EMPTY, or mixed
>    and simplify all element-content ones to <!ELEMENT foo (foo)> or
>    something of the sort
> A reasonable implementation strategy would be to start with James Clark's
> DTDinst program (http://www.thaiopensource.com/relaxng/dtdinst) and
> then use XSLT (with text output mode) to generate the new DTD.
> Anyone interested in tackling it?
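As a rough illustration of step 2 only, the following sketch uses Python's expat bindings rather than the DTDinst plus XSLT route suggested above. The function name squash_attlists and the sample declarations are hypothetical. It keeps just those attribute declarations that affect the infoset (a non-CDATA type or a default value), assumes default values contain no double quotes, and makes no attempt at steps 1 and 3.

```python
import xml.parsers.expat

# Hypothetical sample: three attribute declarations in an internal subset.
SAMPLE = """<!DOCTYPE doc [
  <!ATTLIST doc
    note CDATA #IMPLIED
    key  ID    #REQUIRED
    lang CDATA "en">
]>
<doc key="k1"/>"""


def squash_attlists(document):
    """Return ATTLIST declarations that have an infoset effect."""
    kept = []

    def on_attlist(elname, attname, att_type, default, required):
        # CDATA with no default (#IMPLIED or #REQUIRED) changes nothing
        # for a non-validating consumer, so it is dropped.
        if att_type == "CDATA" and default is None:
            return
        if default is None:
            tail = "#REQUIRED" if required else "#IMPLIED"
        else:
            # Assumption: the default value contains no double quotes.
            tail = ("#FIXED " if required else "") + '"' + default + '"'
        kept.append(f"<!ATTLIST {elname} {attname} {att_type} {tail}>")

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(document, True)
    return kept


if __name__ == "__main__":
    for declaration in squash_attlists(SAMPLE):
        print(declaration)
```

On the sample, the note declaration is discarded, while the ID-typed key and the defaulted lang survive, since both change what a non-validating processor reports.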
> -- 
> In politics, obedience and support      John Cowan
> are the same thing.  --Hannah Arendt    http://www.ccil.org/~cowan



Copyright 2001 XML.org. This site is hosted by OASIS