Re: [xml-dev] Handling internal general entities with SAX



First, I would like to quickly address the point Kurt raises below about
undeclared entities being reported as errors.  Every so often we have a
little discussion about what it means to be an "error".  In this case, the
XML rec is reasonably clear about the type of error.  If the document doesn't
have an external DTD subset, or is standalone, then an undeclared entity is a
well-formedness error, i.e. a fatal error.

See section 4.1, Well-formedness constraint: Entity Declared

But when a document includes an external DTD subset, non-validating
processors need not report an undeclared entity as an error at all.  In fact,
most non-validating processors have an option to prevent reading external
entities (including the external DTD subset), so it would be unreasonable for
them to report undeclared entities as errors.  SAX even has a callback for
this very situation: ContentHandler::skippedEntity()
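
As a rough illustration of that callback, here is a minimal Java sketch.  The
class name SkippedEntityDemo, the method skippedEntities() and the file name
"doc.dtd" are made up for the example.  It assumes a non-validating JAXP
parser, and it stands in for an unread external subset by resolving it to an
empty stream.  Whether skippedEntity() actually fires in this case depends on
the parser, so treat this as a sketch rather than guaranteed behaviour.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SkippedEntityDemo {
    /** Parses xml without validation and returns the names reported via skippedEntity(). */
    static List<String> skippedEntities(String xml) throws Exception {
        final List<String> skipped = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void skippedEntity(String name) {
                // Log the entity instead of failing the parse.
                skipped.add(name);
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumption for the demo: hand back an empty external subset,
                // so the entity stays undeclared without touching the file system.
                return new InputSource(new StringReader(""));
            }
        };
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return skipped;
    }

    public static void main(String[] args) throws Exception {
        // The external subset is declared, so the undeclared &test; is not a
        // well-formedness error for a non-validating parser.
        String xml = "<!DOCTYPE doc SYSTEM \"doc.dtd\"><doc>before &test; after</doc>";
        System.out.println(skippedEntities(xml));
    }
}
```

An application could use the same hook to substitute default text for the
skipped entity rather than just recording its name.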

<---->

The issue Kurt raises reminds me of some thoughts I had when I first started
looking into XML.  I wondered why parameter entities were so named, and
concluded that it must be the intention that their value can be set by
passing "parameters" to xml processors.  My mark-up experience doesn't
stretch back to SGML, so I don't know if this was the original intention of
parameter entities, but it certainly seems like a reasonable idea to me.

I haven't seen this behaviour in any xml processors, but nsgmls comes close:
its -i{name} flag changes the replacement value of a parameter entity from
"IGNORE" to "INCLUDE".

If xml processors allowed applications to provide the replacement text for
parameter entities at run time, then Kurt's original problem could be solved
by declaring the "test" entity like so:-

<!ENTITY % param1 "[default]">
<!ENTITY test "%param1;">

As an illustration of the kind of thing I have in mind:-

// setParameterEntity() is hypothetical; SAX has no such method
XMLReader r = ...;
r.setParameterEntity("param1", "replacement text");
r.setParameterEntity("p2", "INCLUDE");
r.parse("myfile.xml");

I know this facility isn't offered by SAX, but I wonder if it is possible
using any of the native APIs?  Does anyone agree that this would be a useful
feature to have?

One work-around that SAX does support right now is the ability to provide
the replacement text for external entities.  So, if the test entity was
declared as external like this:-

<!ENTITY test SYSTEM "get_the_test_entity">

then it would be relatively straightforward to write an EntityResolver that
hooked into this and provided the appropriate replacement text via a
StringReader.
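
A minimal sketch of such a resolver in Java follows.  The class name
TestEntityResolverDemo and the method expand() are invented for the example.
It assumes the parser is allowed to read external general entities, and it
matches on the end of the system identifier because parsers usually turn the
relative identifier into an absolute URI before calling the resolver.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TestEntityResolverDemo {
    /** Parses xml, supplying replacement as the text of the external "test" entity. */
    static String expand(String xml, final String replacement) throws Exception {
        final StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Collect character data so the effect of the resolver is visible.
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        // Supply the replacement text for our marker system identifier;
        // returning null leaves every other entity to the default lookup.
        reader.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("get_the_test_entity")
                        ? new InputSource(new StringReader(replacement))
                        : null);

        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ENTITY test SYSTEM \"get_the_test_entity\">]>"
                + "<doc>x &test; y</doc>";
        // Each output format would pass its own replacement text here.
        System.out.println(expand(xml, "[html output]"));
    }
}
```

Each output format could then install its own resolver, leaving the document
and its entity declaration untouched.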


Regards
~Rob

--
Rob Lugt
ElCel Technology
http://www.elcel.com/


----- Original Message -----
From: "Devlin, Kurt" <Kurt.Devlin@westgroup.com>
To: "'David Brownell'" <david-b@pacbell.net>; <xml-dev@lists.xml.org>
Sent: 22 October 2001 19:45
Subject: RE: [xml-dev] Handling internal general entities with SAX


> Thanks Dave.
>
> Yes, I realize that I want to "break" the XML rules, but I feel like my
> intentions are good.
>
> We definitely fall into the "no DTD" group for our data exchange. I had
> considered chaining an InputStream in before the Reader to "import" the
> entity declarations. This handles the case for all of the known entities,
> but not for unknown ones.
>
> I still would like to consider an unknown entity as a warning and not an
> error. I know this is contrary to the XML spec, but by the assumption of
> how we are using entities, it would just be missing CDATA and would not
> break the parse. In this case we would want to log the missing entity and
> either suppress the entity or provide some default replacement text. This
> would allow us to accept more flavors of data where a couple of unknown
> entities wouldn't stop us from parsing and continuing our processing.
>
> I've started looking into implementing my own XMLEntityHandler or
> DefaultEntityHandler as used in Xerces, but there aren't a lot of
> specifics that I've found on the interaction of all the different pieces.
> I don't even know if implementing these classes will address my problem.
>
> --Kurt
>
> -----Original Message-----
> From: David Brownell [mailto:david-b@pacbell.net]
> Sent: Monday, October 22, 2001 12:30 PM
> To: Devlin, Kurt; xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Handling internal general entities with SAX
>
>
> > In SAX, is there a way to handle internal general entities without
> > declaring them? I would like to be able to recognize &test; without
> > having to explicitly define it with <!ENTITY test "[this is a test]">.
>
> That is, you want an "XML" parser not to enforce basic well
> formedness requirements?  That'd violate all kinds of fundamental
> rules for XML processing.  That's not something SAX, or any
> other XML API, encourages.
>
>
> > The reason for this is that we are taking our XML to several different
> > output formats and each will want to handle some entities differently.
>
> The normal way to do that involves each output stream having
> different entity declarations.  That means each must have a different
> DTD, either with different external subsets or with conditional sections
> or (most simply) like
>
>     <!DOCTYPE my-app-rootnode
>         SYSTEM "http://www.example.com/dtds/my-app.dtd"
>     [
>     <!ENTITY test "[this is a test]">
>     ]>
>
> Alternatively, some folk have adopted "no DTD" policies for
> the data they interchange, and then paste their own DTDs
> (with entity declarations) in front of files.  It's easy enough to
> splice one Reader (or InputStream) in front of another, using
> an InputStream.
>
> - Dave
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
>
>