RE: [xml-dev] Handling internal general entities with SAX
- From: "Devlin, Kurt" <Kurt.Devlin@westgroup.com>
- To: 'David Brownell' <firstname.lastname@example.org>, email@example.com
- Date: Mon, 22 Oct 2001 13:45:38 -0500
Yes, I realize that I want to "break" the XML rules, but I feel like my
intentions are good.
We definitely fall into the "no DTD" group for our data exchange. I had
considered chaining an InputStream in before the Reader to "import" the
entity declarations. This handles the case for all of the known entities,
but not for unknown ones.
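A minimal sketch of that chaining idea, assuming the exchanged documents carry no XML declaration or DOCTYPE of their own and use a root element named "doc" (both are assumptions made for illustration, as are the class and method names). It uses the standard java.io.SequenceInputStream to put a small internal subset in front of the data before a JAXP SAX parser sees it:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityPrefix {
    // Hypothetical declarations for one output format; the root name "doc" is assumed.
    static final String PROLOG =
        "<!DOCTYPE doc [\n<!ENTITY test \"[this is a test]\">\n]>\n";

    /** Parses a DTD-less document with PROLOG chained in front and returns its text content. */
    static String parseWithProlog(String body) {
        final StringBuilder text = new StringBuilder();
        // Chain the declarations in front of the real data stream.
        InputStream in = new SequenceInputStream(
            new ByteArrayInputStream(PROLOG.getBytes(StandardCharsets.UTF_8)),
            new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(in),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Expanded entity text arrives here like any other character data.
                        text.append(ch, start, length);
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseWithProlog("<doc>before &test; after</doc>"));
    }
}
```

Each output format could supply its own PROLOG string. Data that begins with an XML declaration would need that declaration moved ahead of the spliced DOCTYPE, which this sketch does not attempt.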
I would still like to treat an unknown entity as a warning and not an
error. I know this is contrary to the XML spec, but given how we use
entities, an unknown one would just mean some missing character data and
would not break the parse. In this case we would want to log the missing entity and either
suppress the entity or provide some default replacement text. This would
allow us to accept more flavors of data where a couple of unknown entities
wouldn't stop us from parsing and continuing our processing.
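One way to approximate this without changing the parser is a pre-scan, sketched below. This is a swapped-in workaround, not a SAX or Xerces feature: the text is scanned for entity references before parsing, each name that is not already known is logged, and a declaration with default replacement text is generated so it can be added to the declarations placed in front of the data. The class name, method names, and the simplified ASCII-only name pattern are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnknownEntityScan {
    private static final Logger LOG = Logger.getLogger("entities");
    // Simplified ASCII-only pattern; the XML Name production allows more characters.
    // Character references such as &#65; do not match because '#' cannot start a name.
    private static final Pattern REF = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");
    // The five entities every XML parser predefines.
    private static final Set<String> PREDEFINED =
        new HashSet<>(Arrays.asList("lt", "gt", "amp", "apos", "quot"));

    /** Returns the referenced entity names that are neither predefined nor in 'known'. */
    static Set<String> unknownEntities(String xml, Set<String> known) {
        Set<String> unknown = new LinkedHashSet<>();
        Matcher m = REF.matcher(xml);
        while (m.find()) {
            String name = m.group(1);
            if (!PREDEFINED.contains(name) && !known.contains(name) && unknown.add(name)) {
                // Log each missing entity once, as a warning rather than an error.
                LOG.warning("undeclared entity: &" + name + ";");
            }
        }
        return unknown;
    }

    /** Builds declarations giving each name the same default replacement text.
     *  The replacement must not contain '"', '%', or '&'. */
    static String declarationsFor(Set<String> names, String replacement) {
        StringBuilder sb = new StringBuilder();
        for (String n : names) {
            sb.append("<!ENTITY ").append(n).append(" \"").append(replacement).append("\">\n");
        }
        return sb.toString();
    }
}
```

An empty replacement string suppresses the entity; any other string acts as the default text. Because the scan is a plain regular expression, it also picks up look-alike references inside comments and CDATA sections; the extra declarations that result are harmless, but the log may then mention names that were never real references.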
I've started looking into implementing my own XMLEntityHandler or
DefaultEntityHandler as used in Xerces, but there aren't a lot of specifics
that I've found on the interaction of all the different pieces. I don't even
know if implementing these classes will address my problem.
From: David Brownell [mailto:firstname.lastname@example.org]
Sent: Monday, October 22, 2001 12:30 PM
To: Devlin, Kurt; email@example.com
Subject: Re: [xml-dev] Handling internal general entities with SAX
> In SAX, is there a way to handle internal general entities without
> declaring them? I would like to be able to recognize &test; without having to
> explicitly define it with <!ENTITY test "[this is a test]">.
That is, you want an "XML" parser not to enforce basic
well-formedness requirements? That'd violate all kinds of fundamental
rules for XML processing. That's not something SAX, or any
other XML API, encourages.
> The reason for this is that we are taking our XML to several different
> output formats and each will want to handle some entities differently.
The normal way to do that involves each output stream having
different entity declarations. That means each must have a different
DTD, either with different external subsets or with conditional sections
or (most simply) like
<!ENTITY test "[this is a test]">
Alternatively, some folk have adopted "no DTD" policies for
the data they interchange, and then paste their own DTDs
(with entity declarations) in front of files. It's easy enough to
splice one Reader (or InputStream) in front of another, using