Re: SAX InputSource and character streams
- From: Mike Brown <firstname.lastname@example.org>
- To: David Megginson <email@example.com>
- Date: Mon, 12 Mar 2001 09:41:33 -0700 (MST)
David Megginson wrote (quoting my question of Feb 21):
> > My question was, when supplying a character stream to the parser, is it
> > reasonable to expect that the parser will not complain if the encoding
> > declaration says the encoding is (was) something the parser does not
> > support?
> > XML seems to assume that every parsed entity that a processor encounters
> > consists of encoded characters (bytes, essentially), whereas in practice
> > we obviously have parsers that accept the entities as characters.
> Hmm -- I can see two reasonable arguments here:
> 1. With a Java character stream, there's no way to know what the
> original encoding might have been, so the encoding declaration is
> 2. A Java character stream is presented (more-or-less) in UTF-16, so
> the encoding declaration, if present, should agree with that.
With all due respect, I was hoping that you would offer some
reconciliation, rather than restating the problem.
The dilemma is that the XML spec insists that, if an encoding declaration is
present, it is an error for the document not to be in that encoding.
Obviously in case (1) you are not dealing with encoded characters, so one
would expect that the encoding declaration is moot, and therefore we just
ignore the XML spec's lack of consideration for unencoded entities.
I am suggesting that it would put the issue to rest if the SAX specs would
just come out and say "if an InputSource is constructed from a character
stream, the encoding declaration, if present, must be ignored because it is
irrelevant, in spite of the XML 1.0 Recommendation's requirement that the
document bear the same encoding as is declared."
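
To make the scenario concrete, here is a minimal sketch in Java of the case I
am describing: a document whose encoding declaration says ISO-8859-1 is handed
to the parser as characters, and it contains a character (U+20AC) that
ISO-8859-1 cannot represent at all. The class and method names (CharStreamDemo,
fromCharacters, parseText) are made up for illustration, and whether a given
parser accepts the document is exactly the open question, so the outcome is
reported rather than assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharStreamDemo {

    // Hypothetical helper: wrap already-decoded characters in an InputSource.
    // No byte stream and no encoding are set, so the parser has only characters.
    static InputSource fromCharacters(String xml) {
        return new InputSource(new StringReader(xml));
    }

    // Hypothetical helper: parse the document and collect its character data.
    static String parseText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(fromCharacters(xml), handler);
        return text.toString();
    }

    public static void main(String[] args) {
        // The declaration claims ISO-8859-1, but U+20AC has no ISO-8859-1 form.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<doc>\u20AC</doc>";
        try {
            boolean intact = parseText(xml).equals("\u20AC");
            System.out.println(intact
                    ? "declaration ignored, characters delivered as supplied"
                    : "characters were altered by the parser");
        } catch (Exception e) {
            System.out.println("parser rejected the document: " + e.getMessage());
        }
    }
}
```

A parser that follows the rule I am proposing would take the first branch for
any declared encoding, supported or not, because the declaration describes
bytes that the parser never sees.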
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources: http://skew.org/xml/