Re: SAX InputSource and character streams

On Monday, February 19, 2001 5:26 PM, Mike Brown wrote:-
> [snip]
> When constructing a SAX InputSource from a character stream
> (java.io.Reader), is it correct to assume that any encoding
> declaration given in the document will be ignored?

Mike, I don't think you need to worry about the encoding in this case.
Strictly speaking it is an implementation detail, but I would argue that it
makes no sense for a SAX parser to try to validate the encoding declaration
contained within a character stream (java.io.Reader).

As you obviously know, the encoding declaration is used to allow the parser
to convert from a byte stream (java.io.InputStream) into a character stream
(java.io.Reader).  As you are providing a character stream, the parser does
not need to worry about the encoding.  In fact, the parser couldn't do
anything with the encoding even if you wanted it to - because it would need
bytes to operate on - and you aren't giving it any!
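
To make this concrete, here is a minimal sketch that hands the parser a
character stream whose encoding declaration does not fit the characters it
contains. It assumes the JAXP parser bundled with the JDK. The class name
ReaderEncodingDemo and the helper parseText are made up for illustration.
Other parsers may behave differently, so treat it as an experiment rather
than a guarantee.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReaderEncodingDemo {

    // Hypothetical helper: parses XML supplied as characters and returns
    // the concatenated text content reported by the parser.
    static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // The InputSource wraps a Reader, so the parser never sees bytes.
        InputSource source = new InputSource(new StringReader(xml));
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The declaration claims US-ASCII, yet the content holds U+00E9,
        // which US-ASCII cannot represent as a byte.
        String xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>"
                   + "<doc>caf\u00e9</doc>";
        String text = parseText(xml);
        System.out.println(text.equals("caf\u00e9")
            ? "declaration ignored, text intact"
            : "unexpected text: " + text);
    }
}
```

If the parser tried to honour the declaration there would be nothing for it
to decode, since the characters were already decoded before it saw them.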

All XML parsers have to do an awkward little dance when reading an external
entity: checking for a Byte Order Mark and XML declaration in order to
determine which encoding the entity uses.  The external entity (byte stream)
is then wrapped by a character stream (java.io.Reader) that understands the
encoding of the byte stream, and the character stream is used from then on.
If the java.io.Reader points back to the start of the external entity (as is
likely), the parser will then get to see the encoding string (in character
form), but it is purely informational at this point and the parser is
unlikely to do anything with it.
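
The dance described above can be sketched roughly as follows. This is a
simplified illustration, not how any particular parser does it: it only
recognises UTF-8 and UTF-16 Byte Order Marks, falls back to scanning an
ASCII-compatible XML declaration, and otherwise assumes UTF-8. UTF-32,
UTF-16 without a BOM and EBCDIC are left out. The class EncodingSniffer and
its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingSniffer {

    // Matches the encoding pseudo-attribute of a leading XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Length of a recognised Byte Order Mark, or 0 if there is none.
    static int bomLength(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return 3;
        if (startsWith(head, 0xFE, 0xFF) || startsWith(head, 0xFF, 0xFE)) return 2;
        return 0;
    }

    // Guesses the encoding from the first bytes of an external entity.
    static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        // No BOM: decode one byte per char and look for the declaration.
        String prefix = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(1) : "UTF-8";
    }

    // Wraps the byte stream in a Reader that understands its encoding.
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 128);
        byte[] head = new byte[128];
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }
        byte[] seen = Arrays.copyOf(head, n);
        int skip = bomLength(seen);
        // Push everything except the BOM back, so the Reader starts at
        // the XML declaration (if there is one).
        pushback.unread(head, skip, n - skip);
        return new InputStreamReader(pushback, detect(seen));
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(bytes));
        Reader reader = open(new ByteArrayInputStream(bytes));
        StringBuilder sb = new StringBuilder();
        for (int c; (c = reader.read()) != -1; ) sb.append((char) c);
        // The declaration is still there, now in character form.
        System.out.println(sb.toString().startsWith("<?xml"));
    }
}
```

Once open() has returned, the encoding name inside the declaration has done
its job. Whatever reads from the Reader sees it only as ordinary characters.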

Hope this helps
Rob Lugt
ElCel Technology