


   SAX: Byte Stream Needed?

  • From: David Megginson <ak117@freenet.carleton.ca>
  • To: xml-dev Mailing List <xml-dev@ic.ac.uk>
  • Date: Wed, 15 Apr 1998 07:28:12 -0400

James Clark writes:


 > InputStreamReader, however, leaves something to be desired because
 > it doesn't allow users to supply their own character-to-byte
 > conversion routines. But if you have an InputStream you should be
 > using the interface to the parser that takes an InputStream.  In
 > any case it's not practical to use an InputStreamReader for XML
 > because that won't deal with XML's rules for detecting encodings.

I have actually been toying with omitting the byte-stream parse()
method altogether, so that there would be only two parse methods:

  public abstract void parse (String publicId, String systemId)
    throws java.lang.Exception;

  public abstract void parse (String publicId, String systemId,
                              SAXCharacterStream input)
    throws java.lang.Exception;

I've defined SAXCharacterStream as follows:

  public interface SAXCharacterStream {
    public abstract int read ()
      throws SAXException;
    public abstract int read (char ch[], int start, int count)
      throws SAXException;
  }

(Where SAXException is, in the Java version, a direct and unmodified
subclass of java.io.IOException).  The result of either method is -1
if there are no characters left to read; otherwise, it is a UTF-16
character value for the first, and the number of characters read for
the second.
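
To make that contract concrete, here is a minimal sketch of how an existing java.io.Reader might be adapted to the interface. The ReaderCharacterStream class is hypothetical (it is not part of any proposal), and the SAXException here is only a stand-in written to match the description above, not a published class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Stand-in matching the description: a direct subclass of IOException.
class SAXException extends IOException {
    SAXException(String message) {
        super(message);
    }
}

// The interface as proposed in the message.
interface SAXCharacterStream {
    int read() throws SAXException;
    int read(char[] ch, int start, int count) throws SAXException;
}

// Hypothetical adapter: delegates to a Reader, which already returns -1
// at end of input, a single UTF-16 code unit from read(), and the number
// of characters stored from the array form of read().
class ReaderCharacterStream implements SAXCharacterStream {
    private final Reader reader;

    ReaderCharacterStream(Reader reader) {
        this.reader = reader;
    }

    public int read() throws SAXException {
        try {
            return reader.read();
        } catch (IOException e) {
            throw new SAXException(e.getMessage());
        }
    }

    public int read(char[] ch, int start, int count) throws SAXException {
        try {
            return reader.read(ch, start, count);
        } catch (IOException e) {
            throw new SAXException(e.getMessage());
        }
    }
}
```

A parser could then be handed, say, new ReaderCharacterStream(new StringReader("<a/>")) without caring where the characters came from.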

The advantage of using SAXCharacterStream is that behaviour over CORBA
(or, I suppose, DCOM) is now well-defined.  The disadvantage is
another bloody interface.

I had also written a SAXByteStream, but then I started wondering why
we really need it -- information coming from a database, for example,
or from a buffer should already be in characters, not in raw bytes
(and in Java, at least, it is simple to wrap a Reader around any
InputStream when necessary -- I expect that other languages will have
good internationalisation support soon).
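
As a rough illustration of the wrapping step, the sketch below decodes a byte stream through java.io.InputStreamReader. The ByteToCharDemo class and its decode method are hypothetical names, and the sketch assumes the caller already knows the encoding: it does not attempt XML's own encoding detection, which is the limitation James raises above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class ByteToCharDemo {
    // Read every character from a byte stream, using the encoding the
    // caller names. Assumption: the encoding is known in advance.
    static String decode(InputStream in, String encoding) throws IOException {
        Reader reader = new InputStreamReader(in, encoding);
        StringBuilder out = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        // read() returns the number of characters stored, or -1 at the end.
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            out.append(buf, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<doc/>".getBytes("UTF-8");
        System.out.println(decode(new ByteArrayInputStream(bytes), "UTF-8"));
    }
}
```

Naming the wrong encoding does not fail; it silently yields different characters, which is why the byte-level question matters for documents that declare their own encoding.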

Can anyone put forward a convincing case for having a standard SAX
method parsing from a raw byte stream (remembering that
implementations can always extend the SAXParser interface themselves
for special requirements)?

Thanks, and all the best,


David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com


