   Re: SAX: Parser Interface -- Summary of Change Requests

  • From: Tyler Baker <tyler@infinet.com>
  • To: David Megginson <ak117@freenet.carleton.ca>
  • Date: Sun, 01 Feb 1998 16:49:52 -0500

> Here are the change requests in detail, with my initial response at
> the end of each one:
> 1) Allow SAX to work with an input stream as well as a URI.
>    - Paul Pazandak <pazandak@OBJS.com>
>    - Peter Murray-Rust <peter@ursus.demon.co.uk>
>    - Don Park <donpark@quake.net>
>    Currently, the Parser interface provides only the following method
>    to initiate a parse:
>      void parse (String publicId, String systemId)
>        throws java.lang.Exception;
>    Following this suggestion, there would be a new method
>      void parse (String publicId, String systemId, InputStream input)
>        throws java.lang.Exception;
>    (It is still necessary to provide a system identifier for resolving
>    relative URIs within the stream).  Note that the stream would be a
>    byte stream, not a character stream -- characters might require
>    more than one octet, depending on the encoding in use.

Well, what if the XML data is streamed from a database, where a URL does not
matter so much?  If you look at what Oracle, Sybase, and Microsoft, among others,
are planning to do with XML, then supporting this in SAX in the most general way
will be very much necessary.  I think that if you want SAX to have any CORBA
support or other language support down the line, it would be best to avoid any
method overloading in the API, because in CORBA, for example, you cannot
overload operations in IDL (methods in Java).
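
To make the overloading concern concrete, here is a minimal Java sketch that
gives the stream-based entry point its own name instead of overloading parse().
The names StreamParser, parseStream, and ByteCountingParser are hypothetical and
are not part of the SAX draft; the toy implementation only counts bytes and does
no XML parsing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical interface: two distinctly named operations rather than two
// overloads of parse(), so each could map onto its own IDL operation.
interface StreamParser {
    void parse(String publicId, String systemId) throws Exception;

    void parseStream(String publicId, String systemId, InputStream input)
            throws Exception;
}

// Toy implementation (an assumption for illustration): it counts the bytes it
// is given and performs no real parsing.
class ByteCountingParser implements StreamParser {
    long bytesSeen = 0;

    @Override
    public void parse(String publicId, String systemId) throws Exception {
        // Open the system identifier as a URL and reuse the stream entry point.
        try (InputStream in = URI.create(systemId).toURL().openStream()) {
            parseStream(publicId, systemId, in);
        }
    }

    @Override
    public void parseStream(String publicId, String systemId, InputStream input)
            throws IOException {
        // systemId is still passed so relative URIs could be resolved.
        byte[] buf = new byte[4096];
        int n;
        while ((n = input.read(buf)) != -1) {
            bytesSeen += n;
        }
    }
}
```

A caller holding database output would call parseStream() directly with a
java.io.ByteArrayInputStream or similar, and never touch the URL-based method.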

>    I can see the convenience of this method, and I plan to add
>    something like this to AElfred when I have a chance.  For SAX,
>    however -- which is meant to end up as a language- and
>    system-independent API -- I am reluctant to hardcode assumptions
>    about storage (and I don't know enough about IDL to know if there
>    is a general representation for streams).  Paul Pazandak has also
>    suggested allowing strings and buffers -- in this case, they would
>    already be decoded into characters.

Another idea (as far as implementation goes) is to have the parser simply be an
extension of java.io.FilterInputStream which takes one or more Handler
interfaces as arguments (to delegate to), so that you can handle very large
streams of data.  In addition to overriding the necessary
java.io.FilterInputStream methods, you can also have methods like readDocument(),
readElement(), etc.  This would give people a lot more control over reading in
XML.  This approach is of course similar to how the java.net package handles
URL content.  But where I see this approach being most useful is in
transactions where you might only want to read in a limited amount of data
anyway and process only that, or in the case where XML content is always of
a fixed length (as in databases, where string fields that do not fill the
assigned length are padded with nulls).  With the current SAX design, you have
no real control at the I/O level, where it would help to skip content if
the application feels it is necessary.
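
A rough Java sketch of the filter-stream idea follows. It is not a real XML
parser: RawTagHandler and PullFilterStream are hypothetical names, input is
assumed to be ASCII, a NUL byte is treated as end of input (the null-padded
database field case), and readElement() only hands the raw text of the next tag
to the handler.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical callback; a real SAX handler interface would be richer.
interface RawTagHandler {
    void tag(String rawTag);
}

// Toy "parser as a filter stream" (assumption: ASCII input, no comments,
// no CDATA, no attribute values containing '>').
class PullFilterStream extends FilterInputStream {
    private final RawTagHandler handler;
    private boolean ended = false;

    PullFilterStream(InputStream in, RawTagHandler handler) {
        super(in);
        this.handler = handler;
    }

    @Override
    public int read() throws IOException {
        if (ended) {
            return -1;
        }
        int b = super.read();
        if (b <= 0) { // -1 is the real end of stream, 0 is null padding
            ended = true;
            return -1;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // Route bulk reads through read() so the padding rule always applies.
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int b = read();
            if (b < 0) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    // Pulls bytes until one complete tag has been seen, reports it to the
    // handler, and returns true; returns false once the input is exhausted.
    boolean readElement() throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean inTag = false;
        int b;
        while ((b = read()) != -1) {
            if (b == '<') {
                inTag = true;
                sb.setLength(0);
            }
            if (inTag) {
                sb.append((char) b);
                if (b == '>') {
                    handler.tag(sb.toString());
                    return true;
                }
            }
        }
        return false;
    }
}
```

Because the application calls readElement() itself, it can stop after the first
few tags and leave the rest of the stream unread. The inherited skip() is not
overridden here, so it would bypass the padding check.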

>    Personally, I'm undecided, and would be interested in hearing the
>    theoretical arguments for and against this suggestion.
> 2) Simplify handler chaining by adding get* methods for existing
>    handlers.
>    - Don Park <donpark@quake.net>
>    Currently the Parser interface provides only setters for the
>    various handlers:
>      public void setEntityHandler (EntityHandler handler);
>      public void setDocumentHandler (DocumentHandler handler);
>      public void setErrorHandler (ErrorHandler handler);
>    Following this suggestion, there would also be accessors:
>      public EntityHandler getEntityHandler ();
>      public DocumentHandler getDocumentHandler ();
>      public ErrorHandler getErrorHandler ();
>    An application could then retrieve the existing handler and
>    implement a new one which invokes the old one under certain
>    circumstances.

I am not sure exactly what the use of these get methods is, because all the
handlers are useful for is delegation anyway.  The only reason the get methods
would be useful is for casting the returned object to some other type.  Why
anyone would need to do this is beyond me, as casting this object back to
something else would be sloppy implementation in the first place.
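
For reference, the chaining pattern that request 2 describes can be sketched in
Java as below. The interfaces are simplified stand-ins with a single callback,
not the draft SAX interfaces, and ToyParser and SkippingHandler are hypothetical
names used only for illustration.

```java
// Simplified stand-in for the draft DocumentHandler: one callback only.
interface DocumentHandler {
    void startElement(String name);
}

// Simplified stand-in for the draft Parser, with the proposed accessor.
interface Parser {
    void setDocumentHandler(DocumentHandler handler);

    DocumentHandler getDocumentHandler();
}

// Toy parser: stores the handler and fires events when asked to.
class ToyParser implements Parser {
    private DocumentHandler handler; // null until set (one possible default)

    @Override
    public void setDocumentHandler(DocumentHandler handler) {
        this.handler = handler;
    }

    @Override
    public DocumentHandler getDocumentHandler() {
        return handler;
    }

    void fire(String name) {
        if (handler != null) {
            handler.startElement(name);
        }
    }
}

// Wraps whichever handler was installed before and forwards every event
// except those for one element name.
class SkippingHandler implements DocumentHandler {
    private final DocumentHandler previous;
    private final String skipped;

    SkippingHandler(DocumentHandler previous, String skipped) {
        this.previous = previous;
        this.skipped = skipped;
    }

    @Override
    public void startElement(String name) {
        if (previous != null && !name.equals(skipped)) {
            previous.startElement(name);
        }
    }
}
```

The chaining step itself is a single line,
parser.setDocumentHandler(new SkippingHandler(parser.getDocumentHandler(), "secret")),
and it needs no cast. Without the accessor, the code installing the wrapper
would have to be handed the previous handler by some other route.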

>    This seems like a generally good idea (as well as a simple and
>    backwards-compatible change), and I am willing to implement it.
>    The only complication is that we'll have to define the default
>    state -- is the parser always required to return a default handler
>    if the user has not explicitly set one, or should it return null?

The default handler could just be something which spits stuff out to stdout or
some other OutputStream in a manner similar to how AElfred's EventDemo does.
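
A default handler along those lines might look like the following Java sketch.
The two-method DocumentHandler is a simplified stand-in rather than the draft
SAX interface, PrintingHandler is a hypothetical name, and the output format is
invented here, not copied from EventDemo.

```java
import java.io.PrintStream;

// Simplified stand-in for the draft DocumentHandler.
interface DocumentHandler {
    void startElement(String name);

    void endElement(String name);
}

// Default handler that echoes each event to a stream, stdout unless told
// otherwise.
class PrintingHandler implements DocumentHandler {
    private final PrintStream out;

    PrintingHandler() {
        this(System.out);
    }

    PrintingHandler(PrintStream out) {
        this.out = out;
    }

    @Override
    public void startElement(String name) {
        out.println("start element: " + name);
    }

    @Override
    public void endElement(String name) {
        out.println("end element: " + name);
    }
}
```

A parser could install a PrintingHandler at construction time, so that a getter
never returns null even if the user has not set a handler.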


