OASIS Mailing List Archives





   Re: [xml-dev] SAX/Java Proposed Changes


----- Original Message ----- 
From: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>
To: <sax-devel@lists.sourceforge.net>
Cc: <xml-dev@lists.xml.org>
Sent: Monday, March 08, 2004 10:07 AM

> >>  Being able to rely on
> >>  startDocument()/endDocument() in the ContentHandler allows all the
> >>  initialization  and tear-down code to easily go in the same class as
> >>  the code that fills the data structure. It's all neatly unified.
> >
> >Why could it not go in the same class in the other case?
> If the ContentHandler doesn't have any initialization or cleanup 
> methods (or at least any reliably invoked ones) then it can't do the 
> initialization or cleanup. Something else has to do it. You could add 
> custom initialization or cleanup methods and then have the something 
> else call these:
> handler.initialize();
> parser.parse();
> handler.cleanup();
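A minimal Java sketch of the self-contained arrangement defended in the quoted text, assuming the JDK's built-in SAX parser obtained through javax.xml.parsers.SAXParserFactory. The class ElementNameCollector and its collect() helper are hypothetical names chosen for illustration. Initialization happens in startDocument() and finalization in endDocument(), so the caller only ever calls parse(); this holds only as long as the parser reliably delivers both callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: setup and teardown both live in the handler and
// rely on the parser calling startDocument() and endDocument().
class ElementNameCollector extends DefaultHandler {
    private List<String> names;               // working structure for one parse
    private List<String> result = List.of();  // published at the end of the parse

    @Override
    public void startDocument() {
        names = new ArrayList<>();            // initialization
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);                     // fill the data structure
    }

    @Override
    public void endDocument() {
        result = List.copyOf(names);          // finalization ("store the results")
        names = null;                         // release the working structure
    }

    List<String> result() {
        return result;
    }

    // Convenience driver: the caller invokes parse() and nothing else.
    static List<String> collect(String xml) throws Exception {
        ElementNameCollector handler = new ElementNameCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.result();
    }
}
```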

Well, obviously, any initialization or cleanup methods would have to
be exposed; that was the point of adding a parsingDone() method.
I guess there was a misunderstanding.

> But that's still ugly and less than ideal. As I teach my students, if 
> certain public methods must be invoked in a certain order, then 
> something is wrong. They should be made private and combined into one 
> public method. 

As always, it depends.

> Each public method call should be atomic and 
> independent of other public methods. 

If I use a Stream, then I should close it at the end.
Does that mean the Stream class has a bad API?

> Having the ContentHandler do its 
> own initialization and cleanup makes the code clean and robust. 
> Relying on others to do it makes the code ugly and brittle. 

The ContentHandler would still do its own cleanup,
just not through endDocument(), which should serve a different purpose.

> Oh, it just hit me why startDocument() is not an adequate replacement 
> for endDocument(). There's often work you want to do at the end of a 
> parse irrespective of whether there's a next document or not. 

That was the point in my reply to Dennis.

> For 
> instance, you might want to store the results in a database 
> somewhere, or update some other variable. The purpose of 
> endDocument() is not solely to clean up any data structures that were 
> used. We need both startDocument() and endDocument(), not just one. 
> Yes, they may not be named precisely correctly, but we do need them, 
> and not being able to rely on them is a major hassle.
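To make the complaint concrete, here is a small hypothetical diagnostic handler, LifecycleRecorder, assuming the JDK's SAX parser. It logs which lifecycle callbacks arrive. With well-formed input both startDocument() and endDocument() fire. With malformed input, DefaultHandler.fatalError() rethrows and parse() ends with a SAXException; whether endDocument() is delivered before that is left to the parser, so the sketch deliberately makes no assumption about it. End-of-parse work such as storing results in a database therefore cannot safely live only in endDocument().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: records which lifecycle callbacks arrive.
class LifecycleRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument()   { events.add("endDocument"); }

    static List<String> trace(String xml) throws Exception {
        LifecycleRecorder handler = new LifecycleRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // DefaultHandler.fatalError() rethrows, so parse() stops here.
            handler.events.add("fatalError");
        }
        return handler.events;
    }
}
```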

Actually, I was thinking more along the lines of adding parsingDone()
to the ContentHandler API, in addition to keeping endDocument().
By the same line of argument one would expect an "initialize()"
method to be necessary as well, but initialization can be done based on state, i.e.,
at the first recognizable start of a new parsing process.
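One way to read "initialization based on state" is sketched below, assuming a handler that is reused across several parses. ReusableCounter is a hypothetical name. It exposes no public initialize(); instead it treats each startDocument() as the start of a new parse, resets its per-document state there, and notes when a previous parse never delivered endDocument().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reusable handler: no public initialize(); a new parse is
// recognized from the handler's own state when startDocument() arrives.
class ReusableCounter extends DefaultHandler {
    private boolean inDocument;  // true between startDocument() and endDocument()
    private int elements;        // per-document element count
    private int abandoned;       // earlier parses that never delivered endDocument()

    @Override
    public void startDocument() {
        if (inDocument) {
            abandoned++;         // previous parse stopped without endDocument()
        }
        elements = 0;            // the "internal initialize()"
        inDocument = true;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    @Override
    public void endDocument() {
        inDocument = false;
    }

    int elements()  { return elements; }
    int abandoned() { return abandoned; }

    // Parses one document with this (possibly reused) handler.
    void parse(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
    }
}
```

Because all per-document state is reset in startDocument(), the same instance can be handed to the parser repeatedly without any extra call from the driver.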

One could actually leave out endDocument(), as parsingDone() and inspection
of state together would indicate whether the end of the document was reached.
In other words, endDocument()/parsingDone() would focus entirely on finalization,
which means that endDocument() is really not the counterpart to startDocument()
but to parse(), and should be called "parsingDone()"; the "real" initialize()
and endDocument() methods would be internal, derived from state inspection.
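A sketch of the proposed split, with the caveat that parsingDone() is only the name suggested in this thread and is not part of org.xml.sax.ContentHandler. The hypothetical ParsingDoneHandler interface adds it. In SummaryHandler, endDocument() merely records that the end of input was reached, while parsingDone(), called by the driver in a finally block as the counterpart of parse(), does all finalization and inspects that state to tell a complete document from an abandoned one. This assumes the parser delivers endDocument() only after actually reaching the end of input.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical extension: parsingDone() is the name proposed in this thread,
// not a method of org.xml.sax.ContentHandler.
interface ParsingDoneHandler extends ContentHandler {
    void parsingDone();
}

class SummaryHandler extends DefaultHandler implements ParsingDoneHandler {
    private boolean reachedEnd;  // state inspected by parsingDone()
    private int elements;
    private String outcome = "not run";

    @Override public void startDocument() { reachedEnd = false; elements = 0; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    // endDocument() only records that the end of input was reached.
    @Override public void endDocument() { reachedEnd = true; }

    // All finalization lives here; state tells whether the document was complete.
    @Override
    public void parsingDone() {
        outcome = (reachedEnd ? "complete" : "abandoned") + ", elements=" + elements;
    }

    String outcome() { return outcome; }

    // Driver: parsingDone() pairs with parse(), so it runs in finally.
    static String run(String xml) {
        SummaryHandler handler = new SummaryHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Parse failed; finalization still happens below.
        } finally {
            handler.parsingDone();
        }
        return handler.outcome();
    }
}
```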

From that point of view, ignoring any name changes, the docs should be
changed to describe the purpose of endDocument() and make it a required
call once startDocument() has been called. So - in a roundabout way -
I arrive at the same conclusion. ;-)
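Until a parser documents endDocument() as required, the contract proposed here can be approximated on the application side. The sketch below uses a hypothetical class, EndDocumentGuarantee, built on org.xml.sax.helpers.XMLFilterImpl. It forwards every event to the real handler and, after parse() returns or throws, delivers endDocument() itself if startDocument() was seen but endDocument() was not. The target should therefore see exactly one endDocument() per started document, however the underlying parser behaves on errors.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical wrapper enforcing the proposed contract: once startDocument()
// has been delivered, endDocument() is always delivered too.
class EndDocumentGuarantee extends XMLFilterImpl {
    private boolean started;
    private boolean ended;

    EndDocumentGuarantee(ContentHandler target) {
        setContentHandler(target);  // XMLFilterImpl forwards events to target
    }

    @Override
    public void startDocument() throws SAXException {
        started = true;
        ended = false;
        super.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        ended = true;
        super.endDocument();
    }

    // Called after parse(), successful or not; supplies a missing endDocument().
    void finish() throws SAXException {
        if (started && !ended) {
            endDocument();
        }
    }

    // Returns true if parsing succeeded; the target sees endDocument() either way.
    static boolean parseGuarded(String xml, ContentHandler target) throws SAXException {
        EndDocumentGuarantee guard = new EndDocumentGuarantee(target);
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(guard);
            reader.setErrorHandler(new DefaultHandler()); // rethrows fatal errors
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        } finally {
            guard.finish();
        }
    }
}
```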




Copyright 2001 XML.org. This site is hosted by OASIS