
   Re: SAX2 RFD: LexicalHandler draft v.1.1

  • From: David Brownell <db@eng.sun.com>
  • To: David Megginson <david@megginson.com>
  • Date: Fri, 02 Apr 1999 11:04:24 -0800

I'd have responded sooner, but this discussion started on the
day I left for some vacation ... :-)

Note that some of this feedback comes from having implemented
versions of this functionality and from user feedback on it.
(Based on earlier discussions, some on xml-dev.)

That functionality is in the latest parser from Sun (TR1); some
folk might care to play with that code a bit.  (There's also a
version of the DTDHandler extensions -- essential! :-)

Short summary:  the basic idea is still right, though I think
the DTD related stuff should be done a bit differently.

- Dave


David Megginson wrote:
> 
> // LexicalHandler.java
> // $Id: LexicalHandler.java,v 1.1 1999/03/21 02:49:41 david Exp $
> // SAX2 handlerID: http://xml.org/sax/handlers/lexical
> 
> package org.xml.sax;
> 
> public interface LexicalHandler
> {
>     public abstract void xmlDecl (String version,
>                                   String encoding,
>                                   String standalone)
>         throws SAXException;

I'd far prefer to drop XML declarations; if they're to
be provided, I'd rather see a general text declaration
facility (version and encoding) applying to all parsed
entities.

Then, standalone would look like the special case it is;
perhaps with a callback just for that boolean value, when
it's even provided.  (Standalone is tri-valued:  yes, no,
and unspecified.)
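
One way to read that suggestion, as a rough sketch only: the names
TextDeclHandler, textDecl, standalone and StandaloneRecorder below are
hypothetical and come from no SAX draft.  The point is just that the
third "unspecified" value falls out naturally when the standalone
callback is simply never made.

```java
// Hypothetical sketch; none of these names come from a SAX draft.
interface TextDeclHandler {
    // Reported for any parsed entity that starts with a text (or XML)
    // declaration; either argument may be null when it is omitted.
    void textDecl(String version, String encoding);

    // Reported only for the document entity, and only when the
    // standalone pseudo-attribute is actually present.
    void standalone(boolean isStandalone);
}

// Records what was reported; "unspecified" is the absence of a callback.
class StandaloneRecorder implements TextDeclHandler {
    String version;
    String encoding;
    Boolean standalone;  // null means unspecified, the third value

    public void textDecl(String version, String encoding) {
        this.version = version;
        this.encoding = encoding;
    }

    public void standalone(boolean isStandalone) {
        this.standalone = Boolean.valueOf(isStandalone);
    }

    String describeStandalone() {
        if (standalone == null) {
            return "unspecified";
        }
        return standalone.booleanValue() ? "yes" : "no";
    }
}
```

A handler that never receives standalone() reports "unspecified";
one that receives standalone(false) reports "no".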


>     public abstract void startDTD (String doctype,
>                                    String publicID,
>                                    String systemID)
>         throws SAXException;
> 
>     public abstract void endDTD ()
>         throws SAXException;

These IMHO belong in the DTDHandler2 interface!

Also, we've found it essential to see the internal subset;
it's most practical to report it as a single string.  If
one can't see that subset, one can't plan to round-trip
the data in a document, and the ability to do that sort of
round-trip is critically important.  (Even though some folk
want more data to pass through than others -- e.g. many
don't care about CDATA boundaries, comments, etc.)

In fact, what Sun did for this functionality was to
partition it into three things (in DTD callbacks):

	startDtd (String rootName)
	endDtd ()
	    ... "start" has the declared root name
	externalDtdDecl (String publicID, String systemID)
	    ... just for the unnamed [dtd] PE
	internalDtdDecl (String internalSubset)
	    ... the literal internal subset

This permits "safe" and complete recreation of the doctype
declaration.
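
To make the "recreation" claim concrete, here is a minimal sketch of a
receiver for those three callbacks.  It is not Sun's code: the class
name DoctypeRecorder, the toDoctype() method and the exact output
spacing are assumptions made for illustration.

```java
// Sketch only: collects the three kinds of DTD callback listed above
// and rebuilds the doctype declaration text from them.
class DoctypeRecorder {
    private String rootName;
    private String publicID;
    private String systemID;
    private String internalSubset;

    void startDtd(String rootName) {
        this.rootName = rootName;
    }

    // Assumed to be called only when an external subset is declared.
    void externalDtdDecl(String publicID, String systemID) {
        this.publicID = publicID;
        this.systemID = systemID;
    }

    // Assumed to be called only when an internal subset is present.
    void internalDtdDecl(String internalSubset) {
        this.internalSubset = internalSubset;
    }

    void endDtd() {
        // Nothing to do here; everything needed was already reported.
    }

    // Use single quotes only when the literal itself holds a double quote.
    private static String quote(String literal) {
        char q = literal.indexOf('"') >= 0 ? '\'' : '"';
        return q + literal + q;
    }

    String toDoctype() {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(rootName);
        if (systemID != null) {
            if (publicID != null) {
                sb.append(" PUBLIC ").append(quote(publicID))
                  .append(' ').append(quote(systemID));
            } else {
                sb.append(" SYSTEM ").append(quote(systemID));
            }
        }
        if (internalSubset != null) {
            sb.append(" [").append(internalSubset).append(']');
        }
        return sb.append('>').toString();
    }
}
```

Because the internal subset arrives as one literal string, the sketch
never has to re-serialize individual markup declarations; it just
copies the subset back between the brackets.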


>     public abstract void startEntity (String name)
>         throws SAXException;
> 
>     public abstract void endEntity (String name)
>         throws SAXException;

Right ... except that we pass a boolean "included" flag with
the startEntity() call to meet the XML 1.0 specification
requirement to report entities that aren't included (e.g. by
some kinds of nonvalidating parser).  To "pass through" one
needs to be able to reproduce all entity refs, and the flag
is needed to distinguish entities with no content from ones
which just weren't read.
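
A rough sketch of such a pass-through consumer follows.  The class
name EntityPassThrough, the characters() method and the notRead list
are assumptions for illustration, and the sketch assumes the parser
still pairs every startEntity() with an endEntity() even when the
entity was not included.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: re-emits entity references in content instead of their
// replacement text, and remembers which entities were never read.
class EntityPassThrough {
    private final StringBuilder out = new StringBuilder();
    private final List<String> notRead = new ArrayList<String>();
    private int depth = 0;  // > 0 while inside an entity's replacement text

    void startEntity(String name, boolean included) {
        if (depth == 0) {
            // Only the outermost reference is written back out.
            out.append('&').append(name).append(';');
        }
        if (!included) {
            // No content will follow, but not because the entity is empty.
            notRead.add(name);
        }
        depth++;
    }

    void endEntity(String name) {
        depth--;
    }

    void characters(String text) {
        if (depth == 0) {
            // Minimal escaping for character data outside any entity.
            out.append(text.replace("&", "&amp;").replace("<", "&lt;"));
        }
        // Text inside an entity is dropped; the reference stands in for it.
    }

    String output() {
        return out.toString();
    }

    List<String> notRead() {
        return notRead;
    }
}
```

Without the flag, an empty entity and an unread one would look
identical here: a start immediately followed by an end.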

As I noted earlier, and James did more recently, this can't
apply to entities in attribute values.  It needs to be
specified/documented accordingly -- these callbacks must only
apply to content.  (I'll look at the proposal for attribute
handling later.)

There was also the issue of whether this is a general or a
parameter entity ... we took the position that for sanity,
we'd only present _general_ entities this way.  For example,
PEs inside markup declarations would be pretty useless.

PE/DTD parsing can be a separate ("SAX3"? :-) set of features.
With any luck the popular tools will develop using XML-syntax
schemas rather than PEs, and that "SAX3" module won't ever need
to happen; it would have to be messy.


>     public abstract void comment (String text)
>         throws SAXException;
>
>     public abstract void startCDATA ()
>         throws SAXException;
> 
>     public abstract void endCDATA ()
>         throws SAXException;

Right, all this is basically needed in that form.
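
For anyone wondering why these matter for round-tripping, a small
sketch: the class name LexicalEcho and its characters() method are
assumptions for illustration, while comment(), startCDATA() and
endCDATA() mirror the draft quoted above (minus the SAXException
clauses, to keep it short).

```java
// Sketch only: writes comments and CDATA section boundaries back out
// so the original lexical form of the content is preserved.
class LexicalEcho {
    private final StringBuilder out = new StringBuilder();
    private boolean inCDATA = false;

    void comment(String text) {
        out.append("<!--").append(text).append("-->");
    }

    void startCDATA() {
        out.append("<![CDATA[");
        inCDATA = true;
    }

    void endCDATA() {
        out.append("]]>");
        inCDATA = false;
    }

    void characters(String text) {
        if (inCDATA) {
            // Inside a CDATA section the text is written literally.
            out.append(text);
        } else {
            // Elsewhere, markup characters need minimal escaping.
            out.append(text.replace("&", "&amp;").replace("<", "&lt;"));
        }
    }

    String output() {
        return out.toString();
    }
}
```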

> }
> 
> // end of LexicalHandler.java

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
