
   How to keep "useless" information with SAX (2?).

  • From: Paul Tchistopolskii <paul@qub.com>
  • To: xml-dev@ic.ac.uk
  • Date: Mon, 22 Nov 1999 22:10:52 -0800


Hello.

I'm playing with the projectX (SAX 1.0 based) code, 
trying to force the SAX API to read an XML document 
and then save it *unchanged*.

I was having some problems with 
'useless whitespace' outside the elements.  

I mean the situation where I have:

<!DOCTYPE content SYSTEM "content.dtd">
<!-- some comment -->
<root>
</root>

With Sun's extensions to the SAX API I could get 
the content of the <!DOCTYPE and the content of the comment.

Unfortunately, the whitespace (including the 
newline) between those constructs is lost.

What I did to work around this problem was 
to patch the code in one place:

/com/sun/xml/parser/Parser.java

 private boolean maybeWhitespace () throws IOException, SAXException
 {
     if (!(inExternalPE && doLexicalPE))
         // patched line: the whitespace now goes through docHandler
         // (the old code was: return in.maybeWhitespace ();)
         return in.ignorableWhitespace (docHandler);

     // ... rest of the method not shown

This lets me get "ignorable whitespace" 
everywhere, not only inside the elements, 
through the ignorableWhitespace callback. 

However, I feel that I'm doing something ugly, 
because for some reason ignorableWhitespace is 
defined only for element content.

I'm wondering: what was the idea behind tracking 
ignorableWhitespace only inside the elements?

What happens in SAX2 ?
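One way to check this is a small trace program. The following is only a sketch, assuming a JAXP parser (such as the one bundled with the JDK) that supports the SAX2 `LexicalHandler` extension. The class name `PrologTrace` and the event strings are made up for illustration, and an internal DTD subset replaces "content.dtd" so that the example has no external file to load. It records the SAX2 events in order, so you can see whether anything is reported for the newline between the comment and the root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of any parser: logs SAX2 events in order.
public class PrologTrace {

    /** Parses the given XML text and returns one string per reported event. */
    public static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                events.add("startDTD " + name);
            }
            @Override
            public void comment(char[] ch, int start, int length) {
                events.add("comment [" + new String(ch, start, length) + "]");
            }
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                events.add("startElement " + qName);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("endElement " + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters " + length);
            }
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                events.add("ignorableWhitespace " + length);
            }
        };

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 property id for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler",
                           handler);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE root [<!ELEMENT root (#PCDATA)>]>\n"
                   + "<!-- some comment -->\n"
                   + "<root>\n</root>\n";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

In SAX2 the DOCTYPE and the comment arrive through `LexicalHandler`, but `characters` and `ignorableWhitespace` are still defined only for element content, so the whitespace between prolog constructs has no callback of its own. A writer that needs byte-for-byte round-tripping would still have to get that whitespace some other way.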

It also appears that Sun's parser (I think that 
Sun's parser is not the exception, right?) does not care 
about some 'useless' things, like the <?xml header.

I mean that, according to the code, there is simply 
no way to capture the content of the <?xml header, even 
though it may carry some *very* interesting (sometimes 
critical) information, like the encoding. 

Because I have already seen poor people providing 
"windows-something", and because of some 
Java-specific issues, the current point of view 
on the <?xml header (something not useful enough 
to keep) looks strange.

What happens in SAX2 ?
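A possible way to probe this is the optional SAX2 extension interface `org.xml.sax.ext.Locator2`, which declares `getXMLVersion()` and `getEncoding()`. The sketch below assumes the parser's `Locator` happens to implement that interface; this is not guaranteed for any given parser, so the code checks with `instanceof` and returns nulls otherwise. The class name `XmlDeclProbe` is hypothetical. The locator is read at the first `startElement`, on the assumption that the XML declaration has been processed by then.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: asks the parser what it knows about the <?xml header.
public class XmlDeclProbe {

    /**
     * Returns {version, encoding} as reported by the parser's locator.
     * Both entries stay null if the locator does not implement Locator2.
     */
    public static String[] probe(byte[] xml) throws Exception {
        final String[] result = new String[2];

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;
            private boolean seenRoot = false;

            @Override
            public void setDocumentLocator(Locator l) {
                locator = l;
            }

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                // Only look once, at the root element.
                if (!seenRoot && locator instanceof Locator2) {
                    Locator2 l2 = (Locator2) locator;
                    result[0] = l2.getXMLVersion();
                    result[1] = l2.getEncoding();
                }
                seenRoot = true;
            }
        };

        // A byte stream is used so that the parser itself decodes the input.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new ByteArrayInputStream(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<root/>\n"
            .getBytes(StandardCharsets.ISO_8859_1);
        String[] info = probe(xml);
        System.out.println("version:  " + info[0]);
        System.out.println("encoding: " + info[1]);
    }
}
```

Note that `getEncoding()` is documented to return an externally declared or inferred encoding when one applies, so the value is not always the literal text of the `<?xml` header. A program that must write the header back unchanged cannot rely on this alone.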

Rgds.Paul.










 
