   SAX: Whitespace Handling (question 5 of 10)

  • From: David Megginson <ak117@freenet.carleton.ca>
  • To: xml-dev Mailing List <xml-dev@ic.ac.uk>
  • Date: Sat, 3 Jan 1998 13:02:33 -0500

[SAX is a proposal for a simple, event-based XML API, using
callbacks.  This is one in a series of ten design questions that we
need to answer to implement the API.]

Should SAX allow DTD-driven parsers to distinguish ignorable
whitespace from other character data?

  public void ignorableWhitespace (char ch[], int length);

(We have already had some discussion on this topic.)


CON
---

- this method would make SAX slightly larger;

- parsers that use the DTD will return different results than parsers
  that do not (though it would be trivial to map the two);

- the concept of ignorable whitespace can be confusing for
  non-specialists.
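
The mapping mentioned in the second point might look roughly like the
sketch below.  Only the two method signatures come from this proposal;
the interface name CharHandler and the classes TextCollector and
MappingDemo are hypothetical names made up for illustration, and the
event sequence is simulated by hand rather than produced by a parser.

```java
// Hypothetical handler interface; only the two method signatures
// are taken from the proposal.
interface CharHandler {
    void charData(char ch[], int length);
    void ignorableWhitespace(char ch[], int length);
}

// Collects everything that arrives through charData.
class TextCollector implements CharHandler {
    final StringBuilder text = new StringBuilder();

    public void charData(char ch[], int length) {
        text.append(ch, 0, length);
    }

    // Map ignorable whitespace onto ordinary character data, so that a
    // DTD-driven parser and a non-DTD parser yield the same text.
    public void ignorableWhitespace(char ch[], int length) {
        charData(ch, length);
    }
}

class MappingDemo {
    // Simulates the events for "<a>\n  <b>x</b>\n</a>", assuming that
    // element a is declared with element content in the DTD.
    static String run(boolean parserUsesDtd) {
        TextCollector handler = new TextCollector();
        char[] indent = "\n  ".toCharArray();
        char[] text = "x".toCharArray();
        char[] newline = "\n".toCharArray();

        report(handler, indent, parserUsesDtd);
        handler.charData(text, text.length);
        report(handler, newline, parserUsesDtd);
        return handler.text.toString();
    }

    // A DTD-driven parser flags the whitespace; any other parser
    // cannot tell it apart from ordinary character data.
    private static void report(CharHandler h, char[] ws, boolean parserUsesDtd) {
        if (parserUsesDtd) {
            h.ignorableWhitespace(ws, ws.length);
        } else {
            h.charData(ws, ws.length);
        }
    }

    public static void main(String[] args) {
        // Both kinds of parser give the application the same text.
        System.out.println(run(true).equals(run(false)));
    }
}
```

With the one-line forwarding method in place, an application sees
identical character data from both kinds of parser.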


PRO
---

- the XML Proposed Recommendation (PR) requires "validating" parsers
  to flag ignorable whitespace for the application;

- there would be no need to implement anything here for most
  applications;

- whitespace in element content is almost never significant for
  formatting or database applications (if it were significant, then
  the element type would have mixed content).


MY RECOMMENDATION
-----------------

Qualified no.

As someone who has worked with SGML for many years, I would rather not
see the ignorable whitespace at all; however, the PR requires parsers
to report all whitespace.

Tim Bray's recent comments on this list imply that a validating parser
using SAX could report ignorable whitespace as regular character data
and still be conforming; if I have inferred correctly, then I am
willing to omit this callback.


OTHER CONSIDERATIONS
--------------------

It would also be possible to implement this in the charData callback
itself:

  public void charData (char ch[], int length, boolean isIgnorable);

However, given that charData will probably be the most
heavily-implemented handler, and that very few applications will care
about ignorable whitespace, I would prefer not to complicate things
unnecessarily.  If we need to distinguish it to be conforming, then
ignorable whitespace should probably be shuffled off to its own
callback, to make it easier to ignore.
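
To make the trade-off concrete, here is a rough sketch of what an
application would have to write under each design.  The two method
signatures are the ones quoted in this message; everything else
(HandlerBase with its empty default methods, FlaggedHandler,
SeparateCallbackApp, FlaggedApp, CallbackDesignDemo) is a hypothetical
name invented for the comparison, not part of any agreed API.

```java
// Design A: separate callback.  A hypothetical base class supplies
// empty defaults, so applications override only what they need.
class HandlerBase {
    public void charData(char ch[], int length) {}
    public void ignorableWhitespace(char ch[], int length) {}
}

// The application never mentions ignorable whitespace at all.
class SeparateCallbackApp extends HandlerBase {
    final StringBuilder text = new StringBuilder();

    @Override
    public void charData(char ch[], int length) {
        text.append(ch, 0, length);
    }
}

// Design B: a flag on charData itself.
interface FlaggedHandler {
    void charData(char ch[], int length, boolean isIgnorable);
}

// Every implementation has to remember to test the flag.
class FlaggedApp implements FlaggedHandler {
    final StringBuilder text = new StringBuilder();

    public void charData(char ch[], int length, boolean isIgnorable) {
        if (isIgnorable) {
            return;
        }
        text.append(ch, 0, length);
    }
}

class CallbackDesignDemo {
    static final char[] WS = "\n  ".toCharArray();
    static final char[] X = "x".toCharArray();

    // Hand-simulated events: ignorable whitespace, then the text "x".
    static String separate() {
        SeparateCallbackApp app = new SeparateCallbackApp();
        app.ignorableWhitespace(WS, WS.length);
        app.charData(X, X.length);
        return app.text.toString();
    }

    static String flagged() {
        FlaggedApp app = new FlaggedApp();
        app.charData(WS, WS.length, true);
        app.charData(X, X.length, false);
        return app.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(separate().equals(flagged()));
    }
}
```

Both applications end up with the same text, but under the separate
callback the common case needs no extra code, whereas under the flag
design every charData implementation carries the test.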


All the best,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/
