OASIS Mailing List Archives

   Re: SAX: Whitespace Handling (question 5 of 10)

  • From: Peter Murray-Rust <peter@ursus.demon.co.uk>
  • To: Michael Kay <M.H.Kay@eng.icl.co.uk>, xml-dev@ic.ac.uk
  • Date: Wed, 07 Jan 1998 01:01:21

At 14:16 05/01/98 -0000, Michael Kay wrote:
>>BTW: IMHO, IFF there is going to be a "default implementation" anyway, I
>>would actually prefer an "ignorableWhitespace" method which calls charData
>>by default. This will permit cleaner implementations.
>
>
>I may be simple-minded, but surely the default action with ignorable white
>space should be to ignore it?

Not simple-minded :-)

The whitespace issue is not trivial, but is (I think) consistent. The
*parser* has no option except to pass all characters that are not markup to
the application. This means that in:
<FOO>
  <BAR/>
</FOO>

A parser MUST pass the equivalent of

<FOO>\n\s\s<BAR></BAR>\n</FOO>

to the application.  
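
To make the point above concrete, here is a minimal sketch using Python's
standard xml.sax module (a non-validating, expat-based parser). The class
name EchoHandler and the helper echo_events are made up for illustration.
The handler simply writes back every event it receives, so whatever the
parser passes on shows up in the result:

```python
import xml.sax


class EchoHandler(xml.sax.ContentHandler):
    """Rebuilds a flat string from the events the parser reports."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def startElement(self, name, attrs):
        self.pieces.append("<%s>" % name)

    def endElement(self, name):
        self.pieces.append("</%s>" % name)

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so every chunk is kept and joined at the end.
        self.pieces.append(content)


def echo_events(document):
    """Parse the document and return the echoed event stream as one string."""
    handler = EchoHandler()
    xml.sax.parseString(document, handler)
    return "".join(handler.pieces)


DOC = b"<FOO>\n  <BAR/>\n</FOO>"
print(repr(echo_events(DOC)))  # '<FOO>\n  <BAR></BAR>\n</FOO>'
```

The newline and the two spaces before BAR, and the newline after it, all
reach the handler as character data; nothing is dropped on the way.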

In a well-formed document there is NO indication of which character data
are significant and which are not ("ignorable"), so by default the
application will have a tree structure where FOO has 3 children.

FOO
  "\n\s\s"
  BAR
  "\n"

If the application is told through
stylesheets/PIs/hardcoded_semantics/telepathy/a_human that all whitespace
is ignorable, fine - but it is NOT part of the XML spec.
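
The three-child tree, and an application deciding for itself to drop the
whitespace, can be sketched with Python's xml.dom.minidom. The helper names
child_summary and strip_whitespace_children are hypothetical, and the
stripping rule is an application policy assumed for the example, not
something XML prescribes:

```python
from xml.dom import minidom


def child_summary(element):
    """List text children by their data and element children by tag name."""
    return [
        child.data if child.nodeType == child.TEXT_NODE else child.tagName
        for child in element.childNodes
    ]


def strip_whitespace_children(element):
    """Application policy (not XML): remove whitespace-only text children."""
    for child in list(element.childNodes):
        is_text = child.nodeType == child.TEXT_NODE
        # XML white space is space, tab, carriage return and line feed.
        if is_text and not child.data.strip(" \t\r\n"):
            element.removeChild(child)


foo = minidom.parseString("<FOO>\n  <BAR/>\n</FOO>").documentElement

before = child_summary(foo)
strip_whitespace_children(foo)
after = child_summary(foo)

print(before)  # ['\n  ', 'BAR', '\n']
print(after)   # ['BAR']
```

The parser hands over all three children; the reduction to a single BAR
child happens only because this particular application chose to do it.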

If the DTD reads:

<!ELEMENT FOO (BAR)>

the "validating parser" (and we are still struggling with exactly what one
of those is :-) MUST tell the application:

"Hey! Be careful! I've sent you a FOO, but it has element-only content, so
you may wish to ignore all the whitespace-only children of the FOO". The
application should say thank you, and then do whatever it feels like doing
with this information.

HOW the parser tells the application is what we are tackling.  DavidM has
suggested that when the "ignorable whitespace" is emitted from the parser,
it generates a special event. This seems reasonable - I suppose there could
be other methods (even simply announcing which elements had element-only
content should be sufficient).
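
Both ideas can be sketched together in Python's xml.sax. The parser bundled
with Python is non-validating, so this sketch does the routing itself: the
set ELEMENT_ONLY is a hard-coded, hypothetical stand-in for what a
validating parser would learn from <!ELEMENT FOO (BAR)>, and the class name
WhitespaceAwareHandler and helper classify are made up for illustration.
Whitespace-only character data inside an element-only parent is passed to
the handler's ignorableWhitespace method instead of being treated as
ordinary text:

```python
import xml.sax

# Hypothetical: stands in for the content models a validating parser
# would read from the DTD, here just <!ELEMENT FOO (BAR)>.
ELEMENT_ONLY = {"FOO"}


class WhitespaceAwareHandler(xml.sax.ContentHandler):
    """Separates ignorable whitespace from significant character data."""

    def __init__(self, element_only):
        super().__init__()
        self.element_only = element_only
        self.stack = []        # names of the currently open elements
        self.significant = []  # character data the application must keep
        self.ignorable = []    # whitespace flagged as ignorable

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        parent = self.stack[-1] if self.stack else None
        whitespace_only = not content.strip(" \t\r\n")
        if parent in self.element_only and whitespace_only:
            # The "special event": same characters, different callback.
            self.ignorableWhitespace(content)
        else:
            self.significant.append(content)

    def ignorableWhitespace(self, whitespace):
        # The application is free to keep or discard this.
        self.ignorable.append(whitespace)


def classify(document, element_only=ELEMENT_ONLY):
    """Return (significant text, ignorable whitespace) for a document."""
    handler = WhitespaceAwareHandler(element_only)
    xml.sax.parseString(document, handler)
    return "".join(handler.significant), "".join(handler.ignorable)


print(classify(b"<FOO>\n  <BAR/>\n</FOO>"))         # ('', '\n  \n')
print(classify(b"<FOO>\n  <BAR/>\n</FOO>", set()))  # ('\n  \n', '')
```

With no element-only information (the second call) the same whitespace
arrives as ordinary character data, which is the well-formed-only case
described earlier.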

[Please shoot this down if I've got it wrong :-)].

	P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences,
domestic net connection
VSMS http://www.nottingham.ac.uk/vsms
Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 


Copyright 2001 XML.org. This site is hosted by OASIS