   Re: SAX and whitespace (was Re: Problems with whitespace and msxml)

  • From: David Megginson <ak117@freenet.carleton.ca>
  • To: xml-dev@ic.ac.uk
  • Date: Fri, 2 Jan 1998 21:43:27 -0500

Tim Bray writes:

 > Lark, BTW, does *not* catch ignorable white space unless it is
 > validating.  Since it is perfectly OK to build SAX with such a
 > processor, *if* we want to build ignorable white-space notification
 > into SAX, it has to be out-of-band; i.e. white space is passed in
 > the same way as all other content; with perhaps another boolean argument
 > to the text() method (that what it's called now?) that if true, means
 > this is ignorable white space.

Thank you for the reply, Tim.  I would like to make certain, however,
that I understand the behaviour that you're recommending.  If a
DTD-driven parser finds ignorable whitespace, and if we decide that
SAX should not provide ignorable whitespace notification, then which
of the following is the correct action?

1) the parser should not report the whitespace; or

2) the parser should report the whitespace as regular character data.

From my reading of the PR, and from my understanding of your comments,
you are recommending (2); in other words, given the following document:

  <!DOCTYPE foo [
   <!ELEMENT foo (bar+)>
   <!ELEMENT bar (#PCDATA)>
  ]>
  <foo>
  <bar>one bar</bar>
  <bar>two bars</bar>
  </foo>

A DTD-driven parser would report something like the following events
through SAX:

  - start document
  - start element: "foo"
  - character data: "\n"
  - start element: "bar"
  - character data: "one bar"
  - end element: "bar"
  - character data: "\n"
  - start element: "bar"
  - character data: "two bars"
  - end element: "bar"
  - character data: "\n"
  - end element: "foo"
  - end document
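
As a rough illustration of option (2), here is a minimal sketch that logs the events for the sample document. It uses Python's standard xml.sax module, which is an assumption of this sketch and not the interface under discussion; the EventLogger class and its event tuples are made-up names. The underlying expat parser does not validate, so the newlines inside "foo" should arrive as ordinary character data. Adjacent character chunks are joined, because a parser is free to split text across several callbacks.

```python
import xml.sax

# The sample document from the message, without the display indentation.
DOC = b"""<!DOCTYPE foo [
 <!ELEMENT foo (bar+)>
 <!ELEMENT bar (#PCDATA)>
]>
<foo>
<bar>one bar</bar>
<bar>two bars</bar>
</foo>
"""


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each event as a (kind, value) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []  # pending character chunks, joined before the next event

    def _flush(self):
        # Emit any buffered character data as a single event.
        if self._text:
            self.events.append(("character data", "".join(self._text)))
            self._text = []

    def startDocument(self):
        self.events.append(("start document", None))

    def endDocument(self):
        self._flush()
        self.events.append(("end document", None))

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start element", name))

    def endElement(self, name):
        self._flush()
        self.events.append(("end element", name))

    def characters(self, content):
        self._text.append(content)

    def ignorableWhitespace(self, whitespace):
        # Logged separately, in case a parser does choose to use this callback.
        self._flush()
        self.events.append(("ignorable whitespace", whitespace))


handler = EventLogger()
xml.sax.parseString(DOC, handler)
for kind, value in handler.events:
    print("-", kind if value is None else f"{kind}: {value!r}")
```

A parser that took option (1) would simply never deliver the three "\n" events.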

In full SGML, you'd get something a little simpler, because the
whitespace in element content would be discarded:

  - start document
  - start element: "foo"
  - start element: "bar"
  - character data: "one bar"
  - end element: "bar"
  - start element: "bar"
  - character data: "two bars"
  - end element: "bar"
  - end element: "foo"
  - end document
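
An application that wants the simpler stream can approximate it on its own side. The following is a minimal sketch, assuming the application already knows which elements have element content (here only "foo", taken from the DTD above). The function name and the tuple layout are hypothetical, and real SGML whitespace handling has more rules than this simplification covers.

```python
XML_WHITESPACE = " \t\r\n"


def drop_element_content_whitespace(events, element_content):
    """Return events without whitespace-only character data whose parent
    element is listed in element_content (a set of element names)."""
    stack = []  # names of the currently open elements
    kept = []
    for kind, value in events:
        if kind == "start element":
            stack.append(value)
        elif kind == "end element":
            stack.pop()
        elif kind == "character data":
            in_element_content = bool(stack) and stack[-1] in element_content
            if in_element_content and value.strip(XML_WHITESPACE) == "":
                continue  # skip whitespace between child elements
        kept.append((kind, value))
    return kept


# The event stream a DTD-driven parser would report under option (2).
EVENTS = [
    ("start document", None),
    ("start element", "foo"),
    ("character data", "\n"),
    ("start element", "bar"),
    ("character data", "one bar"),
    ("end element", "bar"),
    ("character data", "\n"),
    ("start element", "bar"),
    ("character data", "two bars"),
    ("end element", "bar"),
    ("character data", "\n"),
    ("end element", "foo"),
    ("end document", None),
]

filtered = drop_element_content_whitespace(EVENTS, {"foo"})
for kind, value in filtered:
    print("-", kind if value is None else f"{kind}: {value!r}")
```

Text inside "bar" is left alone, because "bar" has #PCDATA content and its whitespace is significant.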

 > But I would oppose doing this in SAX; let's keep it simple for now. -T.

Sounds reasonable.


All the best,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
