xml-dev - Re: Problems with whitespace and msxml

From: Peter Murray-Rust <peter@ursus.demon.co.uk>
To: xml-dev@ic.ac.uk
Date: Thu, 01 Jan 1998 15:45:35

Whitespace has been (and I suspect will continue to be) a frequent topic on
XML-DEV :-) It can be a confusing topic and long-term members of XML-DEV
are sympathetic and helpful when it is raised.  

(A) There is no simple one-groks-all solution to the problem. If there
were, we would be using it :-)

(B) A lot of material about whitespace has been written on this list,
including 5 paragraphs from David Durand. You will find references to some
of the discussion on XML-DEV jewels
(http://ala.vsms.nottingham.ac.uk/vsms/xml/jewels.html).

At 08:43 01/01/98 -0500, David Megginson wrote:
>Alexander Hinds writes:
>
>[on xml:space]
>
> > Moreover, no matter what I set it to, I always get back whitespace
> > in my tree, even without a mixed content model (for example, for
> > element book, its first sib is always whitespace).
> >  My question, basically, is: how do I eliminate whitespace from my
> > tree entirely?  Or failing that how do I get the current value of
    ^^^^^^^^^^^^^
By not including it in your document :-)

> > xml-space in my ElementImpl subclass?  It appears that nameXMLSPACE

I have not managed to get msxml working yet, but assuming that you can
retrieve attribute values, xml:space is a potential attribute for any
element. The rules for inheriting its value from the root element
downwards are given in the spec.
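
As a rough illustration of that inheritance (this is not msxml's API; it is
a sketch in Python using the standard xml.etree.ElementTree module, and the
function name effective_xml_space is made up for the example), an
application could compute the value in force for each element by walking
down from the root and letting a nearer xml:space override an inherited one:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_xml_space(root):
    """Map each element to the xml:space value in force for it.

    Hypothetical helper: an element's own xml:space wins; otherwise the
    value is inherited from its parent. None means no intention was
    signalled anywhere above the element.
    """
    result = {}

    def walk(elem, inherited):
        value = elem.get(XML_SPACE, inherited)
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, None)
    return result


# Illustrative document, not taken from the original thread.
doc = ET.fromstring(
    '<book><title xml:space="preserve"> A  <em>B</em></title>'
    "<para>text</para></book>"
)
spaces = effective_xml_space(doc)
```

Here the em element inside title picks up "preserve" from its parent, while
book and para have no value at all. What the application then does with the
whitespace is still its own decision.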

> > is private, not protected (why?) so a subclass can't really search
> > it.  But even when I change the visibility, it's always null
> > anyway.
>
>I have not used msxml recently, so I do not know what it does, but the
>PR is very clear that the 'xml:space' attribute is strictly
>informative (from 2.10, "White Space Handling"):
>
>   An XML processor must always pass all characters in a document that
>   are not markup through to the application. A validating XML processor

I find the phrase "validating XML processor" confusing because it
refers to a piece of software.  Validation requires:
  - enough information in the document to *allow* it to be validated
    (e.g. enough ELEMENT and ATTLIST declarations to cover all elements
    found in the document).
  - a decision that the document *should* be validated. This may come from:
      - the author (implicit in the inclusion of a DTD and some PIs)
      - the client software (e.g. it decides when to validate)
      - the human user ("press the validate button").
  - software sufficiently powerful to map the content of an element on to
    its contentSpec.

IOW the identification of ignorable whitespace (which is *mandatory* for a
validating parser) depends on an unclear combination of the above.

>   must distinguish white space in element content from other non-markup
    ^^^^
It can only do this if the document allows it to...

>   characters and signal to the application that white space in element
>   content is not significant.
>
>   A special attribute named "xml:space" may be inserted in documents to
>   signal an intention that the element to which this attribute applies
>   requires all white space to be treated as significant by applications.
>
>In other words, the value of xml:space should _not_ affect the
>information that msxml returns to your application; instead, it is up
>to your application to read the value, if present, and to take
>appropriate action.  Msxml should return all whitespace, no matter
>what.

And - assuming it calls itself a validating parser - *must* identify which
of that whitespace is significant and signal that to the application.
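
To make the distinction concrete, here is a small sketch (Python, standard
xml.etree.ElementTree; again not msxml). It assumes the application already
knows which elements have element-only content. The element_only set below
is a hypothetical stand-in for what a validating processor would derive from
the DTD; a non-validating parser cannot supply it. Whitespace-only text
directly inside those elements is dropped and everything else is left alone:

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the four XML whitespace characters


def strip_element_content_whitespace(root, element_only):
    """Remove whitespace-only text directly inside element-only elements.

    element_only is an assumed set of element names whose content model
    contains no character data (hypothetical input for this sketch).
    """
    for elem in root.iter():
        if elem.tag not in element_only:
            continue
        # Text before the first child.
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        # Text after each child (ElementTree stores it as the child's tail).
        for child in elem:
            if child.tail is not None and not child.tail.strip(XML_WS):
                child.tail = None
    return root


# Illustrative document: only <book> is treated as element-only.
tree = ET.fromstring(
    "<book>\n  <title>A </title>\n  <para>x <em>y</em> z</para>\n</book>"
)
strip_element_content_whitespace(tree, {"book"})
cleaned = ET.tostring(tree, encoding="unicode")
```

The whitespace between the children of book disappears, but the trailing
space in title and the mixed content of para are untouched, because nothing
has said that whitespace there is insignificant.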

>
>I have heard rumours that xml:space may some day be removed from the
>core XML spec and put into a separate "XML Conventions" spec -- that
>would be a very good idea.

We should be careful not to act on rumours on XML-DEV. There is a carefully
controlled process which requires discipline from those wishing to use XML.
Some of the deliberations are confidential (e.g. XML-SIG - and as a member
of that I cannot confirm or deny any speculations about what is discussed
there). XML relies on the community adhering to the spec as closely as they
can - this in itself is not easy.

OTOH I have publicly made it clear that I think that conventions are going
to be essential for the implementation of XML systems (and whitespace would
be a strong candidate).  This is why I have raised the idea of XDEV (an
informal set of conventions aired on the list) and shall continue to pursue
this. IFF the XML process formally wishes to set up a conventions WG or
similar I shall be very happy, but until they announce something like that
we cannot and should not assume it.

	P.

>
>
>All the best,
>
>
>David
>
>-- 
>David Megginson                 ak117@freenet.carleton.ca
>Microstar Software Ltd.         dmeggins@microstar.com
>      http://home.sprynet.com/sprynet/dmeggins/
>
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
>
>
Peter Murray-Rust, Director, Virtual School of Molecular Sciences
(domestic net connection)
VSMS http://www.nottingham.ac.uk/vsms
Virtual Hyperglossary http://www.venus.co.uk/vhg

Follow-Ups:
- SAX and whitespace (was Re: Problems with whitespace and msxml)
  - From: David Megginson <ak117@freenet.carleton.ca>

References:
- Problems with whitespace and msxml
  - From: David Megginson <ak117@freenet.carleton.ca>
