- From: Peter Murray-Rust <peter@ursus.demon.co.uk>
- To: xml-dev@ic.ac.uk
- Date: Fri, 20 Feb 1998 22:34:32
I am considering how to treat xml:space in JUMBO and ask for help and
comments. <NOTE>I am NOT re-opening the whitespace debate; I am asking
those who understand xml:space if what I do/intend to do is reasonable.
xml:space is a formal part of the language and I feel I have to address
it.</NOTE>
1. Are there any documents which actually use xml:space? rec.xml does not.
2. Is there anyone on this list intending to use it? If so, what do they
expect "applications' default white-space processing modes" to be?
[Quotations are from rec.xml]
>An XML processor must always pass all characters in a document that are
>not markup through to the application. A validating XML
>processor must also inform the application which of these characters
>constitute white space appearing in element content.
My philosophy in JUMBO (which is a generic application) is to accept all
whitespace from the parser/SAX, whether labelled 'ignorable' or not. All
PCDATA is stored in child nodes of elements. Those with ignorable
whitespace can be specially labelled. IOW I do not discard any character
data on input.
>
>A special attribute named xml:space may be attached to an element to
>signal an intention that in that element, white space should be
>preserved by applications. In valid documents, this attribute, like any
>other, must be declared if it is used. When declared, it must be
>given as an enumerated type whose only possible values are "default" and
>"preserve". For example:
>
> <!ATTLIST poem xml:space (default|preserve) 'preserve'>
>
OK. If xml:space="preserve" I have no problems.
If xml:space="default" I am asking for help. Note that xml:space="default"
could apply either to ignorable whitespace or to non-ignorable w/s.
If xml:space is absent, I suggest options below...
>
>The value "default" signals that applications' default white-space
>processing modes are acceptable for this element; the value
>"preserve" indicates the intent that applications preserve all the white
>space. This declared intent is considered to apply to all
>elements within the content of the element where it is specified, unless
This causes me slight concern. It means I have to write code that
automatically tracks what elements have an xml:space attribute. This is
possible, but yet another thing that has to be done. I might be motivated
to do it if I am shown some use for it...
>overriden with another instance of the xml:space attribute.
This means effectively that every node in a document has to have an
xml:space flag. [Unless this is dynamically worked out every time the
document is to be rendered.]
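
The inheritance rule quoted above can be tracked with a stack while the parser events arrive, rather than worked out again at render time. What follows is only a minimal sketch in Python using the standard xml.sax module, not JUMBO code; the names SpaceTracker and effective_space_modes are hypothetical, and it assumes that an element without its own xml:space simply takes its parent's effective value, with "default" at the root.

```python
import xml.sax


class SpaceTracker(xml.sax.ContentHandler):
    """Records the effective xml:space value for every element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        # Assumption: the document as a whole starts in "default" mode.
        self.stack = ["default"]
        self.seen = []  # (element name, effective mode) in document order

    def startElement(self, name, attrs):
        declared = attrs.get("xml:space")
        # An explicit, legal value overrides the inherited one; anything
        # else (absent or illegal) falls back to the parent's mode.
        if declared in ("default", "preserve"):
            mode = declared
        else:
            mode = self.stack[-1]
        self.stack.append(mode)
        self.seen.append((name, mode))

    def endElement(self, name):
        # Leaving the element restores the parent's mode.
        self.stack.pop()


def effective_space_modes(xml_text):
    """Parse xml_text and return a list of (element name, effective mode)."""
    handler = SpaceTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.seen
```

With this approach the flag is computed once per element on input and can be stored on the node, which is the "every node has an xml:space flag" option; the stack itself is discarded when parsing ends.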
--------
Without xml:space, and without a DTD, I can see the following *generic*
possibilities:
- element is empty. [BTW the spec (and SAX) discards all knowledge of
whether this was created by <FOO></FOO> or <FOO/>. I approve of this.]
Children are not displayed because there aren't any.
- element contains non-w/s characters. This is displayed either as a
string or as a title-value pair (at user option). The title is determined
by simple heuristics.
- element contains element content. This is displayed as a tree. I am
considering also allowing the user to display this as a tagged/untagged
event stream, but the tree is the default.
- element contains element content and (some) non-w/s PCDATA children.
This is displayed as an untagged (or selectably tagged) event stream.
Unless the semantics of the tags are known or a stylesheet is provided, no
other rendering is possible.
Now the two w/s options...
- element contains element content and (only) w/s children. This is
displayed by default as ignoring the w/s. Note that this is *display*, not
processing. Since the default is a tree, the w/s nodes aren't much use.
- element contains a single w/s child. This does not display anything by
default.
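
The six cases above can be told apart by looking only at an element's direct children. The following is a rough sketch using Python's standard xml.dom.minidom, under the assumption that character data is held in text child nodes as described earlier; the function name classify and the category labels are made up for illustration and are not part of JUMBO.

```python
from xml.dom import Node

# The four characters XML treats as white space.
XML_WS = " \t\r\n"


def classify(element):
    """Return a label for the kind of content a DOM element holds."""
    has_elements = has_text = has_ws = False
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            has_elements = True
        elif child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            if child.data.strip(XML_WS):
                has_text = True   # some non-w/s characters
            else:
                has_ws = True     # a w/s-only node
        # Comments and processing instructions are ignored here.

    if not (has_elements or has_text or has_ws):
        return "empty"
    if has_elements and has_text:
        return "mixed"                # element content plus non-w/s PCDATA
    if has_text:
        return "text"                 # non-w/s characters only
    if has_elements and has_ws:
        return "elements+whitespace"  # element content plus w/s-only children
    if has_elements:
        return "elements"             # pure element content
    return "whitespace-only"          # w/s and nothing else
```

A display layer could then choose the default rendering from the label, for example a tree for "elements" and "elements+whitespace", and nothing at all for "empty" and "whitespace-only".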
The user can switch to display/hide PCDATA children in the tree display.
For *outputting* it is possible to delete the w/s nodes if required. Once
deleted they are gone ...
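
As an illustration of deleting w/s nodes before output, here is a small sketch, again with Python's xml.dom.minidom. It assumes one particular policy that the text above does not prescribe: w/s-only text nodes are removed everywhere except inside elements whose effective xml:space is "preserve". The function name strip_whitespace_nodes is hypothetical.

```python
from xml.dom import Node

XML_WS = " \t\r\n"


def strip_whitespace_nodes(element, inherited="default"):
    """Remove w/s-only text children in place, honouring xml:space."""
    # getAttribute returns "" when the attribute is absent, so the
    # parent's effective mode is used in that case.
    mode = element.getAttribute("xml:space") or inherited
    # Iterate over a copy, since children are removed during the loop.
    for child in list(element.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if mode != "preserve" and not child.data.strip(XML_WS):
                element.removeChild(child)  # irreversible: the node is gone
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child, mode)
```

Because the removal is destructive, a cautious application would presumably run it on a copy of the tree, or only at the moment of writing, so that the in-memory document still holds every character received from the parser.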
I would be interested in comments as to whether this is reasonable default
behaviour or whether there are other things that should be considered.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)