Thomas B. Passin wrote:
> The only whitespace that can really be in question is whitespace between
> elements - that is, between the end of one element and the start of the
> next. Whitespace in element content is simply part of the content and
> is not to be removed. Whitespace is allowed between elements for visual
> formatting, and it is a sensible question to ask about how to tell when
> such whitespace is just for formatting and when it is actually supposed
> to be part of the content.
Indeed. In fact, with a bit of thought, you can construct any number of
corner cases where it's horribly non-obvious whether the whitespace
matters. Unfortunately, you can do this whether or not there's a DTD in
play. I don't know whether XML Schema solved this, but I doubt it. The designers
of SGML worked really hard on writing all this down and giving rules,
and then the implementors disagreed on what they meant. The handling
of whitespace is highly application-specific, and xml:space is nothing
more than a signal of intent from upstream, one that downstream
software may freely ignore.
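A minimal sketch of the point, assuming Python's standard xml.etree.ElementTree as the parser and a made-up sample document: the parser hands inter-element indentation to the application as ordinary text, and xml:space arrives as just another attribute that nothing acts on unless the application chooses to.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound by the Namespaces in XML spec, so ElementTree
# reports xml:space under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Hypothetical sample document, for illustration only.
DOC = """<list xml:space="default">
  <item>first  item</item>
  <item>second</item>
</list>"""

root = ET.fromstring(DOC)

# Whitespace inside element content is simply part of the content.
print(repr(root[0].text))   # 'first  item'

# The indentation between elements is reported too: as the parent's
# .text before the first child, and as each child's .tail.
print(repr(root.text))      # '\n  '
print(repr(root[0].tail))   # '\n  '
print(repr(root[1].tail))   # '\n'

# xml:space is just another attribute; the parser stripped nothing.
print(root.get(XML_SPACE))  # default
```

Whether those '\n  ' runs are formatting or data is left entirely to the code that consumes the tree.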
The best rule, if you're generating XML, is not to put in any whitespace
that you don't mean. One common trick to make things totally
unambiguous is what used to be called RAST format:
<html
><head
><title>the Title</title></head><body
><h1
>whatever</h1
><p
>first para with no whitespace problems, with <a href="#foo"
>embedded link</a> you see how it goes?</p></body></html>
That is, no newlines in content, ever. This may be taking it a bit far,
but maybe not.
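One way to check generated output against this rule, sketched with Python's xml.etree.ElementTree (the function name whitespace_only_runs and both sample documents are hypothetical): list every run of character data that consists only of XML whitespace. A RAST-style document, with every line break tucked inside a tag before its '>', produces none; the conventionally indented version produces five, each a place where a consumer has to guess.

```python
import xml.etree.ElementTree as ET

# XML's whitespace characters (the S production): space, tab, CR, LF.
XML_WS = " \t\r\n"

def whitespace_only_runs(root):
    """List (tag, 'text' | 'tail') for every run of character data that
    consists solely of XML whitespace: the runs whose significance a
    downstream application would have to guess."""
    runs = []
    for elem in root.iter():
        if elem.text and not elem.text.strip(XML_WS):
            runs.append((elem.tag, "text"))
        # The root's tail lies outside the document element; skip it.
        if elem is not root and elem.tail and not elem.tail.strip(XML_WS):
            runs.append((elem.tag, "tail"))
    return runs

# RAST style: every line break falls inside a tag, before its '>'.
RAST = """<html
><head
><title>the Title</title></head><body
><h1
>whatever</h1
><p
>first para with no whitespace problems, with <a href="#foo"
>embedded link</a> you see how it goes?</p></body></html>"""

# The usual indented style, for comparison.
INDENTED = """<html>
  <head><title>the Title</title></head>
  <body>
    <h1>whatever</h1>
  </body>
</html>"""

print(whitespace_only_runs(ET.fromstring(RAST)))
# []
print(whitespace_only_runs(ET.fromstring(INDENTED)))
# [('html', 'text'), ('head', 'tail'), ('body', 'text'), ('body', 'tail'), ('h1', 'tail')]
```

XML allows whitespace before the closing '>' of both start and end tags, which is what makes the RAST trick legal: the line breaks live in markup, not in content.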
--
Cheers, Tim Bray
(ongoing fragmented essay: http://www.tbray.org/ongoing/)