Ralf wrote:
> ...
> This content contains both the xml user content (i.e. the "real"
> content) and the xml text file "format" content.
> My real question is how do I use that content? How do I design my XML?
> How do I know that the spacing matters, or that the \n matters or are
> just a byproduct of the XML text file?
> Is it best to remove all spacing and \n after and before any kind of
> non-whitespace content?
> Sure xml:space tells me I should preserve them or not... really? The
> spec doesn't tell me much about it. It just says the application gets
> it all or knows what to do with it. Well, personally I don't. What about
> "good behavior" guidelines for applications?
>
> Here I'm dealing with SVG (typically <text> and <tspan>) but I already
> had the problem with other custom xml formats.
>
> A typical example that confuses me:
>
> <text> this
> is a text .
> </text>
>
> What should I interpret here? One straight line, one line with one \n in
> the middle, or one with three \n ? What about the spaces before "is" and
> "this"?
The only whitespace that can really be in question is whitespace between
elements - that is, between the end of one element and the start of the
next. Whitespace within an element's text is simply part of the content and
is not to be removed. Whitespace is allowed between elements for visual
formatting, and it is a sensible question to ask about how to tell when
such whitespace is just for formatting and when it is actually supposed
to be part of the content.
That's one of the things that a DTD or schema is supposed to answer. Of
course, if there is no DTD there is no way to be certain (except for
xml:space). If an element allows mixed content, then any whitespace
between its child elements would have to be considered part of the
element's content. Conversely, if no PCDATA is allowed, then any
whitespace between elements would have to be for visual formatting only.
When a DTD or schema is involved, you only get this whitespace
knowledge passed on if the processor is set for validation.
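To make the mixed-content rule concrete, here is a rough sketch in Python using the standard xml.etree.ElementTree module. It is not a validating processor: the ELEMENT_ONLY set is a hypothetical stand-in for what a DTD would declare, and the element names "list", "item" and "b" are made up for the illustration. Whitespace-only text is dropped only inside elements declared as element-only; everything inside mixed content is left alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD knowledge: elements whose content model
# allows child elements only (no PCDATA). A real validating processor
# would derive this from the DTD or schema.
ELEMENT_ONLY = {"list"}

# The four characters XML treats as whitespace.
XML_WS = " \t\r\n"

def strip_formatting_whitespace(elem):
    """Remove whitespace-only text between children of element-only elements."""
    if elem.tag in ELEMENT_ONLY:
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        for child in elem:
            if child.tail is not None and not child.tail.strip(XML_WS):
                child.tail = None
    for child in elem:
        strip_formatting_whitespace(child)

doc = ET.fromstring(
    "<list>\n  <item>a <b>b</b> c</item>\n  <item> d </item>\n</list>"
)
strip_formatting_whitespace(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

The indentation between the "item" elements disappears, while the spaces inside each "item" (which is treated as mixed content here) survive untouched.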
You can tell the processor more specifically what you have in mind with
the xml:space attribute. Again, it applies just to whitespace between
the end of one element and the start of another.
If xml:space is not specified, you will generally get the processor's
default behavior, which is application-dependent.
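If it helps, this is roughly how an application might look up the xml:space setting that applies to each element, again sketched with Python's xml.etree.ElementTree. The function name effective_space and the sample document are invented for the example, and representing "not specified" as None is my own choice. The attribute is inherited by descendants unless they override it, so the sketch passes the current value down the tree.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared "xml" prefix as this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def effective_space(elem, inherited=None, out=None):
    """Map each element tag to the xml:space value in effect for it.

    None means the attribute was never specified, so the application's
    own default behavior applies. (Tags are assumed unique in this toy
    document; a real tool would key on the element objects instead.)
    """
    out = {} if out is None else out
    value = elem.get(XML_SPACE, inherited)
    out[elem.tag] = value
    for child in elem:
        effective_space(child, value, out)
    return out

doc = ET.fromstring(
    '<svg><text xml:space="preserve"><tspan>a  b</tspan></text><desc/></svg>'
)
modes = effective_space(doc)
print(modes)
```

What the application then does with "preserve" or "default" is still up to the application; the attribute only signals intent.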
To recap, all the PCDATA in your example "text" element should be handed
to your code as is, and there is no need for you to try to figure out
what it "should" be like.
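You can check this for yourself with any parser. A small sketch using Python's xml.etree.ElementTree follows; the string is modeled on the "text" example above, and the exact number of leading spaces is my assumption, since the quoting may have altered them.

```python
import xml.etree.ElementTree as ET

# Modeled on the quoted example; the exact spacing is assumed.
source = "<text> this\n  is a text .\n</text>"

doc = ET.fromstring(source)
content = doc.text

# repr() makes the spaces and line breaks visible.
print(repr(content))
```

The parser hands over every space and line break between the start and end tags; deciding how to render them is the application's job.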
And, BTW, there is no "\n" in XML, which does not know about such idioms
from certain programming languages. In a processed XML document, you
will have line-feed (#xA) characters where you originally had newlines.
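As a quick illustration of that last point, here is a sketch (Python, xml.etree.ElementTree, with made-up sample text) that feeds the parser Windows-style and old Mac-style line endings. XML's end-of-line handling requires the processor to normalize both to a single #xA before passing the text on.

```python
import xml.etree.ElementTree as ET

# Literal CR LF and lone CR line endings inside the element's text.
source = b"<text>this\r\nis\ra text</text>"

doc = ET.fromstring(source)
content = doc.text

# Both kinds of line ending arrive as a single line feed (#xA).
print(repr(content))
```

In Python source, "\n" is just the way to write that #xA character; the XML document itself only ever contains the character.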
Cheers,
Tom P