OASIS Mailing List Archives



   Re: [xml-dev] Guidelines for handling of elements' content?


Ralf wrote:
> ...
> This content contains both the xml user content (i.e. the "real" 
> content) and the xml text file "format" content.
> My real question is how do I use that content? How do I design my XML?
> How do I know that the spacing matters, or that the \n matters or are 
> just a byproduct of the XML text file?
> Is it best to remove all spacing and \n after and before any kind of 
> non-whitespace content?
> Sure xml:space tells me I should preserve them or not... really? The 
> spec doesn't tell me much about it. It just says the application gets 
> it all or knows what to do with it. Well personally I don't. What about 
> "good behavior" guidelines for applications?
> Here I'm dealing with SVG (typically <text> and <tspan>) but I already 
> had the problem with other custom xml formats.
> A typical example that confuses me:
> <text>    this
>   is a text   .
>     </text>
> What should I interpret here? One straight line, one line with one \n in 
> the middle, or one with three \n ? What about the spaces before "is" and 
> "this"?

The only whitespace that can really be in question is whitespace between 
elements - that is, between the end of one element and the start of the 
next.  Whitespace in element content is simply part of the content and 
is not to be removed.  Whitespace is allowed between elements for visual 
formatting, and it is a sensible question to ask about how to tell when 
such whitespace is just for formatting and when it is actually supposed 
to be part of the content.

That's one of the things that a DTD or schema is supposed to answer.  Of 
course, if there is no DTD there is no way to be certain (except for 
xml:space).  If an element allows mixed content, then any whitespace 
between its child elements would have to be considered part of the 
element's content.  Conversely, if no PCDATA is allowed, then any 
whitespace between elements would have to be for visual formatting only.
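A rough sketch of how an application might apply that rule itself when no 
validating processor does it: the code below uses Python's 
xml.etree.ElementTree, and the ELEMENT_ONLY set and the helper name are 
hypothetical stand-ins for what a DTD or schema would declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical: elements whose content model allows child elements only
# (no PCDATA). In practice this knowledge comes from the DTD or schema.
ELEMENT_ONLY = {"list"}

doc = """<list>
  <item>a <b>bold</b> word</item>
  <item>second</item>
</list>"""


def strip_formatting_whitespace(elem):
    """Drop whitespace-only text between children of element-only elements."""
    if elem.tag in ELEMENT_ONLY:
        # Whitespace before the first child is held in elem.text.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Whitespace after each child is held in that child's tail.
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        strip_formatting_whitespace(child)


root = ET.fromstring(doc)
strip_formatting_whitespace(root)
cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)
```

The whitespace inside "item" (mixed content) is left alone, while the 
indentation between the "item" elements is dropped.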

When a DTD or schema is involved, you only get this whitespace 
knowledge passed on if the processor is set for validation.

You can tell the processor more specifically what you have in mind with 
the xml:space attribute.  Again, it applies just to whitespace between 
the end of one element and the start of another.

If xml:space is not specified, you will generally get the processor's 
default behavior, which is application-dependent.
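For illustration, here is a small sketch of how code could work out which 
xml:space value is in effect for each element, since a declared value 
applies to everything inside that element unless overridden. It assumes 
Python's xml.etree.ElementTree; the function name, the sample document, and 
the choice to treat a missing attribute as "default" are my own assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

doc = (
    "<doc>"
    '<poem xml:space="preserve"><line>  indented  </line></poem>'
    "<note><line>  padded  </line></note>"
    "</doc>"
)


def effective_space(elem, inherited="default"):
    """Yield (tag, effective xml:space value) pairs, depth first."""
    # A value declared here overrides whatever was inherited from ancestors.
    value = elem.get(XML_SPACE, inherited)
    yield elem.tag, value
    for child in elem:
        yield from effective_space(child, value)


root = ET.fromstring(doc)
settings = list(effective_space(root))
for tag, value in settings:
    print(tag, value)
```

What the application then does with "preserve" or "default" is still up to 
the application.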

To recap, all the PCDATA in your example "text" element should be handed 
to your code as is, and there is no need for you to try to figure out 
what it "should" be like.
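A minimal sketch of that, assuming Python's xml.etree.ElementTree as the 
processor: the content of the example element arrives with every space and 
newline intact. The collapsing step at the end is only a hypothetical 
application policy, not something XML itself asks for.

```python
import xml.etree.ElementTree as ET

# The example element from the question, with its spaces and newlines.
doc = "<text>    this\n  is a text   .\n    </text>"

elem = ET.fromstring(doc)
content = elem.text

# The parser hands over the character data exactly as written.
print(repr(content))

# Hypothetical application-level choice: collapse runs of whitespace.
# This is the application's decision, made after parsing.
collapsed = " ".join(content.split())
print(repr(collapsed))
```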

And, BTW, there is no "\n" in XML, which does not know about such idioms 
from certain programming languages.  In a processed XML document, you 
will have &#10; (line feed) characters where you originally had newlines.
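To see this in practice, here is a short sketch assuming Python's 
xml.etree.ElementTree (which sits on the expat parser). A Windows-style 
line ending in the source and an explicit &#10; reference both come out as 
the same single line feed character, which Python happens to spell "\n".

```python
import xml.etree.ElementTree as ET

# Bytes input: one CR LF line ending and one &#10; character reference.
raw = b"<text>line one\r\nline two&#10;line three</text>"

elem = ET.fromstring(raw)
content = elem.text

# Both line breaks are now a single U+000A (decimal 10) character.
code_points = sorted({ord(ch) for ch in content if ch in "\r\n"})
print(repr(content))
print(code_points)
```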


Tom P



Copyright 2001 XML.org. This site is hosted by OASIS