   Guidelines for handling of elements' content?

  • To: xml-dev@lists.xml.org
  • Subject: Guidelines for handling of elements' content?
  • From: Ralf <ralfml@alfray.com>
  • Date: Tue, 23 Sep 2003 22:29:09 -0700
  • Organization: Alfray Inc.
  • User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624


Disclaimer: this is a newbie question. Please email me offlist if my 
question is best answered somewhere else. Flames burn. Thanks in advance.

I'm parsing an SVG file using libxml2 and for each element node I get its 
content. [Note: this is not an SVG- or libxml2-specific question]
This turns out to be a concatenation of all the "text" located between 
the node's list of children, recursively, including spaces and end of 
lines. Fine, that's what the XML spec says will happen (not that I 
understand the logic of it).
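
To make that behavior concrete, here is a minimal sketch. It uses Python's 
standard xml.etree.ElementTree as a stand-in for libxml2 (an assumption, 
chosen only to keep the example short), and the sample document and the 
node_content helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from a real SVG file.
SAMPLE = "<g>\n  <text>Hello <tspan>big</tspan> world</text>\n</g>"


def node_content(element):
    """Concatenate every text node under element, in document order."""
    return "".join(element.itertext())


root = ET.fromstring(SAMPLE)
content = node_content(root)

# The indentation and line breaks of the file are part of the result.
print(repr(content))
```

The printed string starts with the newline and indentation that follow 
<g>, which is exactly the "format" whitespace mixed into the content.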

This content contains both the XML user content (i.e. the "real" 
content) and the XML text file "format" content.
My real question is how do I use that content? How do I design my XML?
How do I know whether the spacing or the \n matter, or are just a 
byproduct of the XML text file?
Is it best to remove all spacing and \n before and after any kind of 
non-whitespace content?
Sure, xml:space tells me whether I should preserve them or not... really? 
The spec doesn't tell me much about it. It just says the application gets 
it all and is expected to know what to do with it. Well, personally I 
don't. What about "good behavior" guidelines for applications?
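
For what it's worth, the one thing I am sure of is that xml:space applies 
to the element carrying it and to its descendants until overridden. A 
rough sketch of reading it, again in Python's xml.etree.ElementTree rather 
than libxml2; the effective_space helper and the sample document are 
hypothetical, and treating a missing attribute as "default" is my own 
assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_space(root):
    """Map each element to the xml:space value in effect for it."""
    result = {}

    def walk(element, inherited):
        # An explicit attribute overrides the value inherited from the parent.
        current = element.get(XML_SPACE, inherited)
        result[element] = current
        for child in element:
            walk(child, current)

    # Assumption: with no attribute anywhere, treat the mode as "default".
    walk(root, "default")
    return result


# Hypothetical sample document.
SAMPLE = '<doc><a xml:space="preserve"><b/></a><c/></doc>'
root = ET.fromstring(SAMPLE)
modes = {el.tag: mode for el, mode in effective_space(root).items()}
print(modes)
```

This only reports what the document asks for. What "default" should 
mean is still left to the application, which is my whole problem.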

Here I'm dealing with SVG (typically <text> and <tspan>) but I have 
already had the same problem with other custom XML formats.

A typical example that confuses me:

<text>    this
   is a text   .

What should I interpret here? One straight line, one line with one \n in 
the middle, or one with three \n? What about the spaces before "is" and 
Currently what I do is ignore any whitespace (space, \t and \n) before 
and after the non-whitespace content. Everything inside I keep.
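
To show the two readings side by side, here is a small Python sketch. RAW 
is my assumption of what the parser hands back for the example above 
(including the trailing line breaks), and trim_edges and collapse are 
names I made up. trim_edges is what I do today; collapse is the 
"one straight line" alternative.

```python
import re

# Assumed text content of the <text> example, trailing newlines included.
RAW = "    this\n   is a text   .\n\n"


def trim_edges(text):
    """Drop space, tab and newline at both ends; keep the inside as-is."""
    return text.strip(" \t\n")


def collapse(text):
    """Turn every whitespace run into one space, then trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


trimmed = trim_edges(RAW)
collapsed = collapse(RAW)

print(repr(trimmed))
print(repr(collapsed))
```

With trim_edges the inner \n and the extra spaces survive; with collapse 
they all become single spaces. Neither is obviously the "right" one, 
which is why I am asking.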

Pointers to FAQs or other resources are welcome offlist.

Thanks in advance.

