- To: xml-dev@lists.xml.org
- Subject: Guidelines for handling of elements' content?
- From: Ralf <ralfml@alfray.com>
- Date: Tue, 23 Sep 2003 22:29:09 -0700
- Organization: Alfray Inc.
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624
Hi!
Disclaimer: this is a newbie question. Please email me offlist if my
question is best answered somewhere else. Flames burn. Thanks in advance.
I'm parsing an SVG file using libxml2 and for each element node I get its
content. [Note: this is not an SVG- or libxml2-specific question.]
This turns out to be a concatenation of all the "text" located between
the node's children, recursively, including spaces and end of
lines. Fine, that's what the XML spec says will happen (not that I
understand the logic of it).
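To illustrate what I mean, here is a minimal sketch in Python using the
standard library's xml.etree.ElementTree instead of libxml2 (so this is not
my actual code). The sample document and the helper name node_content are
made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the indentation and line breaks are part of the input.
SAMPLE = "<g>\n  <text>Hello <tspan>big</tspan>\n  world</text>\n</g>"


def node_content(elem):
    """Concatenate every text node below elem, in document order."""
    return "".join(elem.itertext())


root = ET.fromstring(SAMPLE)
content = node_content(root)

# repr() makes the spaces and line breaks from the file layout visible.
print(repr(content))
```

The formatting whitespace of the file ends up mixed in with the words I
actually care about.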
This content contains both the xml user content (i.e. the "real"
content) and the xml text file "format" content.
My real question is how do I use that content? How do I design my XML?
How do I know whether the spacing or the \n matters, or whether they are
just a byproduct of how the XML text file is laid out?
Is it best to remove all spaces and \n before and after any
non-whitespace content?
Sure, xml:space tells me whether I should preserve them or not... really? The
spec doesn't tell me much about it. It just says the application gets
it all or knows what to do with it. Well, personally I don't. Are there any
"good behavior" guidelines for applications?
[Here I'm dealing with SVG (typically <text> and <tspan>) but I already
had the problem with other custom xml formats.]
A typical example that confuses me:
<text> this
is a text .
</text>
What should I interpret here? One straight line, one line with one \n in
the middle, or one with three \n? What about the spaces before "is" and
"this"?
Currently what I do is ignore any whitespace (space, \t and \n) before
and after the non-whitespace content. Everything inside I keep.
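Roughly, my current approach amounts to the following Python sketch. The
function name trim_outer_whitespace and the input string are made up for
illustration, and the exact set of characters I strip is my own choice, not
something taken from a spec.

```python
# Characters treated as whitespace here: space, tab and newline.
WHITESPACE = " \t\n"


def trim_outer_whitespace(content):
    """Drop whitespace before and after the non-whitespace content only."""
    return content.strip(WHITESPACE)


# Hypothetical element content, with layout whitespace around and inside it.
example = "  this\n   is a text .\n"
trimmed = trim_outer_whitespace(example)

# The inner line break and the spaces after it are kept untouched.
print(repr(trimmed))
```

This leaves me with the inner \n and spaces, which is exactly the part I
don't know how to interpret.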
Pointers to FAQs or other resources are welcome offlist.
Thanks in advance.
Raphael