Re: [xml-dev] Re: why whitespace counts as a node?
- From: Mukul Gandhi <gandhi.mukul@gmail.com>
- To: xml-dev@lists.xml.org
- Date: Sun, 14 Nov 2010 21:01:54 +0530
I think the treatment of white-space in XML documents gets
interesting when the documents are validated against XML schemas.
Here are the cases I can think of where white-space matters:
1) If the XML document is parsed by a SAX parser, the call-back
method "characters" (which is notified of character data) receives
all the characters in the character data, including the white-space.
When an XML document is parsed by a DOM parser, the text nodes likewise
contain all the white-space.
So XML parsing preserves white-space in the infoset instance that the
parsing process produces. I think this is desirable in plain XML
parsing, since applications may want to do something with the
white-space too.
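As a minimal sketch of case 1, the snippet below parses the same small document with a SAX parser and a DOM parser and collects the character data each one reports. Python's standard xml.sax and xml.dom.minidom modules are used purely for illustration (the thread does not name a parser), and the TextCollector class name is hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Element "x" with a line break before and after the value.
DOC = "<x>\n100\n</x>"


class TextCollector(xml.sax.ContentHandler):
    """Accumulates everything reported through characters()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # A parser may deliver the text in several chunks, so keep them all.
        self.chunks.append(content)


handler = TextCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_text = "".join(handler.chunks)

# Join the data of all text-node children of the root element.
root = minidom.parseString(DOC).documentElement
dom_text = "".join(
    node.data for node in root.childNodes if node.nodeType == node.TEXT_NODE
)

print(repr(sax_text))  # '\n100\n'
print(repr(dom_text))  # '\n100\n'
```

Both APIs hand the line breaks around "100" to the application untouched; deciding whether they matter is left to the application (or to a later validation step).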
2) Things get a little more interesting when XML documents are
validated against, say, an XML schema. Here are a few examples:
a)
<x>
100
</x>
Here the content of element "x" is numeric, but there is boundary
white-space around the numeric value 100.
This is successfully validated by the following XML schema fragment:
<xs:element name="x" type="xs:integer" />
b)
<x>
hello world
</x>
Here there is boundary white-space within element "x".
c)
<x>hello world</x>
Here there is no boundary white-space within "x".
The following XML schema fragment,
<xs:element name="x">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:maxLength value="11" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
would report XML document (b) as invalid and (c) as valid. This is
because with the schema type xs:string, white-space in the document is
significant: its whiteSpace facet is "preserve", so the line breaks in
(b) count toward the length and affect the validity of the character
content. With numeric types such as xs:integer, white-space is not
significant: the whiteSpace facet is "collapse", so an XML schema
validator strips the boundary white-space before checking the value.
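The difference between cases (a), (b) and (c) can be sketched by hand. The snippet below is not a schema validator; it only emulates the XSD whiteSpace facet ("preserve" for xs:string, "collapse" for xs:integer) followed by the relevant check. All function names are hypothetical, and the element content is assumed to be exactly as shown above, with one line break on each side in (a) and (b).

```python
import re


def apply_whitespace_facet(value, mode):
    """Normalize a value per the XSD whiteSpace facet: preserve, replace or collapse."""
    if mode == "preserve":
        return value
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    # "collapse": after "replace", squeeze runs of spaces and trim both ends.
    return re.sub(r" +", " ", replaced).strip(" ")


def valid_integer(raw):
    # xs:integer uses whiteSpace="collapse", so boundary white-space is dropped
    # before the lexical form is checked.
    normalized = apply_whitespace_facet(raw, "collapse")
    return re.fullmatch(r"[+-]?[0-9]+", normalized) is not None


def valid_string_max_length(raw, max_length):
    # xs:string uses whiteSpace="preserve", so every character counts.
    return len(apply_whitespace_facet(raw, "preserve")) <= max_length


doc_a = "\n100\n"          # content of <x> in case (a)
doc_b = "\nhello world\n"  # content of <x> in case (b), 13 characters
doc_c = "hello world"      # content of <x> in case (c), 11 characters

print(valid_integer(doc_a))                # True
print(valid_string_max_length(doc_b, 11))  # False
print(valid_string_max_length(doc_c, 11))  # True
```

A real validator does considerably more (line-end normalization, the full lexical space of each type, other facets), but the order of operations is the point here: white-space normalization for the declared type happens first, and the facet checks see only the normalized value.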
On Sun, Nov 14, 2010 at 6:41 PM, Michael Kay <mike@saxonica.com> wrote:
>
>> Ok, so it does serve a purpose. However, even in xhtml, if you want
>> white space in a paragraph of text, then you can put that whitespace
>> between tags. I'm sure it's my lack of experience, but, for example,
>> when do you need that white space?
>>
> Once you accept the usefulness of inline markup like this:
>
> <p>I just <i>love</i> <place>London</place></p>
>
> then you have to accept that the space between "love" and "London" is just
> as significant as the one between "I" and "just".
>
> Some of the XML specs do try and recognize that whitespace in mixed content
> needs to be treated differently from whitespace in "element-only content"
> (like database dumps). But part of the XML philosophy is that XML instances
> can be used without having a schema or DTD, which means you don't always
> know whether it's mixed content or not. So you have to treat it as
> significant.
>
> This is one of the reasons it's best to avoid "non-standard" uses of mixed
> content like this:
>
> <date-of-birth>
> <source>birth-certificate</source>
> 1920-03-04
> </date-of-birth>
>
> Michael Kay
> Saxonica
--
Regards,
Mukul Gandhi