Dare,
> Whitespace is made significant by the presences of an
> xml:space="preserve".
That only means that _all_ the whitespace for a given element is preserved.
This is a workaround that I know of and am actually using. However, even
if this attribute is not set, semantically significant whitespace in mixed
content must be preserved.
<sampleData>
<!DOCTYPE a [
....<!ENTITY uuml "ü">
]>
<a>
....<b>aasdf...</b>
....<c>
........asdfasdf_dadf_.<e/>_asd_<d>asdfasd</d>_üadas
<d>asdfasd</d>_<d>asdfas</d>_
........df_asd_.ü_asdf_asdf.
....</c>
</a>
</sampleData>
In the above example I marked with a '.' the whitespace that in my opinion
should be preserved regardless of whether whitespace handling is set to
preserve or not, and with a '_' the whitespace that I think must be
preserved in any case.
If the reader/parser removes the whitespace marked with a '_', i.e. the
semantically significant whitespace, the semantics of the document are
changed.
E.g., given the fragment
<d>die</d> Überleitung
The meaning of 'die Überleitung' (expected result) is different to
'dieÜberleitung' (result returned by XmlReader).
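To make the expected result concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than XmlReader. The wrapper element <c> is my own addition so that the fragment forms a well-formed document; nothing here says anything about how the .NET reader is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <c>, added only to make the fragment well-formed.
root = ET.fromstring("<c><d>die</d> Überleitung</c>")

d = root.find("d")
# ElementTree keeps the character data that follows </d> in the element's .tail.
tail = d.tail
# Concatenating all character data in document order gives the readable text.
text = "".join(root.itertext())

print(repr(tail))
print(repr(text))
```

The leading space of the tail is what keeps 'die' and 'Überleitung' apart as two words.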
So, I think there is a problem with the reader as it removes whitespace that
is semantically significant, and this, at least as I read the spec, should
not be done by an XML processor.
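As a rough cross-check of what I would expect a reader to report, the following sketch lists the nodes of a reduced version of the <c> element using Python's xml.dom.minidom. The reduced document, the list_nodes helper and the output format are assumptions of mine for illustration. Note that this parser expands the entity into the text instead of reporting an EntityReference node, so the listing is not directly comparable to the XmlReader output.

```python
from xml.dom import minidom

# Reduced form of the <c> element from the sample. &uuml; is declared in the
# internal subset, and the '_' markers are written as real spaces here.
DOC = (
    '<!DOCTYPE a [<!ENTITY uuml "&#252;">]>'
    "<a><c>asd <d>asdfasd</d> &uuml;adas <d>asdfasd</d> <d>asdfas</d> df</c></a>"
)


def list_nodes(node, out):
    """Append one (kind, value) pair per node, in document order."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            out.append(("Text", child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            out.append(("Element", child.tagName))
            list_nodes(child, out)
            out.append(("EndElement", child.tagName))
    return out


doc = minidom.parseString(DOC)
doc.normalize()  # merge adjacent text nodes so the listing is stable
c = doc.getElementsByTagName("c")[0]
nodes = list_nodes(c, [])

for kind, value in nodes:
    print(f"[{kind}] ==> {value!r}")
```

In this listing the text after the first </d> starts with a space, and a whitespace-only text node sits between the second and third <d> elements. These are the two places where the reader output below has no whitespace.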
Bye
Axel
<result whitespaceHandling="significant and none">
[DocumentType] a ==> "
<!ENTITY uuml "ü">
"
[Element] a ==> ""
[Element] b ==> ""
[Text] ==> "aasdf "
[EndElement] b ==> ""
[Element] c ==> ""
[Text] ==> "
asdfasdf dadf "
[Element] e ==> ""
[Text] ==> " asd "
[Element] d ==> ""
[Text] ==> "asdfasd"
[EndElement] d ==> "" // missing whitespace node here
[EntityReference] uuml ==> ""
[Text] ==> "adas "
[Element] d ==> ""
[Text] ==> "asdfasd"
[EndElement] d ==> "" // missing whitespace node here
[Element] d ==> ""
[Text] ==> "asdfas"
[EndElement] d ==> ""
[Text] ==> "
df asd "
[EntityReference] uuml ==> ""
[Text] ==> " asdf asdf
"
[EndElement] c ==> ""
[EndElement] a ==> ""
</result>