There are only two ways to identify whitespace as 'significant':
1) Look ahead, possibly all the way to the end element, for sibling text that is not whitespace. This is prohibitively expensive, since it may mean caching almost the entire document.
2) Have a DTD/Schema that identifies the content as text or mixed.
Since (1) is prohibitively expensive, the only viable option is (2). And indeed, if you use the XmlValidatingReader and have a DTD/XSD that identifies the content as mixed, you will get back SignificantWhitespace rather than just Whitespace. (I think so, at least. I looked at the code but didn't actually code up a sample, and thus could be wrong. The intent of the code is clearly to do this, so a failure to report SignificantWhitespace in such contexts would be a bug.)
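
To make the cost of option (1) concrete, here is a minimal sketch of the look-ahead heuristic. It is written in Python with xml.dom.minidom purely for illustration and is not how the .NET readers work. The function name classify_whitespace is hypothetical, and the labels "Whitespace" and "SignificantWhitespace" are simply borrowed from the .NET node type names. Note that it has to hold all children of an element before it can label any of them, which is exactly the caching problem described above.

```python
from xml.dom import Node, minidom

# The four characters XML 1.0 treats as whitespace.
XML_WS = " \t\r\n"


def classify_whitespace(xml_text):
    """Label every whitespace-only text node using sibling look-ahead.

    Returns a list of (parent_tag, label) tuples in document order.
    Illustrative heuristic only: whitespace counts as significant when
    some sibling text node contains non-whitespace characters.
    """
    doc = minidom.parseString(xml_text)
    results = []

    def visit(element):
        children = list(element.childNodes)
        # The look-ahead: scan ALL siblings for non-whitespace text.
        # A streaming reader would have to buffer them to do this.
        mixed = any(
            child.nodeType == Node.TEXT_NODE and child.data.strip(XML_WS)
            for child in children
        )
        for child in children:
            if child.nodeType == Node.TEXT_NODE:
                if not child.data.strip(XML_WS):
                    label = "SignificantWhitespace" if mixed else "Whitespace"
                    results.append((element.tagName, label))
            elif child.nodeType == Node.ELEMENT_NODE:
                visit(child)

    visit(doc.documentElement)
    return results


# Indentation inside <a> is element-only content; the space between the
# two <d> elements sits in mixed content because of the trailing " df".
sample = "<a>\n  <c><d>asdfasd</d> <d>asdfas</d> df</c>\n</a>"
kinds = classify_whitespace(sample)
```

With a DOM the whole tree is already in memory, so the scan is cheap here. A forward-only reader has no such luxury, which is why a DTD/Schema is the practical route.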
-derek
> -----Original Message-----
> From: Eckenberger Axel [mailto:Extern.Eckenberger@kmweg.de]
> Sent: Friday, July 04, 2003 12:10 AM
> To: Dare Obasanjo; xml-dev@lists.xml.org
> Cc: Scherbel, Michael
> Subject: RE: [xml-dev] Handling of significant whitespace in .NET
> XmlReader
>
> Dare,
>
> > Whitespace is made significant by the presences of an
> > xml:space="preserve".
>
> This only means that _all_ the whitespace for a given element is
> preserved. It is a workaround that I know of, and that we are actually
> using. However, even if this attribute is not set, semantically significant
> whitespace in mixed content must be preserved.
>
> <sampleData>
> <!DOCTYPE a [
> ....<!ENTITY uuml "ü">
> ]>
> <a>
> ....<b>aasdf...</b>
> ....<c>
> ........asdfasdf_dadf_.<e/>_asd_<d>asdfasd</d>_üadas
> <d>asdfasd</d>_<d>asdfas</d>_
> ........df_asd_.ü_asdf_asdf.
> ....</c>
> </a>
> </sampleData>
>
> In the above example, I marked with a '.' the whitespace that in my opinion
> should be preserved regardless of whether whitespace handling is set to
> preserve or not, and with a '_' the whitespace that I think must be
> preserved in any case.
>
> If the reader/parser removes the whitespace marked with a '_', i.e. the
> semantically significant whitespace, the semantics of the document are
> changed.
>
> E.g., given the fragment
>
> <d>die</d> Überleitung
>
> The meaning of 'die Überleitung' (expected result) is different to
> 'dieÜberleitung' (result returned by XmlReader).
>
> So, I think there is a problem with the reader, as it removes whitespace
> that is semantically significant, and this, at least as I read the spec,
> should not be done by an XML processor.
>
> Bye
>
> Axel
>
> <result whitespaceHandling="significant and none">
> [DocumentType] a ==> "
> <!ENTITY uuml "ü">
> "
> [Element] a ==> ""
> [Element] b ==> ""
> [Text] ==> "aasdf "
> [EndElement] b ==> ""
> [Element] c ==> ""
> [Text] ==> "
> asdfasdf dadf "
> [Element] e ==> ""
> [Text] ==> " asd "
> [Element] d ==> ""
> [Text] ==> "asdfasd"
> [EndElement] d ==> "" // missing whitespace node here
> [EntityReference] uuml ==> ""
> [Text] ==> "adas "
> [Element] d ==> ""
> [Text] ==> "asdfasd"
> [EndElement] d ==> "" // missing whitespace node here
> [Element] d ==> ""
> [Text] ==> "asdfas"
> [EndElement] d ==> ""
> [Text] ==> "
> df asd "
> [EntityReference] uuml ==> ""
> [Text] ==> " asdf asdf
> "
> [EndElement] c ==> ""
> [EndElement] a ==> ""
> </result>
>