OASIS Mailing List Archives

Attribute-Value Normalization problem



I have encountered a problem with Attribute-Value Normalization.
I have the following XML (as an example):
<test Attr="&#x09;">some text</test>

I need to construct a DOM from it and then write it back to a
file repeatedly. During each cycle, I generate a DOM-Hash digest of
the document and compare the new digest with the digest from the last
cycle (to make sure that the document has not changed).

The Attribute-Value Normalization specification (XML 1.0,
Section 3.3.3) treats a character reference differently from other entity
references (its result is not recursively processed), which gives me much grief.

The first time I process the document, &#x09; is replaced by a TAB
character. After I generate the digest, I write the DOM back to a file. However,
the second time I process it, reading the file I just wrote, the
TAB character is replaced by a SPACE character. The new digest based on
this DOM no longer matches the old one, though there are no actual changes
to the file.
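
To make the two passes concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree parser. It is not the DOM toolkit I am actually using; it only illustrates the normalization rule. The second input assumes the writer emitted the TAB literally rather than as a character reference.

```python
import xml.etree.ElementTree as ET

# First pass: the character reference is replaced by a literal TAB
# and is not normalized any further.
first = ET.fromstring('<test Attr="&#x09;">some text</test>')

# Second pass: assume the writer emitted the TAB literally.
# A literal TAB inside an attribute value is normalized to a SPACE.
second = ET.fromstring('<test Attr="\t">some text</test>')

first_value = first.get("Attr")
second_value = second.get("Attr")

print(repr(first_value), repr(second_value))
print("values match:", first_value == second_value)
```

Any digest computed over the attribute value will differ between the two parses, because the values themselves differ.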

Is there any easy way (without always processing the document twice before
trusting the results) to circumvent this? My further question is: why does the spec
treat character references differently? Why can't we also recursively
normalize character references?

Thanks.

-JJ