Ronald Bourret wrote:
> This points out something that should be a requirement for binary XML:
> lossless roundtripping. In other words, you should be able to go from
> the text serialization to the binary serialization and back losslessly
> (within the confines of canonical XML). Same is true for binary <=>
> text, binary <=> binary, and (of course) text <=> text.
Of course text <=> text? This doesn't work today. I don't keep a list,
but off the top of my head: information in the text, such as character
references and internal general entity references in attribute values,
is removed by parsers (e.g., SAX) and is not available to write back
out again. This is a perennial source of XSLT questions. Until SAX2
Extensions 1.1, SAX didn't report the XML declaration, so the
application didn't know the original encoding. The application couldn't
tell which attribute values were specified in the document and which
came from the DTD as defaults. As ERH points out, canonicalization loses
the DOCTYPE declaration. And so on.
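
To make the first point concrete, here is a minimal sketch, assuming Python's standard xml.sax module with its default expat-based reader. The sample document, the AttrCollector class and the parsed_attributes helper are made up for illustration. The handler only ever sees the expanded attribute value, so nothing downstream can tell that the source used a character reference and an entity reference.

```python
import xml.sax

# Hypothetical sample: the attribute value uses a character reference
# (&#65;) and an internal general entity reference (&co;).
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!ENTITY co "Example Corp">
]>
<doc title="&#65;BC &co;"/>"""


class AttrCollector(xml.sax.ContentHandler):
    """Records the attributes of the last start tag reported by the parser."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def startElement(self, name, attrs):
        # attrs holds the values as the parser reports them,
        # after references have been expanded.
        self.attrs = dict(attrs.items())


def parsed_attributes(data):
    """Parse the document and return the attributes the application sees."""
    handler = AttrCollector()
    xml.sax.parseString(data, handler)
    return handler.attrs


attrs = parsed_attributes(DOC)
print(attrs["title"])  # the expanded text, with no trace of the references
```

An application that writes this attribute back out can only emit the expanded text, so the result differs from the original at the character level even though both documents carry the same information.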
It has taken many years and several iterations to get XML parsers to the
point where they are even close to supporting roundtripping. Imagine if
this had been a "requirement" for XML 1.0.
Bob Foster