OASIS Mailing List Archives





   Re: [xml-dev] Re: Where does the "nothing left but toolkits" myth come fr


Fair enough. I wasn't thinking at that level of round-tripping, which I 
agree is problematic. What worried me about ERH's example was the 
possibility that we might not even be able to round-trip text, an issue 
that hasn't come up before (modulo entity references).
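As a rough illustration of what "round-trip modulo canonical XML" could mean in practice, here is a minimal Python sketch (3.8+, which provides `xml.etree.ElementTree.canonicalize`). The helper `roundtrips` and both codecs are hypothetical: zlib-compressed UTF-8 merely stands in for a binary serialization and is not any real binary XML format. The check passes whenever the canonical forms match, so differences that Canonical XML itself discards (character references, attribute order, quote style) are not counted as losses.

```python
import xml.etree.ElementTree as ET
import zlib

def roundtrips(text, encode, decode):
    """Return True if decode(encode(text)) has the same canonical XML as text.

    Hypothetical helper: comparison uses C14N 2.0 via ET.canonicalize, so
    anything C14N throws away is not treated as a round-trip failure.
    """
    restored = decode(encode(text))
    return ET.canonicalize(text) == ET.canonicalize(restored)

# Stand-in "binary" codec (assumption): compressed UTF-8 bytes.
binary_codec = (lambda t: zlib.compress(t.encode("utf-8")),
                lambda b: zlib.decompress(b).decode("utf-8"))

# text <=> text through a parser and serializer.
etree_codec = (lambda t: ET.fromstring(t),
               lambda e: ET.tostring(e, encoding="unicode"))

doc = '<note lang="fr" id="n1">caf&#233;</note>'

print(roundtrips(doc, *binary_codec), roundtrips(doc, *etree_codec))
```

Note that the parser-based codec "passes" even though the character reference `&#233;` did not survive; it only looks lossless because the comparison is made within the confines of canonical XML.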

The problem is not limited to values, such as would occur with binary 
representations of real numbers. It also applies to formats. Dates and 
numbers have multiple lexical formats, and the choice of format may 
inadvertently carry information.

For example, French genealogical data might represent dates from the 
Napoleonic period using the Napoleonic calendar; since this is how the 
data was originally recorded, it should probably continue to be 
represented that way, even though these dates can be converted to modern 
date systems.

Similarly, a transcription of notes written by a criminal suspect might 
include dates in a particular format. Since this format might be a clue 
to the suspect's nationality or background, changing the format would 
mean losing information.
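To make the point concrete, here is a small Python sketch showing how normalizing a date to a typed value discards its original lexical format. The two date strings are made-up examples in ordinary Gregorian day-first and month-first notation (standard `datetime` cannot parse the Napoleonic calendar); once both are parsed, nothing in the resulting values records which convention the writer used.

```python
from datetime import datetime

# Two hypothetical transcriptions of the same day, in different conventions.
day_first = "14/07/1789"    # day/month/year
month_first = "07/14/1789"  # month/day/year

a = datetime.strptime(day_first, "%d/%m/%Y").date()
b = datetime.strptime(month_first, "%m/%d/%Y").date()

# After normalization both become the same value, so the writer's
# convention (the potential clue) can no longer be recovered.
print(a.isoformat(), b.isoformat())  # 1789-07-14 1789-07-14
```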

Obviously, this additional information could be represented by 
additional metadata. But it is naive to think that all document 
designers will add such metadata.
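One way such metadata might look is sketched below in Python. The `format` attribute is purely hypothetical markup, not part of any standard vocabulary; it stores a `strftime` pattern so an application can work with a normalized value and still regenerate the original lexical form. The catch is the one noted above: if a document designer never added the attribute, `get("format")` returns `None` and the original form is simply gone.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical markup: a "format" attribute records how the date was written.
elem = ET.fromstring('<date format="%d/%m/%Y">14/07/1789</date>')

fmt = elem.get("format")
value = datetime.strptime(elem.text, fmt).date()  # normalized value for processing
restored = value.strftime(fmt)                    # regenerated lexical form

print(value.isoformat(), restored)  # 1789-07-14 14/07/1789

# Without the metadata there is nothing to regenerate the original from.
bare = ET.fromstring("<date>14/07/1789</date>")
print(bare.get("format"))  # None
```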

-- Ron

Bob Foster wrote:
> Ronald Bourret wrote:
>  > This points out something that should be a requirement for binary XML:
>  > lossless roundtripping. In other words, you should be able to go from
>  > the text serialization to the binary serialization and back losslessly
>  > (within the confines of canonical XML). Same is true for binary <=>
>  > text, binary <=> binary, and (of course) text <=> text.
> Of course text <=> text? This doesn't work today. I don't keep a list, 
> but here are a few off the top of my head. Information in the text such 
> as character references and internal general entity references in 
> attribute values is removed by parsers (e.g., SAX) and is not available 
> to write back out again. This is a perennial source of XSLT questions. 
> Until SAX2 Extensions 1.1, SAX didn't report the XML declaration, so the 
> application didn't know the original encoding. The application couldn't 
> tell which attribute values were specified in the document and which 
> came from the DTD as defaults. As ERH points out, canonicalization loses 
> the DOCTYPE declaration. And so on.
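The text <=> text losses described above are easy to reproduce with a stock SAX pipeline. The sketch below uses Python's `xml.sax` (which is built on expat and does not implement the SAX2 Extensions 1.1 interfaces) and pipes the events straight into `xml.sax.saxutils.XMLGenerator`. The sample document is invented for the demonstration. In the re-serialized output the DOCTYPE is gone, the entity reference and character reference have been replaced by their expansions, the defaulted `lang` attribute is indistinguishable from a specified one, and the declared ISO-8859-1 encoding has been replaced by whatever the writer was told to use.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Invented sample: declared encoding, a DTD-defaulted attribute,
# an internal entity in an attribute value, and a character reference.
source = b'''<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE greeting [
  <!ATTLIST greeting lang CDATA "en">
  <!ENTITY who "world">
]>
<greeting title="hello &who;">caf&#233;</greeting>'''

# Parse with SAX and write the events straight back out as XML text.
out = io.StringIO()
xml.sax.parseString(source, XMLGenerator(out, encoding="utf-8"))
output = out.getvalue()
print(output)
```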



Copyright 2001 XML.org. This site is hosted by OASIS