RE: [xml-dev] "XML for the Long Haul" program available

> > isn't there a more fundamental issue: will there be any tools that
> > understand the encoding used by today's computers? Will UTF-8 still
> > exist 200 years from now? Will there be tools that can interpret
> > UTF-8 200 years from now?

UTF-8 is just one layer in the complete code hierarchy that future
researchers/archaeologists would have to decipher. With digital
media, it is a complete Tower of Babel that needs to be understood
bottom-up: first you need to know that the information is represented
using 0 and 1 bits, then you need to understand how the bits are stored
in the actual physical substrate. Then you need to know that bits are
usually organized in 8-bit sequences that represent numbers in the
binary numeral system. (You should probably also be aware that there is
something called a 'file', and that there are different file systems
etc...) Then you have the little-endian/big-endian business, and then
actual character encodings such as UTF-8. To decipher such encodings,
you will often need code tables to be able to map the character codes
to actual characters. Only after that comes XML with its opening tags,
closing tags, attributes, and namespaces. But that is already
relatively easy, because at this point, you should see text that you
can more or less understand.
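
To make the layering concrete, here is a minimal Python sketch of the
upper layers only (bits, then bytes, then UTF-8 characters, then XML).
It assumes the bits have already been recovered from the physical
medium and are grouped most-significant-bit first; the function names
and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def bits_to_bytes(bits: str) -> bytes:
    """Group a string of '0'/'1' characters into 8-bit bytes (MSB first)."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def decode_layers(bits: str) -> ET.Element:
    """Peel the layers off one at a time, bottom-up."""
    raw = bits_to_bytes(bits)    # bits -> bytes
    text = raw.decode("utf-8")   # bytes -> characters (needs the code table)
    return ET.fromstring(text)   # characters -> XML element tree


# Hypothetical sample document, flattened to a bit string to play the
# role of what a future reader would find on the medium.
sample = '<note lang="cs">Dobrý den</note>'
bits = "".join(f"{byte:08b}" for byte in sample.encode("utf-8"))

root = decode_layers(bits)
print(root.tag, root.get("lang"), root.text)

# Byte order matters as soon as a value spans more than one byte:
# the same two bytes give different numbers depending on endianness.
little = int.from_bytes(b"\x01\x00", "little")
big = int.from_bytes(b"\x01\x00", "big")
print(little, big)
```

Every step here silently relies on knowledge a future reader might not
have: the byte size, the bit order, the UTF-8 tables, and the XML
grammar.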

Taking the steps above together, I wouldn't be surprised if retrieving
information from a simple text file stored on today's media turned out
to be a challenge orders of magnitude harder than, say, breaking the
Enigma code.

Of course, when talking about hundreds of years, information stored on
any of today's digital media will probably be inaccessible anyway,
either simply because of the short life-span of the media, or because
today's technology will likely be too antique or incompatible with the
technology of 100 years from now. (Have you tried to attach an old
8-inch external floppy drive - itself only 40 years old - to your PC?
And who says that future technology will be digital anyway?)

> As a backup, why don't we just print it out (acid-free paper of
> course!)?  Or maybe just the important stuff (NSA may be interested
> in every XML message ever sent over the net, but I doubt anyone else
> will be).  XML's supposed to be text, after all.

Paper would be one option, or stone, if you are concerned about really
long-term preservation. On the other hand, none of these media
(unless you go to the micro/nano scale, in which case you would be
increasing retrieval/decoding complexity again) is particularly well
suited for large volumes of information or for processing ('How do you
XQuery a stone?').

Regards,
Vojtech

--
Vojtech Toman
Principal Software Engineer
EMC Corporation
toman_vojtech@emc.com
http://developer.emc.com/xmltech

