OASIS Mailing List Archives





   RE: [xml-dev] Re: Where does the "nothing left but toolkits" myth come f


> -----Original Message-----
> From: David Lyon [mailto:david.lyon@computergrid.net] 
> Sent: Monday, February 14, 2005 20:33
> To: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Re: Where does the "nothing left but toolkits" myth come from?
> On Sunday 13 February 2005 05:33 pm, Ronald Bourret wrote:
> > For example, French genealogical data might represent dates from the
> > Napoleonic period using the Napoleonic calendar; since this is how the
> > data is originally recorded, it should probably be continued to be
> > represented that way, even though these dates can be converted to
> > modern date systems.
> >
> > Similarly, a transcription of notes written by a criminal suspect might
> > include dates in a particular format. Since this format might be a clue
> > to the suspect's nationality or background, changing the format would
> > mean losing information.
> ditto for the Hebrew calendar... very old.. also some other old ones,
> Chinese, Ethiopian and Indian calendars..
> There are just a few other ones to think about...

I don't understand this particular point of the discussion.  Why should a
Chinese or Hebrew date be converted to its Gregorian "equivalent" date when
using a binary format?

Note that Fast Infoset (ISO/IEC 24824-1, now past FCD) doesn't blindly
"convert" dates in this stupid way.  It doesn't blindly convert floats,
integers, etc., in this way either.

In Fast Infoset, sequences of Unicode characters that signify (or just "look
like") dates and times *can* be represented in a compact form by using a
"restricted alphabet" consisting of the following 15 characters:

    0 1 2 3 4 5 6 7 8 9 - : T Z and the space character

using four bits per character.  Actually, Fast Infoset knows nothing about
dates and times.  Any substring consisting only of characters from the set
above can be encoded as a "date".  There is no implication that the encoding
will represent a true date/time rather than a meaningless sequence of the
above characters.  So the "restricted alphabet" for dates and times is just
a convenience that doesn't have any semantic implication.  ("87771--ZT :-4
5131" is acceptable as input).
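The restricted-alphabet mechanism can be sketched in Python as follows. This is a minimal sketch, not the bit layout from ISO/IEC 24824-1: it assumes the 15 characters are the digits plus "-", ":", "T", "Z" and the space (exactly the characters used in the example string above), puts the first character in the high nibble, and uses the unused 16th four-bit value (0xF) as padding when the length is odd. The names can_encode, pack and unpack are hypothetical.

```python
# Assumed date/time restricted alphabet: 10 digits + "-", ":", "T", "Z", space.
ALPHABET = "0123456789-:TZ "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
PAD = 0xF  # the 16th four-bit value is not a character; assumed here to mark padding


def can_encode(s: str) -> bool:
    """Purely syntactic test: every character belongs to the alphabet."""
    return all(ch in INDEX for ch in s)


def pack(s: str) -> bytes:
    """Pack each character as its 4-bit index, two characters per byte."""
    if not can_encode(s):
        raise ValueError("character outside the restricted alphabet")
    nibbles = [INDEX[ch] for ch in s]
    if len(nibbles) % 2:
        nibbles.append(PAD)  # odd length: fill the last low nibble
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack(data: bytes) -> str:
    """Reverse of pack(): yields exactly the original characters."""
    out = []
    for byte in data:
        for nib in (byte >> 4, byte & 0xF):
            if nib != PAD:
                out.append(ALPHABET[nib])
    return "".join(out)
```

The test in can_encode never asks whether the string is a real date, so the meaningless "87771--ZT :-4 5131" packs into 9 bytes and unpacks to the same 18 characters, while a date carrying a "+01:00" offset falls outside the alphabet and would simply stay as plain characters.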

The same is true for integers, floats, etc.  **Fast Infoset provides
"encoding algorithms", not "types".**  The rule is that any sequence of
characters (an attribute value, or a substring of element content) that
satisfies the specified criteria of composition, sequence, etc., can be
encoded as a binary integer, IEEE float, etc.  **This is lossless in
terms of Unicode characters in the XML infoset.**

To be clear: it is NOT the case that, whenever something in an XML document
resembles a floating-point number, you can infer its type and simply encode
it as a float.  You can do that if and only if the character sequence
satisfies the criteria specified in the standard.
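The same gate applied to character sequences that look like a float. In this sketch Python's repr() is a hypothetical stand-in for whatever lexical form the standard actually specifies, and the binary form is assumed to be an 8-byte big-endian IEEE 754 double; both are illustrative assumptions.

```python
import math
import struct


def canonical(x: float) -> str:
    # Hypothetical stand-in for the standard's lexical form:
    # Python's shortest round-trip repr ("3.5", "0.1", "1e+100").
    return repr(x)


def encode_double(s: str):
    """Return 8 IEEE 754 bytes if s meets the assumed criteria, else None."""
    try:
        x = float(s)
    except ValueError:
        return None
    # Resembling a number is not enough: the decoder must reproduce s exactly.
    if not math.isfinite(x) or canonical(x) != s:
        return None
    return struct.pack(">d", x)


def decode_double(data: bytes) -> str:
    return canonical(struct.unpack(">d", data)[0])


for text in ["3.5", "35E-1", "3.50", " 3.5"]:
    print(repr(text), "->", "binary" if encode_double(text) else "characters")
```

Only "3.5" takes the binary path; "35E-1", "3.50" and " 3.5" denote the same number but are kept as characters, which is what keeps the encoding lossless.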

The entire support for integers, floats, dates and times, base64, hex, etc.
in Fast Infoset can be regarded as a lossless compression method.  Again,
there are no "types", only "encoding formats", and all of them are
lossless (in terms of Unicode characters in the XML infoset).

There are at least two ways of looking at this stuff.  

The first is:  You have an XML document and want to convert it to Fast
Infoset.  In this case, you can apply encoding algorithms (as for
"integers") and restricted alphabets (as for "dates") wherever you wish,
provided that the character sequences match the specified criteria.  You
don't have to understand the "meaning" of the character sequences: the
criteria are purely **syntactic**.  If you apply the reverse conversion,
you get back exactly the same character sequences.

The second is:  You have some application data (including integers, floats,
etc.) in memory and want to create a Fast Infoset document (as an
alternative to an XML document).  In this case, you are the one who decides
the actual character sequences that would be included in the XML document
(how exactly to serialize your floats, integers, dates, and byte strings) if
you were creating an XML document.  For example, you have a float (3.5) and
must decide whether to serialize it as "3.5", "0.35E1", "35E-1", etc.
Since the choice is yours, you are free to pick the form that happens to
match the criteria for the "float" encoding algorithm.  If you pick that
particular form, you will be able to encode your character sequence using
the IEEE format in the Fast Infoset document.

In practice, this conceptual process can be implemented by inserting the
integers and floats directly, so the application can be simpler and
faster (it doesn't have to generate the characters).  However, this
practical shortcut does not affect the conceptual process.  The infoset
behind the Fast Infoset document will contain **characters**, not integers
and floats.
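The second view, together with the shortcut just described, can be sketched as two encoding paths that must produce identical bytes. In this sketch repr() is a hypothetical stand-in for the criteria of the float encoding algorithm and the 8-byte IEEE 754 double layout is an assumption; encode_from_characters, encode_from_value and infoset_characters are illustrative names only.

```python
import math
import struct


def canonical(x: float) -> str:
    # Hypothetical lexical form assumed to satisfy the encoding algorithm's criteria.
    return repr(x)


def encode_from_characters(s: str) -> bytes:
    """Conceptual path: the application writes characters, which must qualify."""
    x = float(s)
    if not math.isfinite(x) or canonical(x) != s:
        raise ValueError(f"{s!r} does not meet the criteria; keep it as characters")
    return struct.pack(">d", x)


def encode_from_value(x: float) -> bytes:
    """Shortcut path: pack the in-memory value without generating characters.
    Equivalent to the conceptual path only for finite values."""
    if not math.isfinite(x):
        raise ValueError("non-finite values have no qualifying character form here")
    return struct.pack(">d", x)


def infoset_characters(data: bytes) -> str:
    """What the decoded infoset contains: characters, not a float."""
    return canonical(struct.unpack(">d", data)[0])


value = 3.5
chosen = canonical(value)  # the application picks "3.5" rather than "35E-1"
same = encode_from_characters(chosen) == encode_from_value(value)
print(chosen, same, infoset_characters(encode_from_value(value)))  # prints: 3.5 True 3.5
```

Because the application chose the qualifying spelling, skipping character generation changes nothing observable: the decoded infoset still holds the characters "3.5".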

A Fast Infoset document does NOT augment the XML infoset in any way by
allowing the presence of integers, floats, and binary strings.  Instead, it
provides encoding algorithms and restricted alphabets that support a
compact encoding of character strings that match certain criteria.

Alessandro Triglia
OSS Nokalva

> David
> -- 
> Computergrid : The ones with the most connections win.
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>



Copyright 2001 XML.org. This site is hosted by OASIS