OASIS Mailing List Archives





   Re: [xml-dev] The triples datamodel -- was Re: [xml-dev] SemanticWeb per


Michael Champion wrote:

> Before XML (and related technologies) people had 
> little choice but to stick with rigid formats, because all hell would 
> break loose when they were changed.  People are jumping on XML and the 
> design philosophies it enables  because there has been a pent up demand 
> for more flexibility.

Well, CSV has been around for ages, and the way I've always written CSV 
parsers is to take the first line as a line of column headings, and use 
those to select what is done with each column's fields. And my software 
ignores fields it doesn't know a use for, and assumes that missing 
fields it expects have a NULL value, which may or may not cause 
higher-level code to reject the row.
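
A minimal sketch of that style of CSV reader in Python, for illustration 
only. The column names in KNOWN_COLUMNS and the helper name read_rows are 
hypothetical, not taken from any real software; None stands in for NULL.

```python
import csv
import io

# Hypothetical set of columns this application knows a use for.
KNOWN_COLUMNS = ("name", "email", "phone")


def read_rows(text, known=KNOWN_COLUMNS):
    """Yield one dict per data row, keyed by the known column names."""
    reader = csv.reader(io.StringIO(text))
    try:
        # The first line is taken as the column headings.
        headings = [h.strip() for h in next(reader)]
    except StopIteration:
        return  # empty input: no rows at all
    # Where each known column sits in this particular file, if present.
    positions = {n: headings.index(n) for n in known if n in headings}
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = {}
        for n in known:
            idx = positions.get(n)
            # A missing column, or a row that is too short, gives None.
            present = idx is not None and idx < len(fields)
            row[n] = fields[idx] if present else None
        # Columns not listed in `known` are silently ignored.
        yield row


# "nickname" is unknown and ignored; "phone" is expected but absent.
sample = "email,nickname,name\nalice@example.org,Al,Alice\nbob@example.org,Bobby,Bob\n"
rows = list(read_rows(sample))
```

Whether a None in a given field is acceptable is left to the caller, 
which matches the idea that higher-level code decides whether to reject 
the row.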

Also, ASN.1 has an extension mechanism, where people using different 
variants of a 'schema' can still communicate; the decoder may inform the 
application that it had to discard some data it didn't understand, but 
still provides the fields that the decoder knows.
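
The behaviour described here can be sketched with a toy tag-length-value 
format in Python. This is NOT real ASN.1 or any of its encoding rules; 
the one-byte tag, one-byte length layout, the tag numbers, and the decode 
helper are all assumptions made up for the illustration.

```python
# Hypothetical tags that this version of the decoder knows about.
KNOWN_TAGS = {1: "name", 2: "age"}


def decode(data, known=KNOWN_TAGS):
    """Decode a toy TLV message.

    Returns (fields, discarded): the fields this decoder understands, and
    the tags it had to skip, so the application can be told about them.
    """
    fields = {}
    discarded = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated header")
        tag, length = data[pos], data[pos + 1]
        value = data[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated value")
        pos += 2 + length
        if tag in known:
            fields[known[tag]] = value
        else:
            # The length lets us step over data we do not understand.
            discarded.append(tag)
    return fields, discarded


# A sender using a newer variant adds tag 9, which this decoder skips.
message = bytes([1, 5]) + b"Alice" + bytes([9, 2]) + b"zz" + bytes([2, 1, 30])
fields, discarded = decode(message)
```

The point of the sketch is only that a self-describing length makes it 
possible to skip unknown items and still report that something was 
dropped.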

This 'flexibility' isn't something new to XML; it's inherent in any 
format that uses some kind of tagged values, including things like TIFF 
and PNG image files, MIME headers, and SMTP email messages. Nothing 
special about XML in this respect! XML fans seem to have similar 
marketing ideas to Microsoft, picking up a good idea from elsewhere and 
claiming to have invented it ;-)

Ah, TIFF files! Whereas PNG files, which are well specified, are pretty 
damned interoperable now that most browsers support them, TIFF files are 
a bit of a gamble. They have so many unconstrained options, due to lots 
of folks adding their own extensions here and there, that hardly any 
decoders seem to understand all the options - so although app A may 
export TIFF and app B may import it, that's no guarantee that you can 
actually transport an image that way. Even if B ignores elements it 
doesn't understand in the TIFF, it usually falls apart because the 
element it ignored was critical to decoding the image.

PNG files learnt from this; PNG chunks have a flag in them which 
indicates if they are necessary to decode the image (e.g., a flag saying 
the image data is in Yxy instead of RGB) or if they are not (e.g., extra 
annotations: thumbnail images, textual descriptions, information stating 
that the image has a scale whereby each pixel corresponds to a 1mm x 1mm 
square of some surface, etc). The latter types of chunk also carry a 
flag saying if they should be discarded if the image data is changed; a 
thumbnail should, since it would be incorrect, but a textual note needn't.
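
As a rough Python sketch of how those flags can be read: in the PNG 
specification, as I understand it, they are carried in the case of the 
letters of the four-byte chunk type (bit 5 of each byte). A lowercase 
first letter marks a chunk that is not needed to decode the image, and a 
lowercase fourth letter marks one that is safe to copy after the image 
data changes. The function names below are hypothetical.

```python
def chunk_flags(chunk_type):
    """Read the property bits from a 4-byte PNG chunk type such as b'tEXt'."""
    if len(chunk_type) != 4:
        raise ValueError("a PNG chunk type is exactly 4 bytes")
    return {
        # Bit 5 set (lowercase letter) means the chunk is ancillary,
        # i.e. not necessary to decode the image.
        "ancillary": bool(chunk_type[0] & 0x20),
        # Bit 5 of the last byte set means the chunk may be kept even
        # if the image data has been modified.
        "safe_to_copy": bool(chunk_type[3] & 0x20),
    }


def decoder_action(chunk_type, understood):
    """What a decoder might do with a chunk, given whether it knows it."""
    if understood:
        return "process"
    # An unknown chunk can be skipped only if it is not critical.
    return "skip" if chunk_flags(chunk_type)["ancillary"] else "reject"


def keep_unknown_after_edit(chunk_type):
    """Whether an editor that changed the image data may keep this chunk."""
    return chunk_flags(chunk_type)["safe_to_copy"]


examples = {t: chunk_flags(t) for t in (b"IHDR", b"IDAT", b"tEXt", b"hIST")}
```

Here IHDR and IDAT come out as critical, tEXt as ancillary and safe to 
copy, and hIST as ancillary but not safe to copy.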

This mechanism can only be made to work because there is one 'core' 
piece of information - the image data. In general, one would have to 
have each optional part of a file format list all the parts it depends 
upon for its meaning, plus an algorithm to deduce the consequences of 
changing a part or of not understanding it.
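
One way such an algorithm might look, sketched in Python under the 
assumption that each part simply names the parts it depends on. The part 
names and the affected_parts helper are invented for the example; no 
existing file format is being described.

```python
def affected_parts(depends_on, changed):
    """Return every part that can no longer be trusted.

    `depends_on` maps each part to the parts it needs for its meaning.
    `changed` holds the parts that were modified or not understood.
    The result includes `changed` plus everything that depends on it,
    directly or through other parts.
    """
    affected = set(changed)
    grew = True
    while grew:  # repeat until no new part gets added
        grew = False
        for part, deps in depends_on.items():
            if part not in affected and affected.intersection(deps):
                affected.add(part)
                grew = True
    return affected


# Hypothetical file layout: a caption that describes the thumbnail,
# which in turn is derived from the image data.
depends_on = {
    "image_data": [],
    "thumbnail": ["image_data"],
    "thumbnail_caption": ["thumbnail"],
    "text_note": [],
}

# Parts that become stale once the image data is edited.
stale = affected_parts(depends_on, {"image_data"}) - {"image_data"}
```

With a single core part this collapses to the PNG case: everything 
either depends on the image data or it doesn't.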

I think agreed schemas will *increase* reliability of systems. The 
objection to this seems to be "Oh, so your system dies the second it 
sees somebody's private extension?" - which is in no way implied by 
schemas being agreed between the communicating parties. And for very 
large scale systems, that 'agreement' can be as simple as saying "This 
site publishes data in the format documented _here_".




Copyright 2001 XML.org. This site is hosted by OASIS