
   Re: [xml-dev] Why you should avoid Notation Declarations (by Kohsuke


On Wed, Feb 23, 2005 at 07:53:48AM -0500, Elliotte Harold wrote:
> Henry S. Thompson wrote:
> >You have a place in your XML document type for (e.g. hex-encoded)
> >BLOBs to be included or referenced.  Each BLOB in turn should be an
> >encoding of something using one of a delimited set of more-or-less
> >standard encodings.  You want to allow or require document authors to
> >signal which encoding they're using.
> How do notations improve on, for example, <data mime:type="image/jpeg" 
> href="somefile.jpg"/>? Normally I like additional layers of indirection, 
> but the NOTATION indirection has never seemed all that useful. What am I 
> missing?

You're not missing very much.

Worse, in the Web environment for which we tried to develop XML,
it's not up to the client, nor to the document author, what
formats are available for the representation of a given resource
identified by URI.

In English :-) that means that if you try and fetch an image by URI,
the remote Web server is *perfectly* at liberty to send you
a JFIF/JPEG file, a PNG file, or even a text document.

It does this based on a combination of the Accept header given
by the user agent (e.g. Web browser) and what formats the server
has available.  And yes, common Web servers in widespread use do
actually do this (e.g. Apache).
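
As a rough illustration of that negotiation, here is a minimal sketch in
Python.  It is not Apache's actual algorithm; the function names and the
sample media types are made up for this example, and real servers weigh
more than the q-values shown here.

```python
def parse_accept(header):
    """Split an HTTP Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range):
    """Rank */* below type/* below type/subtype."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range, media_type):
    """True if a concrete media type falls inside an Accept media range."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    type_, _, sub = media_type.partition("/")
    return range_type == type_ and range_sub in ("*", sub)


def choose_representation(accept_header, available):
    """Pick the available media type the client rates highest, or None."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r), q) for r, q in ranges
                    if matches(r, media_type)]
        if not matching:
            continue
        _, q = max(matching)  # the most specific matching range decides q
        if q > best_q:        # ties keep the server's own ordering
            best, best_q = media_type, q
    return best
```

The same URI can therefore yield a JPEG for one client and a PNG for
another, which is why a format fixed in the document is only a guess.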

So a document author has no business *whatsoever* saying what
format should be used for an image unless they know exactly how
their document will be processed.  Which seems to me very much
against the spirit of XML.

There *is* a case in which NOTATION makes some sense -- it's when
the content of an element is in an XML-compatible text-based
format other than XML.  You can use a CDATA section but the
character gamut must be those Unicode characters permitted to
be used within an XML document (or a subset, of course).
Hence, you can't just put an image there.

W3C Schema Notation goes further, and you can base64-encode
the content of an element, to circumvent the character set issue
and remove the need to use CDATA sections.
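
A small sketch of the difference, in Python with the standard library
only.  The element name "data" and the few sample bytes standing in for
an image are assumptions for illustration, not taken from any schema.

```python
import base64
import xml.etree.ElementTree as ET

# The opening bytes of a JFIF/JPEG stream, standing in for a real image.
blob = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the bytes as XML."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# Raw binary pasted into element content is rejected by the parser.
raw_document = b"<data>" + blob + b"</data>"
raw_ok = is_well_formed(raw_document)

# Base64 text uses only characters that XML permits.
elem = ET.Element("data")
elem.text = base64.b64encode(blob).decode("ascii")
encoded_document = ET.tostring(elem, encoding="unicode")

# Round trip: parse the document and recover the original bytes.
decoded = base64.b64decode(ET.fromstring(encoded_document).text)
```

Here raw_ok comes out False, while decoded equals the original blob.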

In this circumstance, the notation is signalling to the XML processor
(post-schema at least) that the content of the element is in a
constrained format.  Presumably it could unpack and test, and if
(for example) the JPEG image was found to be non-conforming with
the corresponding ISO document, one can imagine rejecting the document,
although I don't know if any processors do that automatically
beneath the Schema level, nor if Schema explicitly sanctions this,
nor if I'd always want such behaviour!
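
To make "unpack and test" concrete, here is a hedged sketch in Python of
the weakest possible check: decode the base64 text and compare the
leading bytes with the well-known JPEG and PNG file signatures.  The
function name and the table are hypothetical, and a signature match is
nowhere near conformance with the ISO document; it only shows where
such a test would sit.

```python
import base64

# Leading bytes that identify each format.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def looks_like(declared_type, encoded_text):
    """Check base64 content against the declared type's file signature.

    Returns True or False, or None when the type is unknown to this sketch.
    """
    signature = SIGNATURES.get(declared_type)
    if signature is None:
        return None
    try:
        data = base64.b64decode(encoded_text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return data.startswith(signature)
```

Whether a False result should reject the whole document or merely warn
is exactly the policy question raised above.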

Personally I'd be 100% happy about removing DTD-style notations,
which are pretty much completely broken, and I don't have a strong
opinion on Schema notations except to say that I think that since
they're not aligned with DTD notations they could perhaps more usefully
have been given a different name, e.g. EncodedContent, and should
still be restricted to element or attribute content... external
unparsed entities, and the equivalent with XInclude, have no
business (in general) trying to pontificate about what file format
is to be used when in general they won't be correct.  A hint
that's useful in a closed environment is no more than a hint,
and is a higher-level convention.


Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/

