xml-dev - Some thoughts about the compatibility between XML 1.0 and XML 1.1

To: xml-dev@lists.xml.org
Subject: Some thoughts about the compatibility between XML 1.0 and XML 1.1
From: Eric van der Vlist <vdv@dyomedea.com>
Date: Sat, 15 Dec 2001 17:48:44 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.6) Gecko/20011202

I don't feel like entering the arena to discuss the need for the 
modification proposed by the first XML 1.1 WD, since I don't feel 
qualified to speak about a problem which I have never personally felt.

I would rather note that it can be an opportunity to test the versioning 
of XML on a limited change and that there are probably lots of things to 
learn from this first version change.

Let's first list all the impacts on applications using XML:

a) Some documents which are well formed per XML 1.0 may not be well 
formed per XML 1.1 (as far as I can tell):

per http://www.w3.org/TR/2001/WD-xml11-20011213/#sec2.13 :

"2.13 W3C Normalization Checking [NEW]

XML processors must/should/may check whether their input documents are 
in W3C normalized form, as defined by [Charmod]. XML processors must not 
transform the input to be in normalized form. It is a fatal 
error/error/not an error for the document not to be in normalized form."

and http://www.w3.org/TR/charmod/#sec-TextNormalization gives an example 
of a non-normalized yet XML 1.0 valid snippet ()

b) Some documents which are well formed per XML 1.1 may not be well 
formed per XML 1.0 (this is the already well discussed consequence of 
allowing more characters in names).

c) The "same" text within an element of an XML file may be different 
depending on whether the file is an XML 1.0 or an XML 1.1 document 
(since the EOL handling has been changed).

d) The "same" attribute value in an XML file may be different depending 
on whether the file is an XML 1.0 or an XML 1.1 document (since the 
attribute value normalization has been changed).
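
To make (c) concrete, here is a minimal sketch of the two EOL 
treatments. It assumes that the 1.1 rules add NEL (U+0085) to the line 
ends already recognized by XML 1.0 (CR LF and a lone CR); the function 
names are mine and the exact 1.1 list should be checked against the 
draft.

```python
import re


def normalize_eol_10(text: str) -> str:
    """XML 1.0 line-end handling: CR LF and a lone CR both become LF."""
    return re.sub("\r\n|\r", "\n", text)


def normalize_eol_11(text: str) -> str:
    """Assumed XML 1.1 handling: as 1.0, plus NEL (U+0085) becomes LF."""
    return re.sub("\r\n|\r|\u0085", "\n", text)


# The same raw characters, as they might appear inside an element.
raw = "line one\u0085line two\r\nline three"

as_10 = normalize_eol_10(raw)  # NEL is kept as an ordinary character
as_11 = normalize_eol_11(raw)  # NEL is reported as a line feed

print(as_10 == as_11)  # False: the "same" text differs between versions
```

An application comparing the two strings, or counting their lines, 
would see different content although the bytes of the file are the same.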

Note: the WD also says that "each entity, including the document entity, 
can be separately declared as XML 1.0 or XML 1.1." which *seems* to 
allow mixing, in the same document, elements and attributes with both 
the new and the old EOL and attribute value handling. This seems like a 
weird thing to do.

Having listed these 4 differences, I'd like to assert that most of the 
XML 1.0 well formed documents are XML 1.1 well formed and that I would 
expect that for a while, most of the XML 1.1 well formed documents will 
also be XML 1.0 well formed (people will probably use the new version 
number for their new documents even if they don't use extended names).

Let's now have a look at what the WD says about the versioning:

http://www.w3.org/TR/2001/WD-xml11-20011213/#sec2.8

"2.8 Prolog and Document Type Declaration

Change "1.0" everywhere to "1.1"

Add the following paragraph:

XML 1.1 processors should accept XML 1.0 documents as well. If a 
document is well-formed or valid XML 1.0, it may be made well-formed or 
valid XML 1.1 respectively simply by changing the version number."

Per (a), I *think* that the above statement is not true.
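
As an illustration of why I think so, here is a small sketch. It uses 
Unicode NFC as a stand-in for the "W3C normalized form" of [Charmod] 
(which is based on NFC but adds further constraints), and the sample 
document is my own, not the one from the Charmod draft.

```python
import unicodedata
import xml.etree.ElementTree as ET

# "é" written as "e" followed by U+0301 COMBINING ACUTE ACCENT:
# legal character data in XML 1.0, but not in NFC.
doc = '<?xml version="1.0"?><name>Cafe\u0301</name>'

# An XML 1.0 parser accepts the document without complaint.
text = ET.fromstring(doc).text

# Approximation of the normalization check (NFC only, an assumption).
is_nfc = unicodedata.normalize("NFC", text) == text

print(is_nfc)  # False
```

If the check ends up being a fatal error, changing the version number of 
this document to "1.1" would not be enough to make it well formed.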

I also think that it's far from being sufficient and that using a 
version 1.1 is serving two different purposes:

e) declare that the names may contain a bunch of new characters (note 
that it's a "may", not a "must").

f) specify that the parser must use the new EOL and attribute value 
handling methods.

For these two purposes, it seems to me that it would be very useful to 
let applications override the version definition found in an instance 
document (exactly like it is useful to be able to override a schema 
location) and let them say: process this 1.1 document as 1.0 (report 
errors if it's not 1.0 well formed and use the "old" handling), or: 
process this 1.0 document as 1.1 (report errors if it's not 1.1 well 
formed and use the "new" handling).
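
A rough sketch of what such an override could look like from the 
application side follows. Both functions are hypothetical (no existing 
parser API is implied), and treating a document without an XML 
declaration as 1.0 is an assumption of mine.

```python
import re
from typing import Optional

# Matches the version pseudo-attribute at the start of an XML declaration.
_XML_DECL = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])([0-9.]+)\1')


def declared_version(document: str) -> str:
    """Return the version found in the XML declaration ("1.0" if absent)."""
    match = _XML_DECL.match(document)
    return match.group(2) if match else "1.0"


def effective_version(document: str, override: Optional[str] = None) -> str:
    """Version whose rules should be applied: the override wins if given."""
    return override if override is not None else declared_version(document)


doc = '<?xml version="1.1"?><a/>'
print(declared_version(doc))                   # 1.1
print(effective_version(doc, override="1.0"))  # 1.0
```

The parser would then pick its well-formedness checks and its EOL and 
attribute value handling from the effective version rather than from the 
declared one.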

I have then looked at a random set of specifications which could be 
affected by the change.

Since John is an editor of both XML 1.1 and the XML Infoset, it is no 
surprise that the Infoset should not be impacted.

C14N contains its own list of whitespace characters ("whitespace 
characters #x9, #xA, and #xD") which would need to be updated.

XSLT also contains a list of whitespace characters, but XSLT would be 
affected more deeply than that: the version can be specified in its XML 
output method, and the version of the XSLT stylesheet and/or of a source 
document may differ from the version of the output document. What should 
an XSLT processor do when an element or attribute name which is XML 1.1 
well formed but not XML 1.0 well formed is inserted in an output tree 
serialized by an XML 1.0 output method? Or when a text which is XML 1.0 
well formed but not XML 1.1 well formed is inserted in an output tree 
serialized by an XML 1.1 output method?

The XPath specification has been wise enough to reference the whitespace 
definition of XML rather than redefining it. However, if an XPath 
processor wanted to give a different result for the normalize-space() 
function depending on the version of XML which is used, it would surely 
have a problem, since reporting the XML version to applications is a 
feature which is being introduced in DOM Level 3 and is missing from 
both SAX 2.0 and DOM Level 2 (how can the XPath processor guess the 
version of the document, then?).

Finally, W3C XML Schema also defines its own list of whitespace 
characters. Beyond the editorial change, changing the list would have 
strange effects similar to those mentioned for XSLT.

The effect of a number of facets would be modified (facets working on 
XML 1.0 documents may not work on XML 1.1 documents and vice versa, 
enumerations could be affected, the length of strings would give 
different results, ...). The effect of derivations by list would also 
be affected (a single list item in an XML 1.0 document might, for 
instance, be considered as two different list items in XML 1.1).
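
The list case can be sketched as follows. The extended whitespace set is 
purely hypothetical (it assumes that NEL, U+0085, would be added to the 
whitespace list); the point is only that the same lexical value splits 
into a different number of items.

```python
# Whitespace characters of XML 1.0: space, tab, CR, LF.
XML10_WS = "\x20\x09\x0D\x0A"
# Hypothetical extended list, assuming NEL (U+0085) were added.
HYPOTHETICAL_XML11_WS = XML10_WS + "\u0085"


def split_list(value: str, whitespace: str) -> list:
    """Split a list value on the given whitespace characters only."""
    table = {ord(ch): " " for ch in whitespace}
    return [item for item in value.translate(table).split(" ") if item]


value = "red\u0085green blue"

items_10 = split_list(value, XML10_WS)
items_11 = split_list(value, HYPOTHETICAL_XML11_WS)

print(len(items_10), len(items_11))  # 2 3
```

A length facet on such a list, or an enumeration on its items, would 
therefore not give the same result under both sets.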

Even the content models would be affected (content models considered as 
complex by XML 1.1 would be considered as mixed by XML 1.0).

I am sure that the lists (of affected specifications and of effects on 
the ones I have mentioned) are much longer, but I thought that it might 
be useful to give this partial list to illustrate what I meant!

Hope this helps.

Eric
-- 
See you in Paris for the Electronic Business Days 2002.
                                   http://www.edifrance.org/ebd/index.htm
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------
