John Cowan wrote
> This is a request for comment from this mailing list (or anyone else)
> on a proposal by Shigemichi Yazawa for a standard representation for
> the Unicode control characters that are not legal in XML 1.0. See
> In essence, this provides an element "<xml:orphanedChar value="#x0001">"
> which can be used *by convention* in place of an actual (and illegal) #x1
> character. The Infoset would view this as an element, not a character;
I'm not too keen on this proposal, even though it does have some merit. The
idea here is to represent a character using an entirely different infoset
item. This hack enables the application to bypass the XML character rules
but it is nevertheless a hack. I see no reason why this should be adopted
as part of the XML recommendation - if individual applications wish to
obfuscate control characters in this way there is nothing stopping them from
doing so in XML 1.0.
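
As a rough illustration of that last point, the sketch below shows an ordinary XML 1.0 parser accepting the placeholder element, with the application expanding it purely by convention. It assumes the value attribute always has the form "#x" followed by hex digits, as in the quoted example; the helper name text_with_orphans is hypothetical and not part of the proposal.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is implicitly bound to this namespace by the parser.
XML_NS = "http://www.w3.org/XML/1998/namespace"
ORPHAN = "{%s}orphanedChar" % XML_NS


def text_with_orphans(elem):
    """Return the text of elem, expanding placeholder elements by convention.

    Assumption: value is "#x" plus hex digits, as in the quoted example.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == ORPHAN:
            # Drop the leading "#x" and decode the hex code point.
            parts.append(chr(int(child.get("value")[2:], 16)))
        else:
            parts.append(text_with_orphans(child))
        parts.append(child.tail or "")
    return "".join(parts)


# Parses as plain XML 1.0; no change to the recommendation is needed.
doc = ET.fromstring('<msg>bell<xml:orphanedChar value="#x0001"/>end</msg>')
result = text_with_orphans(doc)
```

Note that nothing comparable is possible inside an attribute value, since an attribute cannot contain an element.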
In addition, I'm not happy with the way this proposal creates a distinction
between attribute values and element content. Many people draw little
distinction between the two, with the obvious exception that attribute
values do not have structured content.
I think that a W3C recommendation should create a mechanism that is suitable
for both element content and attribute values. Furthermore I would like to
see existing mechanisms used where possible. Has the Core WG exhausted all
possibilities surrounding the idea of using character references (e.g. &#x05;)?
I presume the idea has been dropped because of the need to protect existing
applications that cannot handle control characters. But if you leave out
0x00 (which has well-known mischievous properties) then I think most
applications will be able to handle the other characters without problem.
Additionally, should the few XML 1.0 applications that would stumble on
control characters in text be permitted to block the progress of XML?
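
For reference, the following sketch checks how an XML 1.0 parser treats such character references today. It uses Python's standard ElementTree parser as a stand-in for "an XML 1.0 application"; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc):
    """Return True if the XML 1.0 parser accepts doc, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A reference to TAB is legal in XML 1.0.
tab_ok = is_well_formed("<a>&#x09;</a>")
# A reference to #x05 is not, in element content or in an attribute value.
ctrl_ok = is_well_formed("<a>&#x05;</a>")
attr_ctrl_ok = is_well_formed('<a v="&#x05;"/>')
```

The point of interest is that the same mechanism covers both element content and attribute values, which the element-based convention cannot do.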
Surely, if a class of application exists which requires the new features of
XML 1.1, but cannot handle control characters, then it could be catered for
independently by parser vendors (perhaps by making a processing switch
available to treat control characters as not well-formed even in 1.1).
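
A minimal sketch of what such a switch might look like, applied to already-decoded character data. The function names and the reject_controls flag are hypothetical and do not correspond to any existing parser's API; the set of always-allowed controls (tab, LF, CR) follows XML 1.0.

```python
def find_restricted_controls(text):
    """Return the code points of C0 controls other than tab, LF and CR."""
    allowed = {0x09, 0x0A, 0x0D}
    return [ord(ch) for ch in text if ord(ch) < 0x20 and ord(ch) not in allowed]


def check_text(text, reject_controls=True):
    """Hypothetical processing switch: refuse control characters on request.

    With reject_controls=True the text is treated as an error if it holds
    any restricted control; with False it is passed through unchanged.
    """
    found = find_restricted_controls(text)
    if reject_controls and found:
        listing = ", ".join("#x%X" % cp for cp in found)
        raise ValueError("control characters not accepted: " + listing)
    return text
```

An application that cannot cope with control characters would leave the switch on; everything else would turn it off.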
Please, let's not allow blatant hacks into the XML recommendations.
> An alternative proposal is to use a processing instruction such as
> "<?xmlchar #x1?>", which would be invisible to schemas.