This is a request for comment from this mailing list (or anyone else)
on a proposal by Shigemichi Yazawa for a standard representation for
the Unicode control characters that are not legal in XML 1.0. See
In essence, this provides an element <xml:orphanedChar value="#x0001">
which can be used *by convention* in place of an actual (and illegal) #x1
character. The Infoset would view this as an element, not a character, so
it would not be usable in attribute values and is not fully general-purpose.
It would also require explicit declaration in schema languages, unless
they were modified to ignore it; even then, an element with an XSD
datatype would not be able to use this feature.
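
To make the convention concrete, here is a rough sketch in Python of how a
producer might apply it. The function name, the empty-element form, and the
exact set of characters treated as illegal (the C0 controls other than tab,
LF and CR) are my assumptions for illustration; the proposal itself may
differ on any of them.

```python
import re
from xml.sax.saxutils import escape

# Assumption: only the C0 controls that XML 1.0 forbids are replaced.
# Tab (#x9), LF (#xA) and CR (#xD) are legal and pass through untouched.
_ILLEGAL_CONTROLS = re.compile("[\x00-\x08\x0b\x0c\x0e-\x1f]")


def encode_orphaned(text):
    """Return element content in which each illegal control character is
    replaced by an xml:orphanedChar element (hypothetical helper)."""
    pieces = []
    pos = 0
    for match in _ILLEGAL_CONTROLS.finditer(text):
        # Escape the ordinary text before the control character.
        pieces.append(escape(text[pos:match.start()]))
        # Emit the stand-in element, using the 4-digit form shown above.
        pieces.append('<xml:orphanedChar value="#x%04X"/>' % ord(match.group()))
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)
```

Note that the result is only usable as element content: there is no way to
drop it into an attribute value, which is the limitation mentioned above.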
An alternative proposal is to use a processing instruction such as
"<?xmlchar #x1?>", which would be invisible to schemas. A little *too*
invisible, in some cases: it would be legal in elements of simple type, but
a string-typed element required to contain 3 characters could not hold 3
control characters this way and still be schema valid.
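
The "too invisible" problem can be seen with a small sketch. It only
counts character content the way a length check would, using Python's
xml.etree.ElementTree; it does not run a real schema validator, and the
helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def text_length(xml_source):
    """Count the characters in an element's text content.

    ElementTree's default parser discards processing instructions, so
    they contribute nothing here (hypothetical helper)."""
    element = ET.fromstring(xml_source)
    return len("".join(element.itertext()))


# Three "characters" written as processing instructions...
with_pis = "<s><?xmlchar #x1?><?xmlchar #x2?><?xmlchar #x3?></s>"
# ...versus three ordinary characters.
plain = "<s>abc</s>"
```

Here text_length(plain) is 3, while text_length(with_pis) is 0, so an
element whose type demands a length of 3 would see nothing at all.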
The idea is certainly a hack. However, it may serve people who wish to
incorporate arbitrary Unicode strings into XML character content, by
covering the common cases even though it is not fully general. Whether it
*does* meet the 80/20 requirement is what we chiefly want to know.
Please make sure that all comments are cc-ed to
John Cowan <firstname.lastname@example.org> http://www.reutershealth.com
I amar prestar aen, han mathon ne nen, http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith. --Galadriel, _LOTR:FOTR_