> #x85 is allowed in character data; i.e. in element content and
> attribute nodes, today, with XML 1.0. All fields from IBM's databases
> that contain #x85 characters can be included in XML 1.0 documents
> without translation. The only place you can't put #x85 is inside tags:
> between an element name and its attributes, or between one attribute
> and the next.
Nor in "ignorable whitespace", nor in the invisible whitespace that
exists outside the document body (prolog including DTD, epilog).
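For anyone who wants to check this, here is a minimal sketch, assuming
Python's standard-library xml.etree.ElementTree (which sits on expat and
parses XML 1.0 only). The helper name is_well_formed and the sample
documents are mine, purely for illustration.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"  # NEXT LINE, the character under discussion


def is_well_formed(doc):
    """Return True if the XML 1.0 parser accepts doc (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# #x85 as character data: element content and an attribute value.
in_content = f"<a>one{NEL}two</a>"
in_attr_value = f'<a b="one{NEL}two"/>'

# #x85 used where the grammar requires whitespace (S): inside a tag
# between two attributes, and in the prolog before the root element.
between_attrs = f'<a b="1"{NEL}c="2"/>'
in_prolog = f'<?xml version="1.0"?>{NEL}<a/>'

for label, doc in [
    ("element content", in_content),
    ("attribute value", in_attr_value),
    ("between attributes", between_attrs),
    ("prolog", in_prolog),
]:
    print(label, "->", "accepted" if is_well_formed(doc) else "rejected")
```

The first two documents should be accepted and the last two rejected,
since #x85 is a legal Char in XML 1.0 but is not in the S production.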
> This has nothing to do with letting data move from IBM databases into
> XML. It has everything to do with IBM not wanting to update their
> software to the standards the rest of the world has been using for
> more than 20 years. ...
What, and forgo the profits of that locked-in customer base?
Surely other opportunities will exist, whereby the rest of the world
can yet be made to dance to IBM's tune ... :)
> It's a
> question of attaching the right semantics to the characters. #x85
> isn't just another character. It's a character with special meaning
> for many text-processing systems. Unfortunately IBM has chosen to
> assign different semantics to this character than pretty much
> everyone else in the world.
Good point -- I don't think that's been mentioned before. Of course,
the issue of whether the C1 control characters (U+0080..U+009F)
should ever have been allowed in XML has been raised often.
It's good to remember that one reason they're a problem is that they
have become a storehouse for vendor-proprietary characters, with
as many different meanings as most C0 ones (U+0000..U+001F).
Blessing one vendor's solution may magnify the problems.
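As a rough illustration of the range in question, a short Python sketch
that flags any C1 control characters in a string. The function name
c1_controls is hypothetical, and the check is nothing more than the
U+0080..U+009F range test given above.

```python
def c1_controls(text):
    """List (index, code point) for every C1 control in text.

    Hypothetical helper: C1 is taken to be U+0080..U+009F inclusive.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if 0x80 <= ord(ch) <= 0x9F
    ]


# U+0085 is NEL; U+0093 is another C1 code point that vendor code
# pages have historically reused for printable characters.
sample = "a\u0085b\u0093"
print(c1_controls(sample))
```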
- Dave