At 7:57 PM -0700 12/14/01, Champion, Mike wrote:
>I'm out of my depth here, but this argument doesn't smell right to me. I
>thought we concluded in the massive Blueberry thread a few months back that
>#x85 probably should have been included in the S production in the first
>place, and wasn't mainly because of a lack of mainframe expertise among the
>members of the original WG.
No, we didn't conclude that. A lot of us thought then and still think
that XML 1.0 got this right, that #x85 should not have been part of
the S production and still shouldn't be.
>pragmatism and leave them out. BUT there is an IMMENSE amount of data in
>mainframe databases that will probably be exposed via XML one day. It's not
>IBM that will pay the cost of debugging all the programs that neglect to
>translate #x85 into a politically correct separator when exposing these
>legacy systems as web services. And it is potentially OUR bank accounts and
>insurance policies in these legacy systems that are vulnerable to someone
>getting this wrong.
>
And exactly *none* of this data is in XML. If you want to take it out
of the database and put it in XML, then it must be translated with or
without XML 1.1. The same is true of Oracle, FileMaker, SQL Server,
and all other legacy database products on the market. It is trivial
to translate #x85 to #xA or #xD or both in the process. However, even
that isn't necessary!
#x85 is allowed in character data, i.e. in element content and
attribute values, today, with XML 1.0. All fields from IBM's databases
that contain #x85 characters can be included in XML 1.0 documents
without translation. The only place you can't put #x85 is inside
tags, as the white space that separates an element name from an
attribute or one attribute from the next.
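A minimal sketch of that distinction, assuming Python's standard
xml.etree.ElementTree module (an expat-based XML 1.0 parser). The
element and attribute names and the helper is_well_formed are made up
for illustration:

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"  # the #x85 character under discussion


def is_well_formed(text: str) -> bool:
    """Return True if the XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# #x85 inside element content and inside an attribute value.
in_data = f'<field note="a{NEL}b">line one{NEL}line two</field>'
root = ET.fromstring(in_data)
print(root.text == f"line one{NEL}line two")  # content keeps the #x85
print(root.get("note") == f"a{NEL}b")         # so does the attribute value

# #x85 used as the separator between an element name and an attribute.
in_tag = f'<name{NEL}att1="value"/>'
print(is_well_formed(in_tag))
```

The first document parses with the #x85 characters intact; the second
is the case that XML 1.0 does not allow.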
The issue is not IBM databases and never has been. The issue is that
IBM has some brain damaged text editors that insert a #x85 every time
you hit the return key instead of inserting a #xA or #xD or both.
Files created with these editors are not well-formed XML without an
additional conversion pass. Similarly, IBM has some programming
languages and tools that generate a #x85 when they do a println() or
that language's equivalent. That's all.
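The conversion pass could be as small as the following sketch. It
assumes the file has already been decoded to Unicode text and that
every #x85 in it is meant as a line end; the function name
normalize_line_ends is hypothetical:

```python
NEL = "\u0085"  # what the editor inserted for each press of the return key


def normalize_line_ends(text: str) -> str:
    """Replace every #x85 with #xA so XML 1.0 treats it as white space."""
    return text.replace(NEL, "\n")


# A start-tag whose attributes are separated by #x85 instead of #xA.
tag = f'<name{NEL}att1="value"{NEL}att2="value"{NEL}>'
print(normalize_line_ends(tag))
```

A blanket replacement like this also rewrites any #x85 that appears in
character data, which is fine for files produced by such an editor but
would need more care for data where #x85 is meaningful content.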
This has nothing to do with letting data move from IBM databases into
XML. It has everything to do with IBM not wanting to update their
software to the standards the rest of the world has been using for
more than 20 years. Worst of all, IBM wants to start shipping around
XML documents they generate with these strange line ending characters
that will not behave appropriately in the installed base of software
the rest of the world is using. I'm not just talking about XML here,
but much more broadly installed things like text editors and
programming languages. For instance, suppose an IBM tool generates a
start-tag like this using #x85:
<name
att1="value"
att2="value"
att3="value"
>
Looks like well-formed ASCII, right? But it's not. Here's what you'll
see if you open up the document containing that tag in a typical
Windows text editor:
<name... att1="value"... att2="value"... att3="value"...>
(Actual ellipsis characters will be used instead of three periods,
but you get the idea.) Open it on a Mac and all the ellipses will
change into O with two dots above instead.
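The effect is easy to reproduce by decoding the same bytes with
different codecs. This sketch assumes the Windows editor reads the file
as Windows-1252 and the Mac editor reads it as MacRoman, using Python's
codec names "cp1252" and "mac_roman"; the helper decode_views is
hypothetical:

```python
# A start-tag as raw bytes, with the single byte 0x85 as the separator.
raw = b'<name\x85att1="value"\x85>'


def decode_views(data: bytes) -> dict:
    """Decode the same bytes the way three different systems might."""
    return {codec: data.decode(codec)
            for codec in ("cp1252", "mac_roman", "latin-1")}


for codec, text in decode_views(raw).items():
    # ascii() makes the otherwise invisible or non-ASCII characters explicit.
    print(codec, ascii(text))
```

Under cp1252 the byte becomes U+2026 (the ellipsis), under mac_roman it
becomes U+00D6 (O with two dots above), and under latin-1 it stays
U+0085, a control character.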
This isn't just a question of recognizing the right encoding. It's a
question of attaching the right semantics to the characters. #x85
isn't just another character. It's a character with special meaning
for many text-processing systems. Unfortunately IBM has chosen to
assign different semantics to this character than pretty much
everyone else in the world. Even if the document is labeled as
ISO-8859-1 and the editor recognizes that and can tell that #x85 is
not a graphics character, it still won't break the lines when it sees
#x85!
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001) |
| http://www.ibiblio.org/xml/books/bible2/ |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/ |
+----------------------------------+---------------------------------+