
Re: XML Blueberry



At 10:17 AM +0100 6/22/01, David Carlisle wrote:

>Do any files really use NEL that are encoded in utf-8 or utf-16 (or
>utf-8 subsets like ascii that don't need to be declared)?
>
>If all the files using NEL start
><?xml version="1.0" encoding="some-flavour-of-ebcdic"?>
>Then can't NEL be mapped to #10 (or #13) in the non-normative support
>for the EBCDIC-related encodings? This wouldn't require any change to XML.
>

This is a good idea. Maybe we can fix this part of the problem in the
context of XML 1.0 without changing the spec. We'd need to define a
new encoding of Unicode such as IBD-8. IBD-8 would be identical to
UTF-8 except that the normal UTF-8 representation of the NEL character
would be mapped to a linefeed. Parsers would be free to support IBD-8
or not, just as today they are free to support or ignore any of IBM's
various EBCDIC encodings.
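
To make the idea concrete, here is a rough sketch in Python of what an IBD-8 decoder might look like. It is only an illustration under stated assumptions: IBD-8 is not a registered encoding, the function name decode_ibd8 is made up for this example, and NEL is taken to be U+0085, whose UTF-8 form is the two bytes 0xC2 0x85.

```python
NEL = "\u0085"  # NEXT LINE; encoded in UTF-8 as the bytes 0xC2 0x85
LF = "\n"       # LINE FEED, #10 in the quoted message


def decode_ibd8(data: bytes) -> str:
    """Decode the hypothetical IBD-8 encoding.

    The bytes are decoded exactly as UTF-8, and every NEL in the
    result is then replaced by a linefeed. Nothing else changes.
    """
    return data.decode("utf-8").replace(NEL, LF)
```

A parser that did not want to support IBD-8 would simply never register such a decoder and would reject the encoding declaration, as it already may for any encoding it does not know.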

XML-aware tools would not need to be changed at all, especially if
they don't want to support the new encoding. XML-aware tools that did
support the IBD-8 encoding would treat an IBD-8 document like any
other XML document. Non-XML-aware tools on IBM mainframes (e.g. text
editors, println()-like methods in programming languages, etc.) would
be able to work with the files in a natural, native way. Text tools on
other platforms that are not IBD-8-aware would probably choke, but
they do that anyway today when faced with strange encodings. On the
other hand, UTF-8-savvy, non-XML-aware tools could still process these
documents as they usually do.

And of course if UTF-8 isn't the variant that IBM wants, they can
have IBD-16 (UTF-16), IBD-4 (UCS-4), etc. Each would be identical to
its Unicode counterpart, except that XML-aware tools would either
translate the NEL characters to linefeeds or throw an error because
they don't recognize the encoding. I think this might make everyone
happy. Does anyone see a problem with this?
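
The "translate or throw an error" behavior could be sketched like this in Python. Everything named here is an assumption for illustration: the IBD_VARIANTS table, the decode_document function, and the choice of Python's utf-32 codec to stand in for UCS-4.

```python
NEL = "\u0085"  # NEXT LINE

# Hypothetical encoding labels, each mapped to the real Python codec
# it would otherwise be identical to.
IBD_VARIANTS = {
    "IBD-8": "utf-8",
    "IBD-16": "utf-16",
    "IBD-4": "utf-32",  # stand-in for UCS-4
}


def decode_document(data: bytes, declared: str, supported=IBD_VARIANTS) -> str:
    """Decode a document whose declaration names an IBD-* encoding.

    A tool that supports the label translates NEL to a linefeed.
    A tool that does not support it raises an error instead.
    """
    codec = supported.get(declared.upper())
    if codec is None:
        raise LookupError(f"unsupported encoding: {declared}")
    return data.decode(codec).replace(NEL, "\n")
```

Passing an empty table as `supported` models a parser that chose not to implement any of the new encodings: it fails on the declaration rather than silently misreading the line ends.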



-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+