Re: XML Blueberry
- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- To: David Carlisle <davidc@nag.co.uk>, xml-dev@lists.xml.org
- Date: Fri, 22 Jun 2001 09:05:02 -0400
At 10:17 AM +0100 6/22/01, David Carlisle wrote:
>Do any files really use NEL that are encoded in utf-8 or utf-16 (or
>utf-8 subsets like ascii that don't need to be declared)?
>
>If all the files using NEL start
><?xml version="1.0" encoding="some-flavour-of-ebcdic"?>
>Then can't NEL be mapped to #10 (or #13) in the non-normative support
>for the ebcdic related encodings? This wouldn't require any change to XML.
>
This is a good idea. Maybe we can fix this part of the problem in the
context of XML 1.0 without changing the spec. We'd need to define a
new encoding of Unicode such as IBD-8. IBD-8 would be identical to
UTF-8 except that the normal UTF-8 representation of the NEL character
(U+0085, bytes 0xC2 0x85) would be mapped to the linefeed. Parsers
would be free to support IBD-8 or not, just as today they are free to
support or ignore any of IBM's various EBCDIC encodings.
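
To make the mapping concrete, here is a rough sketch in Python of what
a decoder for such an encoding might do. IBD-8 is only a proposal, so
the function name decode_ibd8 is hypothetical, and this assumes the
simplest reading: decode as ordinary UTF-8, then turn every NEL
(U+0085) into a linefeed.

```python
NEL = "\u0085"  # NEXT LINE; its UTF-8 form is the two bytes 0xC2 0x85
LF = "\n"       # LINE FEED, i.e. #10 (U+000A)


def decode_ibd8(data: bytes) -> str:
    """Decode bytes in the hypothetical IBD-8 encoding.

    Assumption: IBD-8 is byte-for-byte UTF-8, and the only difference
    is that the decoder hands NEL to the parser as a linefeed.
    """
    return data.decode("utf-8").replace(NEL, LF)


# A document whose line ends are NEL, as a mainframe tool might write it.
sample = b"<doc>\xc2\x85<p>text</p>\xc2\x85</doc>"
print(repr(decode_ibd8(sample)))
```

Everything other than NEL passes through untouched, which is why a
parser that already reads UTF-8 would need very little extra code.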
XML-aware tools would not need to be changed at all, especially if
they don't want to support the new encoding. XML-aware tools that did
support the IBD-8 encoding would treat it like any other XML
document. Non-XML-aware tools on IBM mainframes (e.g. text editors,
println()-like methods in programming languages, etc.) would be able
to work with the files in a natural, native way. Non-IBD-8-aware text
tools on other platforms would probably choke, but they do that
anyway today when faced with strange encodings. On the other hand,
UTF-8 savvy, non-XML-aware tools could still process these documents
as they usually do.
And of course if UTF-8 isn't the variant that IBM wants, they can
have IBD-16 (UTF-16), IBD4 (UCS4), etc. Each of these encodings would
be identical to its Unicode counterpart except that XML-aware tools
would either translate the NEL characters to linefeeds or throw an
error because they don't recognize the encoding. I think this might
make everyone happy. Does anyone see a problem with this?
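
The "translate or throw an error" choice could look something like the
following Python sketch. The names decode_document and
UnsupportedEncodingError are made up for illustration, and pairing
IBD4 with Python's utf-32 codec is my assumption about what the UCS4
variant would amount to in practice.

```python
# Hypothetical encoding names from the proposal, each paired with the
# standard Python codec it would otherwise be identical to (assumed).
IBD_TO_CODEC = {
    "IBD-8": "utf-8",
    "IBD-16": "utf-16",
    "IBD4": "utf-32",
}


class UnsupportedEncodingError(ValueError):
    """Raised by a parser that has chosen not to support an encoding."""


def decode_document(data, declared_encoding, supported=frozenset(IBD_TO_CODEC)):
    """Translate NEL to linefeed, or reject an unrecognized encoding."""
    name = declared_encoding.upper()
    if name not in supported or name not in IBD_TO_CODEC:
        # A parser without IBD support stops here, just as it would
        # for any other encoding it does not implement.
        raise UnsupportedEncodingError(declared_encoding)
    text = data.decode(IBD_TO_CODEC[name])
    return text.replace("\u0085", "\n")
```

A parser that wants nothing to do with these encodings would simply
pass an empty supported set and always take the error branch.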
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
| http://metalab.unc.edu/xml/books/bible/ |
| http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+