At 9:06 AM -0400 7/26/02, John Cowan wrote:
>For that matter, the Java situation is not open and shut either.
>Although in Java it is guaranteed that '\n' == '\012', which is not
>guaranteed in C, the specific encoding employed by PrintStream to print
>characters is explicitly platform-specific, and it is not unreasonable
>for a Java implementation to output a NEL when it is asked to print '\n'.
Anybody using a PrintStream to do serious work deserves the bugs they
get. They've got problems before they even start thinking about XML.
In fact, I wrote one 600-page book inspired mostly by exactly the
problems with PrintStream. Good code uses the other stream and writer
classes, in which this behavior is unambiguously specified.
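
As a minimal sketch of what that looks like in practice: the class and
method names below are made up for illustration, and StandardCharsets
assumes a Java 7 or later runtime. The encoding and the line terminator
are both stated in the code, so neither depends on the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitXmlOutput {

    // Serializes a tiny document to bytes. ExplicitXmlOutput and write()
    // are hypothetical names used only for this example.
    public static byte[] write() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The charset is named explicitly instead of relying on the default.
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // Line ends are written as a literal LF; println() is never used,
            // so the platform line separator does not leak into the output.
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<greeting>caf\u00E9</greeting>\n");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = write();
        System.out.println("last byte = " + bytes[bytes.length - 1]);
    }
}
```

The same approach works with a FileOutputStream in place of the
ByteArrayOutputStream.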
>But to meet your larger point, there is nothing inappropriate in the use
>of 8-bit functions in XML processing. XML parsers that return UTF-8 are
>not unknown, and every XML file I generate for publication (~200 a day)
>is generated with 8-bit operations, and is either in UTF-8 or in 8859-1
>(properly labeled).
>
Do you really mean to suggest that using UTF-8 code units as C chars
is adequate? I suppose you could do that, but it most certainly is
not convenient and completely fails your stated goal of making XML
files plain text files. You're basically suggesting we treat them as
binary data rather than text.
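
To illustrate the mismatch, here is a small sketch (in Java rather than
C, with hypothetical class and method names) showing that the number of
8-bit units in a UTF-8 string is not the number of characters, so any
byte-oriented function that counts, indexes, or truncates is working on
something other than text.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Units {

    // Number of 8-bit units the string occupies when encoded as UTF-8.
    public static int byteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points in the string.
    public static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String word = "na\u00EFve"; // "naive" with i-diaeresis (U+00EF)
        System.out.println("code points: " + codePointCount(word));
        System.out.println("UTF-8 bytes: " + byteCount(word));
    }
}
```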
>> All of the other functions we're talking about are similar. Even with
>> NEL, you still shouldn't be using these to process XML. OS/390 needs
>> to get some modern libraries. XML does not need to change.
>
>The issue remains: XML files on the mainframe are not plaintext files
>according to local conventions.
Yes, that's true and the issue is *much* broader than merely adding
NEL to the white space production. Even if we do this, XML files on
mainframes will still not be plain text files. Adding NEL won't fix
the problem.
This whole notion of the "plain text" file may be a red herring. The
community has realized over the last several years that calling XML
files plain text really isn't accurate on any platform. Hence the
move from text/xml to application/xml.
>XML processing is specified to be done in terms of LF only, with all
>other line-terminator conventions translated to LF. Suppose this
>had not been done, and all XML storage representations had been
>defined to require LF only. "What about Windows?" "Oh well, they
>can run an external program to convert CR/LF to LF before parsing,
>and LF to CR/LF after generation." If that had been the story, there
>damned well would be no significant amount of XML on Windows.
>You can rearrange this story using any line terminator and OS you like.
You're confusing issues by merging together two different time
frames: before and after XML 1.0 was released. Had IBM raised this
issue during the development of XML, it could have been considered on
different grounds. They failed to do so, and I see no justification
for reopening the case now. It is far more important for XML to
remain stable than to allow a minuscule number of users (possibly as
few as zero) not to upgrade their software to something that supports
XML 1.0 conventions.
I find it completely reasonable to ask editors and other tools to
support the line ending conventions of the files they're editing. I
do this routinely on Mac, Windows, and Unix. I find it hard to
believe that it is so much more difficult for mainframe programmers
to do this.
>Mainframes and EBCDIC are far from dead. XML 1.0 Appendix F makes a
>point of talking about how to autodetect EBCDIC encodings, for example;
>there is no reason why XML files can't start 4C 6F A7 94.
>There is no reason not to convert the occasional 0x15 (or 0x85 in
>the ASCII-compatible encoding) to an XML end of line, either.
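
For reference, the four bytes quoted above are "<?xm" in EBCDIC, and
checking for them takes only a few lines. This is a sketch of that one
check only, with hypothetical names; it is not a full Appendix F
autodetection routine.

```java
public class EbcdicSniffer {

    // "<?xm" in EBCDIC, the byte sequence listed in XML 1.0 Appendix F.
    private static final byte[] EBCDIC_XML_DECL =
            {0x4C, 0x6F, (byte) 0xA7, (byte) 0x94};

    // Returns true if the first four bytes match the EBCDIC signature.
    // EbcdicSniffer and looksLikeEbcdic are names invented for this example.
    public static boolean looksLikeEbcdic(byte[] head) {
        if (head.length < EBCDIC_XML_DECL.length) {
            return false;
        }
        for (int i = 0; i < EBCDIC_XML_DECL.length; i++) {
            if (head[i] != EBCDIC_XML_DECL[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ebcdic = {0x4C, 0x6F, (byte) 0xA7, (byte) 0x94, (byte) 0x93};
        byte[] ascii = {0x3C, 0x3F, 0x78, 0x6D, 0x6C};
        System.out.println(looksLikeEbcdic(ebcdic));
        System.out.println(looksLikeEbcdic(ascii));
    }
}
```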
Airline reservation clerks and bank tellers don't count. They never
see the XML. How many actual users are there writing raw XML who have
problems? So far I haven't seen any. A programmer generating XML from
code can easily specify the line ending that XML requires. A
programmer reading XML through a parser will just see line feeds
anyway. You're trying to fix a non-existent problem.
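
To make the parser side concrete, here is a rough sketch of the XML 1.0
end-of-line handling: CR LF pairs and lone CRs are turned into LF
before the application sees anything. The class name, method name, and
the mapNel flag are all hypothetical; the flag only illustrates what
the proposed NEL mapping would add and is not taken from any
specification text.

```java
public class LineEnds {

    // Maps CR LF and lone CR to LF, as XML 1.0 requires on input.
    // If mapNel is true, U+0085 (NEL) is also mapped to LF; that branch
    // is an illustration of the proposal under discussion, nothing more.
    public static String normalize(String in, boolean mapNel) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\r') {
                // Swallow the LF of a CR LF pair so it yields one LF.
                if (i + 1 < in.length() && in.charAt(i + 1) == '\n') {
                    i++;
                }
                sb.append('\n');
            } else if (mapNel && c == '\u0085') {
                sb.append('\n');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        System.out.println(normalize(mixed, false).equals("a\nb\nc\nd"));
    }
}
```

Code that generates XML needs none of this: it writes "\n" and is done.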
>Speaking for myself and not necessarily the Core WG, I agree that there
>is no need to redefine the S production, merely to do line-terminator
>mapping on input. IMHO, there is no reason for #xD to be part of S
>either, as all real CRs are already mapped away, and having #xD be
>part of S serves only to allow very strange abuse of character
>references in entities containing attribute values and the like.
>However, I am certainly not suggesting that #xD be removed from S.
>
Again, it's a time frame issue. We are not discussing what XML would
be in an ideal world, had we known everything in 1996 that we know
now. We are discussing what is best to do now. Failing to add NEL in
no way justifies removing CR.
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| XML in a Nutshell, 2nd Edition (O'Reilly, 2002) |
| http://www.cafeconleche.org/books/xian2/ |
| http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+