OASIS Mailing List Archives




Re: XML Blueberry

From: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>

> And XML handles these perfectly. Indeed when you're writing or
> reading XML you simply don't care which line ending convention was
> used, which is the way it should be.

I'm with Rusty.

Let's not get confused! (That should be "Note to self: try not to get
confused!") There are two separate cases here.

The first case is where XML is generated by a program running on an IBM
system with this convention. In that case there is no need to extend the set
of characters the XML parser recognises as whitespace, because the
characters sent are under programmer control. And the parser does not (or
should not) care whether the IBM line-end character is sent as part of the
data.
This only requires that the IBM line-end character be allowed as part of
the document character set. I think this should be uncontroversial, and it
only requires a 3rd edition of XML, as a correction.

The second case is where we want to edit XML on an IBM system which, outside
the user's control, inserts IBM line-end characters when the user is typing
in their markup. To me this second case is no different to the case of East
Asians typing with editors that stick in ideographic spaces rather than
ASCII spaces: tough luck, you need to run the data through a converter.

So neither of these cases justifies adding the IBM newline to the whitespace
characters recognised by XML tokenizers.
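For reference, XML 1.0 production [3] defines whitespace as S ::= (#x20 | #x9 | #xD | #xA)+, so neither the IBM newline (assumed here to be NEL, U+0085, as it appears after decoding from EBCDIC) nor the ideographic space (U+3000) separates tokens in markup. A minimal Python sketch of a tokenizer-style split that honours only those four characters; the function names are hypothetical and not taken from any real parser. Note that Python's own `str.isspace()` and `str.split()` treat both U+0085 and U+3000 as whitespace, so they are not a substitute for S.

```python
from typing import List

# XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
XML_WHITESPACE = frozenset("\x20\x09\x0D\x0A")

NEL = "\u0085"                # assumed IBM newline after decoding to Unicode
IDEOGRAPHIC_SPACE = "\u3000"  # full-width space from East Asian input methods


def is_xml_whitespace(ch: str) -> bool:
    """True only for the four characters the S production allows."""
    return ch in XML_WHITESPACE


def split_on_xml_whitespace(markup: str) -> List[str]:
    """Split the inside of a tag the way an XML 1.0 tokenizer separates
    attribute specifications: only S characters act as separators."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in markup:
        if is_xml_whitespace(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

With a real newline, `split_on_xml_whitespace('a="1"\nb="2"')` yields two attribute tokens; with NEL or U+3000 in the same position, the whole string stays one token, which a conforming parser would reject as malformed markup.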

So perhaps the following is a reasonable compromise:

 1) upgrade the document character set to Unicode 3.1 as a 3rd edition
 2) state that "XML processors may, at user option, if they detect the
    IBM newline or any other visual white-space in markup, element content
    in an entity/XML declaration, replace the characters with LF, as a
    matter of entity management."
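Point 2 could look roughly like the following Python sketch, assuming the IBM newline is NEL (U+0085) and that "entity management" means rewriting the entity's text before it reaches an unmodified XML 1.0 parser. The names `normalize_line_ends`, `parse_with_option` and the `map_nel_to_lf` flag are hypothetical; the flag stands in for the "user option", and for simplicity the sketch converts NEL everywhere in the entity rather than only in markup.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"  # assumed IBM newline (EBCDIC NL) after decoding to Unicode


def normalize_line_ends(text: str, map_nel_to_lf: bool = False) -> str:
    """Entity-level cleanup applied before the XML parser sees the text.

    map_nel_to_lf is the hypothetical user option; it defaults to False so
    that plain XML 1.0 behaviour is unchanged.
    """
    if map_nel_to_lf:
        # Simplification: replaces NEL everywhere in the entity,
        # including character data, not only in markup.
        text = text.replace(NEL, "\n")
    # XML 1.0 end-of-line handling: CR LF pairs and lone CRs become LF.
    # A conforming parser does this anyway; repeating it here means a
    # CR NEL pair collapses to a single LF once NEL has been mapped.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def parse_with_option(text: str, map_nel_to_lf: bool = False) -> ET.Element:
    """Apply the optional conversion, then use a standard XML 1.0 parser."""
    return ET.fromstring(normalize_line_ends(text, map_nel_to_lf))
```

With the option off, `<a x="1"` followed by NEL and `y="2"/>` is still rejected by the parser, which keeps the characters' XML 1.0 status unchanged; with the option on, the same input parses with both attributes.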

This keeps the status of those characters w.r.t. XML 1.0 clear, namely
that they will cause interoperability problems when used with other XML
documents, but it provides a workaround for in-house use.

Rick Jelliffe