OASIS Mailing List Archives

Re: XML Blueberry

Rick Jelliffe wrote:

> This only requires that the IBM line-end character should be allowed as part
> of the document character set.  I think this should be uncontroversial, and
> only requires a 3rd edition of XML, as a correction.

All Unicode 3.1 code points, including the unassigned ones, are already
part of the XML document character set.  (The trivial exceptions are
most of the C0 control characters, the surrogate space, and U+FFFE/FF.)
The issue here is the implicit NAMECHAR and NMSTCHAR declarations,
if I remember my SGML 8-letterisms correctly.
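To make the distinction concrete, here is a minimal Python sketch that transcribes two productions from the XML 1.0 (Second Edition) grammar: Char, which defines the document character set, and S, which defines the whitespace allowed in markup. The function names are hypothetical. It shows that NEL (U+0085) already passes the Char test, so it may appear in character data, but it fails the S test, so it cannot separate markup tokens.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].

    Unassigned code points are accepted; excluded are most C0 controls,
    the surrogate block, and U+FFFE/U+FFFF.
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml10_space(cp: int) -> bool:
    """XML 1.0 S: only SPACE, TAB, CR and LF count as whitespace in markup."""
    return cp in (0x20, 0x9, 0xD, 0xA)


if __name__ == "__main__":
    nel = 0x85
    # NEL is a legal character but not markup whitespace.
    print(is_xml10_char(nel), is_xml10_space(nel))  # True False
```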

> The second case is where we want to edit XML on an IBM system which, out of
> the control of the user, inserts IBM line-end characters when the user is
> typing in their markup.   To me this second case is no different to the case
> of East Asians typing with editors that stick in ideographic spaces rather
> than ASCII spaces: tough luck, you need to run the data through a converter.

It's not about typing, but about the representation of plain text on the
platform.  <flame>If XML had insisted that the
One True Representation of line-end is LF, and XML processors were
passing through every CR in character content and coughing on every CR
in markup, don't you think the situation would have been changed P.D.Q.?
Justice delayed is justice denied, but better than justice denied.</flame>

>  1) upgrade the document character set to Unicode 3.1 as a 3rd edition

No need.

>  2) state that "XML processors may, at user option, if they detect the
>     IBM newline or any other visual white-space in markup, element content
>     or in an entity/XML declaration, replace the characters with LF, as a
>     matter of entity management."

That is what Blueberry does, except that the "user option" is expressed
in the document, not by some out-of-band means.  This is plausible,
since it is the document creator who knows whether NEL, or post-2.0 name
characters, or both, are being used.
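The replacement both proposals have in mind can be sketched as a line-end normalization pass. This is a minimal Python sketch under stated assumptions: the XML 1.0 branch follows section 2.11 (CR LF and lone CR become LF); the extended branch is modeled on the rule XML 1.1 later adopted (CR NEL, lone NEL and U+2028 also become LF). The `allow_nel` flag is hypothetical and stands in for the in-document marking Blueberry calls for; a real processor would derive it from the document itself rather than from a caller argument.

```python
import re

# XML 1.0, section 2.11: CR LF and any lone CR are both translated to LF.
_XML10_EOL = re.compile(r"\r\n?")

# Modeled on the later XML 1.1 rule: CR LF, CR NEL, lone CR,
# lone NEL (U+0085) and LINE SEPARATOR (U+2028) all become LF.
_EXTENDED_EOL = re.compile("\r[\n\u0085]?|\u0085|\u2028")


def normalize_line_ends(text: str, allow_nel: bool = False) -> str:
    """Normalize line ends to LF.

    allow_nel is a hypothetical stand-in for the document-level opt-in.
    When it is False only the XML 1.0 rules apply, and any NEL is passed
    through unchanged, to be rejected later if it appears in markup.
    """
    pattern = _EXTENDED_EOL if allow_nel else _XML10_EOL
    return pattern.sub("\n", text)
```

Without the opt-in, a CR NEL pair from an IBM platform turns into LF followed by a stray NEL; with it, the pair collapses to a single LF, which is the behavior the document creator is asserting by marking the document.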

> This keeps the status of those characters w.r.t. XML 1.0 clear, in
> particular the fact that they will cause interoperability problems when
> used with other XML documents, but it provides a workaround for in-house use.

I suppose you mean "other XML tools".

That is what Blueberry is all about: it blesses documents using certain
non-1.0 features, and requires that they be marked as such.

There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein