   Re: [xml-dev] Why would MS want to make XML break on UNIX, Perl, Python


From: "Michael Rys" <mrys@microsoft.com>

From: Rick Jelliffe [mailto:ricko@allette.com.au]

>> Sure, let's make XML unsuitable for use in UNIX pipes by allowing ^D.
>> And for Perl and Python text-processing programs that use standard in and
>> expect EOF (^D or ^Z).

> I was a Unix hack for at least 11 years of my life (before joining the
> evil empire and after leaving the mainframe and early PCs and Macs :-)),
> and I can assure you that unix pipes do not care about control
> characters.

You are right. I should have been talking about file IO using text mode
on DOS/Windows systems. My experience with pipes failing due to binary data
may be because of opening the file (in Cygwin) in text mode at the
opening stage, not subsequent ones, and due to spooler behaviour on
UNIX (printers requiring ^D or ^Z). Perhaps I am showing my age:
perhaps serial comms and text mode are used so rarely or are
well-protected behind layers that it does not matter if control characters
are embedded. (Doesn't seem likely...even then, the problem of editability
is strong enough to discount embedded controls.)
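
To make the pipe point concrete, here is a minimal sketch (not part of the original exchange) that pushes ^D (0x04) and ^Z (0x1A) through an OS-level pipe and reads them back. The payload and variable names are illustrative assumptions; the only calls used are the standard `os.pipe`, `os.write`, `os.read` and `os.close`.

```python
import os

# Illustrative payload containing ^D (0x04) and ^Z (0x1A) in the middle.
payload = b"before\x04middle\x1aafter\n"

read_fd, write_fd = os.pipe()
os.write(write_fd, payload)
os.close(write_fd)  # closing the write end is what signals end-of-file

received = b""
while True:
    chunk = os.read(read_fd, 4096)
    if not chunk:  # an empty read means the writer has closed the pipe
        break
    received += chunk
os.close(read_fd)

print(received == payload)
```

The pipe itself carries the bytes through untouched; end-of-file comes from closing the write end, not from any byte in the stream. ^D only acts as an end-of-input signal when typed at a terminal, and ^Z-as-EOF belongs to text-mode file handling in DOS/Windows C runtimes, neither of which this sketch exercises.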

The APIs I mentioned (CGI, Haskell, Perl, Python, C++) all deal
with opening files not reading stdin. E.g. 

> I don't want to know what kind of innuendo you want to imply on my
> motives with your reply (I thank Dare for his mail, although I am not
> feeling as attacked as he interpreted the mail). But supporting the
> standard also means that we need to work on evolving the standard if we
> see issues that are not currently addressed in some of the main
> scenarios of current XML usage.

No innuendo was intended on your motives.  But MS employees can hardly
be surprised if the general public treats their comments with a certain
wariness, following the court case.  It is not a matter of "evil" or "motives",
it is power, groupthink, organizational behaviour and prudence.

> Embrace and extend would be to simply generate and consume arbitrary
> code points in XML 1.0 without proper warnings and errors.

No. Embrace and extend would be to include anything that would not
fit into competitors' architectures or established APIs.

> Over the last three years, the usage of XML has evolved into areas where
> some people claim it is not as well suited. I honor that opinion,
> although I disagree. By evolving from 1.0 to 1.1, we will be adopting
> XML for these areas in an interoperable way. If you do not want to write
> or parse 1.1, stick with 1.0. 

I am glad you disagree. 

> Yes, there are some technical problems with C based APIs such as libxml.
> Maybe libxml needs to evolve as well (to libxml2?) for 1.1 applications
> and libxml for 1.0 will not be able to support some 1.1 documents and
> raise an error if NULL comes through.

How do we handle NULLs using C strings again? I missed that part.
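
As an illustration of why NUL is awkward for C-string APIs, the following sketch uses Python's standard `ctypes` module to view the same bytes two ways: as a NUL-terminated string and as a raw, length-counted buffer. The sample bytes are an assumption made for the example; nothing here is libxml's actual API.

```python
import ctypes

# Illustrative character data with an embedded NUL (0x00).
data = b"ab\x00cd"

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(data)

as_c_string = buf.value  # read the way a C string is read: stop at first NUL
as_raw_bytes = buf.raw   # read with an explicit length: all bytes preserved

print(as_c_string)
print(as_raw_bytes)
```

Any interface that hands text around as a bare `char *` sees only the part before the NUL; carrying such data faithfully needs a pointer plus an explicit length.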

> Or we find an interoperable way to transport/encode the control
> characters (agree on entities or char references or PIs). 

Yes. I believe the control character problem is a subset of a wider issue:
how to define "private use" code points (I believe that the control
characters are sui generis with the PUA characters) for use by
contracted groups.  I have made several proposals for these over
the years, but the I18n group at W3C has always taken the line
that only standard characters are suited for public interchange.

I agree that it would be useful to be able to send arbitrary strings,
and using Bin64 is clunky and does not go far enough, because
it does not establish the semantics of the characters enough
(this has always been the problem with the C0 characters anyway,
as anyone in serial comms can attest, they are a grab bag.) 
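
A small sketch of the current situation, assuming "Bin64" above means a Base64-style encoding: an XML 1.0 parser (here Python's standard `xml.etree.ElementTree`, which is backed by expat) rejects a C0 control character whether it appears literally or as a character reference, while a Base64 wrapper gets the bytes through at the cost of hiding what they mean. The element name `v` and the sample bytes are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# Illustrative payload containing C0 controls (^D and NUL).
raw = b"field\x04with\x00controls"


def parses(doc):
    """Return True if doc is accepted as well-formed XML 1.0."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


literal_ok = parses("<v>a\x04b</v>")   # raw ^D in character data
charref_ok = parses("<v>a&#4;b</v>")   # ^D as a numeric character reference

# Base64 turns the bytes into plain ASCII that any XML 1.0 parser accepts.
encoded = base64.b64encode(raw).decode("ascii")
roundtrip = base64.b64decode(ET.fromstring("<v>" + encoded + "</v>").text)

print(literal_ok, charref_ok, roundtrip == raw)
```

The round trip works, but the receiver learns nothing from the markup about what the encoded bytes are, which is the "does not establish the semantics" complaint.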

> However, sticking the head in the sand and ruling the problems out of
> bound...

I think I am happy to have the problems thought about, and indeed I have
been discussing them and pushing for a solution for this kind of
thing (semi-private characters) for many years.  It is the particular 
solution proposed that is bad IMHO. 

>  -XML will become as obscure as SGML itself was and ASN.1 takes over

Having a non-text XML will fracture it just as fast as anything else. Indeed,
the move to names > 16 bits (though good) will cause enough strain.

> -XML will fracture and the fracture lines will not be along an
>interoperable line but IBM will support NEL anyway, database vendors
>will map their textual types into XML text without having an
>interoperable way (or it will be confined to their industry group such
>as the ISO SQL standard) etc..

I am not at all convinced that the tradeoffs that are appropriate for web 
distribution of publications and reports (i.e. what is in XML) must be appropriate 
for database serialization.  Of course there is scope for change that improves
both (e.g. I have suggested building in knowledge of standard ISO entities
into parsers so that we can get rid of DTDs, and moving naming rules
to being a layer above WF to allow faster exchange of data known to be

But what to do if the needs of serialization conflict with the current 
publishing/programming characteristics of XML (text allowing almost any encoding 
and catching many errors, with readable names {not to be confused with 
comprehensible names})?   "Jettisoning SGML" seems to be code for
adding incompatibilities that make XML non-text, unsafe (for encodings),
or non-readable.  If there is a conflict, XML should remain a markup
language, and the developers who need a serialization language should
work within XML-as-text or develop a better, special purpose binary
format (ASN.1 has been mentioned before.)

Rick Jelliffe
(Not writing on behalf of employer.)



Copyright 2001 XML.org. This site is hosted by OASIS