OASIS Mailing List Archives

   Re: [xml-dev] Microsoft FUD on binary XML...


Jonathan Borden wrote:

>>> Right, so you could care less whether a number is encoded  either:
>>> 1) big endian floating point
>>> 2) little endian double
>>> 3) big endian 64 bit integer
>>> XML could really care less about these binary details. XML could just 
>>> as easily deal with 203bit integers as 23 bit integers. These binary 
>>> details just don't matter.
>>
>>
>> Ah! You're saying that there are several representations of a number 
>> in binary but only one in text, right? Wrong. Some textual number 
>> parsers will accept 2.3E5 as a number, some won't. There are different 
>> encodings for numbers like infinity.
>>
>> And consider how one encodes dates in XML. And how do you encode a 
>> person's address book entry? With <person><name>... 
>> <email>...</person> or with <addressBookEntry name="..." email="..." />?
> 
> 
> Now you are dealing with so-called XML _datatypes_, which only exist in 
> terms of applications layered on XML such as XML Schema or RDF. What I 
> am saying is that _for XML_ any so-called datatype is just another piece 
> of XML.
> 

Right, but you were talking about datatypes like integers in the context 
of binary stuff. In terms of binary encodings, the bytes that represent 
an integer are just bytes unless you go to the bother of extracting 
their meaning, too.
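
To make that concrete, here is a minimal Python sketch (the sample bytes 
and variable names are my own illustration, not from any particular 
format): the same eight bytes yield different values depending on the 
interpretation you choose, and the same string of characters is a number 
to one textual parser but not to another.

```python
import struct

# Eight bytes that, read as a big-endian IEEE 754 double, are the
# double closest to pi. Chosen purely for illustration.
raw = bytes.fromhex("400921fb54442d18")

as_be_double = struct.unpack(">d", raw)[0]  # big-endian double
as_le_double = struct.unpack("<d", raw)[0]  # little-endian double
as_be_int64 = struct.unpack(">q", raw)[0]   # big-endian signed 64-bit integer

# The textual side has the same problem: float() accepts exponent
# notation, int() rejects the very same characters.
as_float = float("2.3E5")
try:
    int("2.3E5")
    int_parser_accepts = True
except ValueError:
    int_parser_accepts = False
```

Until you pick one of those interpretations, `raw` is just bytes and 
"2.3E5" is just characters.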

You can look at something like BER or MIME or IFF and just extract the 
tree structure, and just treat the 'values' as byte strings - just as 
well as you can look at an XML document as a tree structure of character 
strings.

But most data-processing applications, in both cases, will look inside 
at least some of the leaf nodes and convert strings of bytes, or 
characters, to some form of 'value'.
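
As a rough sketch of what "just extract the tree structure" looks like 
for BER, the Python below walks tag-length-value elements and leaves 
every primitive value as an uninterpreted byte string. It is 
deliberately simplified, and the simplifications are mine: single-byte 
tags and definite lengths only. The name `parse_tlv` is hypothetical, 
not taken from any toolkit.

```python
def parse_tlv(data, pos=0, end=None):
    """Return a list of (tag, value) nodes.

    value is raw bytes for primitive elements and a nested list of
    nodes for constructed ones. Simplified sketch: single-byte tags
    and definite lengths only.
    """
    if end is None:
        end = len(data)
    nodes = []
    while pos < end:
        tag = data[pos]
        pos += 1
        if tag & 0x1F == 0x1F:
            raise ValueError("multi-byte tags not handled in this sketch")
        length = data[pos]
        pos += 1
        if length == 0x80:
            raise ValueError("indefinite lengths not handled in this sketch")
        if length & 0x80:
            # Long form: low 7 bits give the number of length bytes.
            n = length & 0x7F
            length = int.from_bytes(data[pos:pos + n], "big")
            pos += n
        if pos + length > end:
            raise ValueError("truncated element")
        if tag & 0x20:
            # Constructed: the contents are themselves TLV elements.
            value = parse_tlv(data, pos, pos + length)
        else:
            # Primitive: keep the bytes without interpreting them.
            value = data[pos:pos + length]
        nodes.append((tag, value))
        pos += length
    return nodes


# SEQUENCE { INTEGER 5, OCTET STRING "hi" }
encoded = bytes.fromhex("3007020105040268" "69")
tree = parse_tlv(encoded)

# Only now does the application look inside a leaf and make a value of it.
first_leaf_tag, first_leaf_bytes = tree[0][1][0]
first_leaf_value = int.from_bytes(first_leaf_bytes, "big", signed=True)
```

The tree comes out without the parser knowing what any leaf means; 
turning the INTEGER's bytes into a number is a separate, later step, 
just as it is with the character strings in an XML document.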

>> Sorry, the XML spec mandates that parsers be able to read UTF-16 
>> encoded XML, which means you *do* need to be concerned with trivial 
>> concerns about byte order.
> 
> well yes ... sigh ... I have somewhat tracked these character issues 
> from time to time, but frankly leave these issues to the XML cognoscenti 
> as well as my parser.
> 
> To a  very large extent *I* don't have to be concerned with these issues.

And somebody using a toolkit to handle binary encodings for them doesn't 
need to worry about endianness or field widths, either.

What I'm trying to say is that you appeared to be comparing apples with 
oranges - saying that people working with binary formats need to 'worry' 
about endianness and stuff while people working with XML don't.

Thing is, you only have to deal with endianness and so on if you're 
actually working directly with the bits on the wire, which is generally 
only if you're working without benefit of a toolkit to do the magic for 
you, or if you're writing a toolkit - and it's the same with XML!
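
For instance, here is a small Python sketch of the byte-order question 
hiding inside UTF-16 encoded XML. The helper `detect_utf16_order` is a 
hypothetical name of mine, and it only looks at the byte order mark; a 
real parser does rather more than this.

```python
import codecs

text = '<?xml version="1.0"?><a/>'

# The same document, serialised with each byte order and a leading BOM.
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def detect_utf16_order(data):
    """Return 'big', 'little', or None based on the byte order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    return None


# A toolkit hides the difference: the generic "utf-16" codec reads the
# BOM and picks the byte order itself.
decoded_be = big_endian.decode("utf-16")
decoded_le = little_endian.decode("utf-16")
```

The bytes on the wire differ, the decoded characters do not, and only 
whoever writes the decoder has to care.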

>> And if you want to use an off-the-shelf XML parser to prevent you from 
>> having to worry about those details - then use an off-the-shelf BER 
>> parser so you don't need to care about endianness in binary, either.
> 
> 
> Off the shelf BER parser ... where do I get one of those ... do I have 
> to install it on my machine? Will the person receiving the message 
> understand BER?

Yep... same as with an XML parser, which also has to be installed on 
your machine, and the recipient needs to understand XML to make sense of 
it, too.

> Frankly I am sure that ASN.1++ could have solved all the
> technical issues ... this discussion seems oddly analogous to TCP/IP vs. 
> OSI as a network protocol. The real issue is *mindshare* for which ASN.1 
> doesn't compare with XML ... I assume you et al. are trying to correct 
> that.

In the Web industry, at least, XML has more mindshare, although XML has 
been leaking into industries where ASN.1 is currently holding the 
castle, which is probably one of the reasons behind all the ASN.1/XML 
work at the ITU-T. I don't see ASN.1 overtaking XML on the web per se, 
but by 'web' I mean 'people viewing Web pages', where XML has had little 
luck in displacing HTML anyway...

> If you are trying to convince me (and I *am* someone who might be 
> convinced) you are going to need to:
> 
> a) make ASN.1 as easy as XML to work with

The main obstacle there right now is free tool availability, but there's 
a rising interest in open source development here, so Watch This Space.

> b) make it as easy for the mythical "grad student" to write an ASN.1 
> parser as it is to write an XML parser

IMHO it's easier to write a BER parser than an XML parser - but parsing 
ASN.1 itself isn't the equivalent function to an XML parser; ASN.1 is 
the schema language. Work is afoot to define that as an abstract type, 
actually, meaning that you could then use a normal BER (or XER!) parser 
to read it.

> c) talk to me in my language

This is an interesting point - the ASN.1 and XML camps started off with 
different mindsets, and this hasn't helped communication much...

>> Well, no, most people develop binary formats because they're simpler 
>> than XML, and they'd rather be getting on with writing their 
>> application than bothering with DTDs and SAX and DOM and stuff, in my 
>> experience. The only reason to be concerned with XML goo is if you have 
>> an overriding concern regarding 'being able to view and edit the files 
>> in a text editor'!
> 
> Well fine, I expect that the market for binary applications will 
> continue to exist -- how does this so-called "binary XML" help with any 
> of this?

The real drive behind binary XML, as I see it, is that some have 
tried to push XML into areas it's not best for. It appears to have been 
designed as a generalised HTML, for browser/display applications, but 
people are starting to want to transfer data with it between 
applications - an area where binary encodings tend to be easier to work 
with (XML may be simple to EMIT, but it's complex to PARSE and 
UNDERSTAND - fine when you have a world full of server-side scripts 
emitting it to a small number of browser implementations...). So people 
are seeking to encode the XML 'model' into binary formats in order to 
adapt XML to fit the demands being placed upon it.

>>
>> Nothing - because XSLT is actually hiding the implementation of that 
>> XML data from you; you just access the tree with it. The XSLT engine 
>> you use could quite easily operate on a binary encoding syntax or 
>> something and you wouldn't need to rewrite the XSLT. This is what Bob 
>> is saying is a Good Thing. Don't you agree with him? :-)
>>
> Theoretically, sure... I've made that argument at least since 1998 when 
> I wrote an "XML parser" for DICOM ... I think I lost it (really!) 
> because although it makes a great theoretical point, it isn't practical 

It's been working fine for TCP/IP; IP datagrams are a fairly abstract 
concept, and along the path they will be mapped into various different 
formats. Your computer will receive those IP headers differently over 
Ethernet than over PPP, for example, but that's hidden behind an 
abstraction layer; the TCP layer just sees IP datagrams, regardless of 
the implementation of those datagrams as bits on the wire.

Likewise with images on the Web; you're free to use GIFs or JPEGs, and 
PNGs are now widely enough supported to be considered ubiquitous. You 
choose your favourite encoding (different ones have different 
capabilities), but the user just sees an image.

Similarly with text encodings; a few are ubiquitous (US-ASCII, 
ISO-8859-1, increasingly UTF-8 and friends), but there are others that 
get used in specialist applications (EBCDIC, KOI-8, Shift-JIS, etc).

> ... you generally need to rewrite your XSLT for any significant change 
> in document format.

Structure, yes; format, no. The same XSLT can work just as well on a SAX 
event source that's actually a database, or a computational process that 
generates XML on the fly, as on the result of parsing a saved dump of 
such activity as an XML file. And I have developed XSLT systems where 
the only XML is that used to write the XSLT stylesheet in; it takes a 
DOM tree generated from a database query and transforms it straight into 
HTML...
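
Here is a rough Python sketch of the same idea. Python's standard 
library has no XSLT engine, so a plain SAX ContentHandler stands in for 
the consumer; the names `NameCollector`, `emit_rows` and the sample rows 
are made up for illustration. The handler gets identical events whether 
they come from parsing XML text or are generated straight from 
database-style rows with no XML text ever existing.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class NameCollector(ContentHandler):
    """Stand-in consumer: records the name attribute of each person."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "person":
            self.names.append(attrs.get("name"))


def emit_rows(rows, handler):
    """Drive a SAX handler directly from rows; no XML is serialised."""
    handler.startDocument()
    handler.startElement("people", AttributesImpl({}))
    for row in rows:
        handler.startElement("person", AttributesImpl(row))
        handler.endElement("person")
    handler.endElement("people")
    handler.endDocument()


# Source 1: rows as they might come back from a database query.
rows = [{"name": "Ann"}, {"name": "Bob"}]
from_rows = NameCollector()
emit_rows(rows, from_rows)

# Source 2: the same structure parsed from XML text.
document = b'<people><person name="Ann"/><person name="Bob"/></people>'
from_text = NameCollector()
xml.sax.parseString(document, from_text)
```

Both collectors end up with the same names; the consumer cannot tell 
which source fed it, which is the sense in which the format can change 
while the structure stays put.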

> 
> Jonathan
> 

ABS





 


Copyright 2001 XML.org. This site is hosted by OASIS