Jonathan Borden wrote:
>>> Right, so you could care less whether a number is encoded either:
>>> 1) big endian floating point
>>> 2) little endian double
>>> 3) big endian 64 bit integer
>>> XML could really care less about these binary details. XML could just
>>> as easily deal with 203bit integers as 23 bit integers. These binary
>>> details just don't matter.
>> Ah! You're saying that there are several representations of a number
>> in binary but only one in text, right? Wrong. Some textual number
>> parsers will accept 2.3E5 as a number, some won't. There are different
>> encodings for numbers like infinity.
>> And consider how one encodes dates in XML. And how do you encode a
>> person's address book entry? With <person><name>...
>> <email>...</person> or with <addressBookEntry name="..." email="..." />?
> Now you are dealing with so-called XML _datatypes_, which only exist in
> terms of applications layered on XML such as XML Schema or RDF. What I
> am saying is that _for XML_ any so-called datatype is just another piece
> of XML.
Right, but you were talking about datatypes like integers in the context
of binary stuff. In terms of binary encodings, the bytes that represent
an integer are just bytes unless you go to the bother of extracting
their meaning, too.
You can look at something like BER or MIME or IFF and just extract the
tree structure, treating the 'values' as byte strings - just as you can
look at an XML document as a tree structure of character strings.
But most data-processing applications, in both cases, will look inside
at least some of the leaf nodes and convert strings of bytes, or
characters, to some form of 'value'.
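To make that concrete (Python here purely for illustration): the same four bytes yield two different integers depending on the byte order you assume - the bytes alone carry no meaning until you decide how to read them.

```python
import struct

# The same four bytes, interpreted two ways.
raw = b"\x00\x00\x01\x02"

big = struct.unpack(">I", raw)[0]     # read as big-endian unsigned 32-bit
little = struct.unpack("<I", raw)[0]  # read as little-endian unsigned 32-bit

print(big)     # 258
print(little)  # 33619968
```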
>> Sorry, the XML spec mandates that parsers be able to read UTF-16
>> encoded XML, which means you *do* need to be concerned with trivial
>> concerns about byte order.
> well yes ... sigh ... I have somewhat tracked these character issues
> from time to time, but frankly leave these issues to the XML cognoscenti
> as well as my parser.
> To a very large extent *I* don't have to be concerned with these issues.
And somebody using a toolkit to handle binary encodings for them doesn't
need to worry about endianness or field widths, either.
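The UTF-16 byte-order issue is the same kind of thing: the one document is a different byte sequence depending on byte order, yet decoded with the right order it's the same text - which is exactly what the parser sorts out for you. A throwaway sketch (Python purely for illustration):

```python
text = "<doc/>"

# The same document, serialised in the two UTF-16 byte orders.
be = text.encode("utf-16-be")  # big-endian bytes
le = text.encode("utf-16-le")  # little-endian bytes

assert be != le                                     # different on the wire...
assert be.decode("utf-16-be") == le.decode("utf-16-le") == text  # ...same text
```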
What I'm trying to say is that you appeared to be comparing apples with
oranges - saying that people working with binary formats need to 'worry'
about endianness and stuff while people working with XML don't.
Thing is, you only have to deal with endianness and so on if you're
actually working directly with the bits on the wire, which is generally
only if you're working without benefit of a toolkit to do the magic for
you, or if you're writing a toolkit - and it's the same with XML!
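In toolkit terms the pattern looks like this - a sketch of a hypothetical length-field helper (the names are made up) that fixes network byte order internally, so the caller deals only in integers and never sees endianness at all:

```python
import struct

def encode_length(n: int) -> bytes:
    # '!' = network (big-endian) byte order; an internal detail of the toolkit.
    return struct.pack("!I", n)

def decode_length(data: bytes) -> int:
    return struct.unpack("!I", data)[0]

# Round-trips to the same value on any host, whatever its native byte order.
print(decode_length(encode_length(300)))  # 300
```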
>> And if you want to use an off-the-shelf XML parser to prevent you from
>> having to worry about those details - then use an off-the-shelf BER
>> parser so you don't need to care about endianness in binary, either.
> Off the shelf BER parser ... where do I get one of those ... do I have
> to install it on my machine? Will the person receiving the message
> understand BER?
Yep... same as with an XML parser, which also has to be installed on
your machine, and the recipient needs to understand XML to make sense of
it.
> Frankly I am sure that ASN.1++ could have solved all the
> technical issues ... this discussion seems oddly analogous to TCP/IP vs.
> OSI as a network protocol. The real issue is *mindshare* for which ASN.1
> doesn't compare with XML ... I assume you et al. are trying to correct
In the Web industry, at least, XML has more mindshare, although XML has
been leaking into industries where ASN.1 is currently holding the
castle, which is probably one of the reasons behind all the ASN.1/XML
work at the ITU-T. I don't see ASN.1 overtaking XML on the web per se,
but by 'web' I mean 'people viewing Web pages', where XML has had little
luck in displacing HTML anyway...
> If you are trying to convince me (and I *am* someone who might be
> convinced) you are going to need to:
> a) make ASN.1 as easy as XML to work with
The main obstacle there right now is free tool availability, but there's
a rising interest in open source development here, so Watch This Space.
> b) make it as easy for the mythical "grad student" to write an ASN.1
> parser as it is to write an XML parser
IMHO it's easier to write a BER parser than an XML parser - but parsing
ASN.1 itself isn't the equivalent function to an XML parser; ASN.1 is
the schema language. Work is afoot to define that as an abstract type,
actually, meaning that you could then use a normal BER (or XER!) parser
to read it.
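To back up the 'easier to write' claim, here's a minimal sketch of a BER TLV reader - primitive types and definite (short or long form) lengths only; constructed types, multi-byte tags and indefinite lengths are deliberately ignored:

```python
def parse_tlv(data: bytes, offset: int = 0):
    """Read one BER tag-length-value triple starting at `offset`.

    Returns (tag, value, next_offset). Values are left as raw byte
    strings - extracting their meaning is the application's business.
    """
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:
        # Long form: the low seven bits give the number of length octets.
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    value = data[offset:offset + length]
    return tag, value, offset + length

# An INTEGER (universal tag 0x02) of length 1 with value 5, as raw bytes:
tag, value, _ = parse_tlv(b"\x02\x01\x05")
print(hex(tag), value)  # 0x2 b'\x05'
```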
> c) talk to me in my language
This is an interesting point - the ASN.1 and XML camps started off with
different mindsets, and this hasn't helped communication much...
>> Well, no, most people develop binary formats because they're simpler
>> than XML, and they'd rather be getting on with writing their
>> application than bothering with DTDs and SAX and DOM and stuff, in my
>> experience. The only reason to be covered with XML goo is if you have
>> an overriding concern regarding 'being able to view and edit the files
>> in a text editor'!
> Well fine, I expect that the market for binary applications will
> continue to exist -- how does this so-called "binary XML" help with any
> of this?
The real drive behind binary XML, as I see it, is that some have tried
to push XML into areas it's not best suited to. It appears to have been
designed as a generalised HTML, for browser/display applications, but
people are starting to want to transfer data with it between
applications - an area where binary encodings tend to be easier to work
with (XML may be simple to EMIT, but it's complex to PARSE and
UNDERSTAND - fine when you have a world full of server-side scripts
emitting it to a small number of browser implementations...). So people
are seeking to encode the XML 'model' into binary formats in order to
adapt XML to fit the demands being placed upon it.
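The emit/parse asymmetry is easy to demonstrate: emitting is little more than string formatting, while parsing correctly (entities, encodings, CDATA...) really wants a full parser. A sketch using Python's standard library, with made-up data:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

record = {"name": "Ada", "email": "ada@example.org"}  # illustrative data

# Emitting: trivial - just be careful to escape the character data.
doc = "<person><name>%s</name><email>%s</email></person>" % (
    escape(record["name"]), escape(record["email"]))

# Parsing: hand it to a real parser rather than attempt it by hand.
tree = ET.fromstring(doc)
print(tree.find("name").text)  # Ada
```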
>> Nothing - because XSLT is actually hiding the implementation of that
>> XML data from you; you just access the tree with it. The XSLT engine
>> you use could quite easily operate on a binary encoding syntax or
>> something and you wouldn't need to rewrite the XSLT. This is what Bob
>> is saying is a Good Thing. Don't you agree with him? :-)
> Theoretically, sure... I've made that argument at least since 1998 when
> I wrote an "XML parser" for DICOM ... I think I lost it (really!)
> because although it makes a great theoretical point, it isn't practical
It's been working fine for TCP/IP; IP datagrams are a fairly abstract
concept, along the path they will be mapped into various different
formats. Your computer will receive those IP headers differently over
Ethernet than over PPP, for example, but that's hidden behind an
abstraction layer; the TCP layer just sees IP datagrams, regardless of
the implementation of those datagrams as bits on the wire.
Likewise with images on the Web; you're free to use GIFs or JPEGs, and
PNGs are now widely enough supported to be considered ubiquitous. You
choose your favourite encoding (different ones have different
capabilities), but the user just sees an image.
Similarly with text encodings; a few are ubiquitous (US-ASCII,
ISO-8859-1, increasingly UTF-8 and friends), but there are others that
get used in specialist applications (EBCDIC, KOI-8, Shift-JIS, etc).
> ... you generally need to rewrite your XSLT for any significant change
> in document format.
Structure, yes; format, no. The same XSLT works just as well on a SAX
event source that's really a database or a computational process
generating XML on the fly as on the result of parsing a saved XML
file. And I have developed XSLT systems where
the only XML is that used to write the XSLT stylesheet in; it takes a
DOM tree generated from a database query and transforms it straight into