
Re: Binary XML - summary of discussion to date

From: Al Snell <alaric@alaric-snell.com>
To: Amy Lewis <amyzing@talsever.com>
Date: Sat, 14 Apr 2001 23:17:52 +0100 (BST)

On Sat, 14 Apr 2001, Amy Lewis wrote:

> >XDR runtime libraries and development kits are probably more widespread
> >than XML runtimes, since an XDR toolkit of some kind comes with every OS
> >that can do NFS (certainly every Unix I've encountered), but because
> 
> Not a terribly wide base of examples, actually.  NFS is very uncommon,
> and often implemented in a high-cost package, outside of the various
> unixen and unix-alike OSen.

One thing ONC-RPC could do with is a widely available open source
implementation. All the open source OSs seem to maintain their own ONC-RPC
runtime for NIS and NFS access - it should be a piece of cake to choose
the best one and pull it out into an autoconfing package... this would
make it easy to get common support for the RPC crypto options. But I
digress :-)

The only non-open-source Unix I've played with is Solaris, but between
Solaris and Linux and the three open-source BSDs, which all do ONC-RPC and
hence XDR, there's not a lot of market share left...

> "Binary" protocols, btw, do not exist, as such. 

Yeah, I think the classification of things into "text" and "binary" is a
bit lame in many respects, but it's become common (if ambiguous and
subjective) usage.

> headers that are record oriented.  It isn't that it's faster to read a
> network-byte order sixteen-bit integer on *every* possible
> platform--it's that IP always puts the length field in *exactly* the
> same offset from the start of the frame, so even if it's encoded as
> BCD, or as a four-character text field, I can get it just as fast.
> 
> If it's XDR, I don't know where it is, and in order to find it, I have
> to parse every byte that comes in front of it. 

(surprise)

Whoa, only if there's a variable-length-encoded field BEFORE the one
you're after!

http://RFC.net/rfc1832.html
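
To make the offset point concrete, here is a minimal sketch in Python. The record layout (three ints followed by a string) and the function names are made up for illustration; the only things taken from RFC 1832 are that an XDR int is four big-endian bytes and that variable-length data is padded to a multiple of four. Because only fixed-size fields precede it, the third int can be read straight from byte offset 8.

```python
import struct

# Hypothetical record: three XDR ints followed by an XDR string.
# An XDR int is 4 big-endian bytes, so the third int always
# starts at byte offset 8, whatever the values are.
def pack_record(a, b, c, name):
    data = name.encode("ascii")
    pad = (4 - len(data) % 4) % 4   # XDR pads variable-length data to a multiple of 4
    return struct.pack(">iiiI", a, b, c, len(data)) + data + b"\x00" * pad

def third_int(buf):
    # Direct offset read; the first two ints are never decoded.
    return struct.unpack_from(">i", buf, 8)[0]

record = pack_record(10, -20, 30, "hello")
print(third_int(record))
```

A field that came after the string would need the string's length word read first, which is the only case where anything in front has to be examined.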

> The generalization
> about the allegedly self-describing binary formats is that they're a
> pain to decode in a hurry.

XDR sure isn't self-describing... it's a pretty ornery old-school binary
marshalling system that relies on an agreed schema at each end to find out
what's what in the record.

> SMTP, and IMAP").  First time I saw BXXP, I just laughed out loud.

BXXP as in BEEP as in http://RFC.net/rfc3080.html?

Hmmm... interesting. Haven't studied it in detail, but it looks better
than doing it over HTTP, at a glance!

> In this context, the "self-describing binary" formats *must* *allow*
> every possible value to be representable. 

Within the constraints of some schema somewhere, yes.

> For XDR and BER, that meant
> that you started munching at the front, and branched when the
> self-describing data described a branch.  No offsets, no cheap
> searches.  No two-byte, four-byte, whatever integers, except more or
> less by accident, and because everything could (in theory) keep growing
> and growing and growing and growing, then you had to parse everything
> until you found the particular bit you needed (even if you know that
> there are four integers in front of the particular integer you want,
> you have to parse all four, because they're variable length, and you
> *could* get the wrong one by making ass u m ptions). 

The xml-bin group is looking at indexing strategies :-)

> If you can use
> records, like IP or TCP etc., use them.  Offsets are easy.  Otherwise,
> if it's going to be variable length, it's going to hurt, probably.  But
> if there are delimiters to look for (newline, typically but not always
> as defined for the NVT--pointy brackets for XML), that makes the
> variable length easier

Ahem. If you have a variable-length field with a length integer at the
front, you can seek right over potentially massive runs without having to
examine them byte by byte for delimiters!

> ('cause you can ignore anything that isn't a
> delimiter,

You have to check every character for delimiterness, though :-(

> In short, I'd be very surprised if a variable-length binary encoding
> is any faster to parse (if all you want is a subset of information)
> than XML; it's certainly very likely to be harder to read.

Even without indexing, the use of length words (aka pointers to the next
record in a breadth-first tree traversal) allows the bulk skipping of a
lot of unnecessary information, without needing a byte by byte
examination... and that's the *simplest* indexing technique available in
binary formats.
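
A rough sketch of the difference, in Python. The framing here is a simplified, hypothetical one (a 4-byte big-endian length word followed by the payload, with no padding), not XDR or any format the xml-bin group has proposed, and the function names are invented. The length-prefixed reader hops from length word to length word; the delimited reader has to look at every byte in front of the field it wants.

```python
import struct

def encode_fields(fields):
    # Each field: 4-byte big-endian length word, then the payload bytes.
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)

def nth_field_by_length(buf, n):
    # Hop from length word to length word; payload bytes are never inspected.
    pos = 0
    for _ in range(n):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4 + length
    (length,) = struct.unpack_from(">I", buf, pos)
    return buf[pos + 4 : pos + 4 + length]

def nth_field_by_delimiter(buf, n, delim=b"\n"):
    # Delimiter scan: every byte before the wanted field is compared
    # against the delimiter (the comparison happens inside index/find).
    pos = 0
    for _ in range(n):
        pos = buf.index(delim, pos) + 1
    end = buf.find(delim, pos)
    return buf[pos:] if end == -1 else buf[pos:end]

fields = [b"x" * 100000, b"y" * 10, b"wanted"]
print(nth_field_by_length(encode_fields(fields), 2))
print(nth_field_by_delimiter(b"\n".join(fields), 2))
```

Both calls return the same field, but the first does two 4-byte reads to get past the 100000-byte run, while the second scans all of it.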

> Writing decodes for SNMP (ASN.1/BER) and the RPC based suites (XDR,
> that is) was hell.  It isn't that no one knows about them.  They aren't
> as suitable to the general purpose of XML as XML is.  Even XML-for-data
> has the advantage, when discussion goes from machine to machine, that
> there is no real discussion of byte order

Yes there is. XML stores data big-endian. Just because that's the human
convention for integers doesn't mean it isn't significant. In the spec
(the schema spec I guess) it will say something about base ten
representation of the integers from left to right using UTF-8 numerals,
just like the XDR spec specifies a certain byte order for integers and
two's-complement encoding of sign.
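
The parallel can be sketched in a few lines of Python (function names are illustrative only). Decoding a decimal string and decoding a big-endian XDR unsigned int are the same loop in different bases: the most significant unit arrives first, and each later unit scales what came before.

```python
import struct

def parse_decimal(text):
    # Most significant digit first: each new digit multiplies the
    # accumulated value by ten. That ordering is a convention the
    # reader and writer must share, just like byte order.
    value = 0
    for ch in text:
        value = value * 10 + (ord(ch) - ord("0"))
    return value

def parse_xdr_uint(buf):
    # Same shape in base 256: most significant byte first.
    value = 0
    for byte in buf[:4]:
        value = value * 256 + byte
    return value

encoded = struct.pack(">I", 258)   # big-endian, as XDR requires
print(parse_decimal("258"), parse_xdr_uint(encoded))
```

Reading the digits of "258" right to left would give a different number, which is exactly what getting the byte order wrong does to a binary integer.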

> Amy! (who has now not smoked for eighteen hours, and seems to be
> heading rapidly in the direction of verbosity).

Smoking is bad for you :-)

ABS

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software

References:
- Re: Binary XML - summary of discussion to date
  - From: Amy Lewis <amyzing@talsever.com>
