On Wednesday 26 February 2003 14:19, David Megginson wrote:
> Alaric B. Snell writes:
> > If your system is sitting idling waiting for data over the network, then
> > a more compact representation would be a winner!
> We'll look forward to your test results.
If your system spends lots of its time waiting for networking, do you
disagree that reducing the bandwidth utilisation would reduce the service
round trip time and increase the maximum throughput?
Note that, of course, smaller packets won't reduce the latency of the link
due to speed of light limitations, but they will reduce the latency caused by
bandwidth limitations. And the maximum number of transfers your network can
handle in a second is directly related to the message size; if you can halve
the size of the message, you can fit twice as many through your pipe in unit time.
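To put rough numbers on that, here is a minimal sketch of the arithmetic. The link speed, propagation delay and message sizes are made-up figures for illustration, not measurements, and the model ignores per-packet overheads.

```python
def transfer_time_s(message_bytes, bandwidth_bps, propagation_s):
    """One-way delivery time: fixed propagation delay (speed of light)
    plus the time needed to push the message's bits onto the link."""
    return propagation_s + (message_bytes * 8) / bandwidth_bps


def max_messages_per_s(message_bytes, bandwidth_bps):
    """Upper bound on messages per second through a saturated link."""
    return bandwidth_bps / (message_bytes * 8)


if __name__ == "__main__":
    LINK_BPS = 1_000_000      # hypothetical 1 Mbit/s link
    PROPAGATION_S = 0.020     # hypothetical 20 ms one-way propagation delay

    for size in (1000, 500):  # a message, then the same message halved
        print(size, "bytes:",
              transfer_time_s(size, LINK_BPS, PROPAGATION_S), "s one-way,",
              max_messages_per_s(size, LINK_BPS), "messages/s at most")
```

Halving the message leaves the propagation term alone but halves the bandwidth term, and doubles the ceiling on messages per second.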
So to go back to empirical test results...
The ASN.1/XML interop people found that, for data-oriented XML, savings of
80% are common; eg, messages being one fifth the size. Per-packet overheads
aside, that would imply that you can fit about five times as many ASN.1/PER
encoded messages down a given network connection in a second as you can XML messages.
Let's take an example of a stereotypical poster-child Web service... some
kind of online store.
It has a message you can send it to request a stock search, given keywords
and a price range, returning a list of stock descriptions.
And it has a message you can send it to place an order, containing delivery
addresses and one or more invoice lines, returning a basic success / failure response.
The latter operation will happen less often than the former, and will
probably involve more time-consuming operations such as checking availability
of all the items, checking the credit account, filing the order in the
database, and making a printer in a warehouse start printing out a packing
slip / manifest for dispatching to commence, so let's focus on the former.
The request message only needs to contain a keyword string and two prices; in
XML that might be:
<prices min="5.99" max="20.00" />
Total size = 76 bytes plus the highly variable length of the keyword string.
In PER, that would probably be a byte or two for the length of the keywords
(going up to two bytes, from memory, if it's more than 128 characters due to
variable length integer storage? Something like that), then the currency
values would actually be stored as numbers of pence in the same format -
probably two or three bytes each.
Total size = 6 bytes plus the highly variable length of the keyword string.
But the response would look like this in XML:
<result sku="GH234" price="6.50">Dark Side of the Moon</result>
<result sku="KK234" price="7.50">Wish You Were Here</result>
Size: 37 bytes + 43 bytes per result + description text length
I think in PER that would be another variable-length integer for the number
of results returned (call it one byte if we want less than a hundred
results), then, for each result, five bytes for the fixed-length SKU plus two
bytes of price, one or two bytes of description length, then the description.
Size: 1 byte + 9 bytes per result + description text length.
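As a rough check on those figures, here is a sketch that packs the two example results into a hand-rolled binary layout and compares it with the XML lines above. It is NOT real PER, only an approximation of the layout described: a one-byte result count, a five-byte SKU, the price as a two-byte count of pence, and a one-byte description length (so 8 bytes of overhead per result rather than 9). The function names are invented for this example, and the XML side counts only the result lines plus a newline each.

```python
import struct


def encode_results_compact(results):
    """Pack (sku, price_pence, description) tuples into a compact layout.

    Not real ASN.1 PER; a hand-rolled stand-in with a similar byte budget.
    """
    if len(results) > 255:
        raise ValueError("this sketch stores the count in one byte")
    out = bytearray([len(results)])            # 1 byte: number of results
    for sku, price_pence, description in results:
        sku_bytes = sku.encode("ascii")
        desc_bytes = description.encode("utf-8")
        if len(sku_bytes) != 5 or len(desc_bytes) > 255:
            raise ValueError("sketch assumes 5-byte SKUs, short descriptions")
        out += sku_bytes                       # 5 bytes: fixed-length SKU
        out += struct.pack(">H", price_pence)  # 2 bytes: price in pence
        out.append(len(desc_bytes))            # 1 byte: description length
        out += desc_bytes
    return bytes(out)


def encode_results_xml(results):
    """Render the same tuples as the <result> lines from the message."""
    lines = [
        '<result sku="%s" price="%d.%02d">%s</result>\n'
        % (sku, pence // 100, pence % 100, description)
        for sku, pence, description in results
    ]
    return "".join(lines).encode("utf-8")


if __name__ == "__main__":
    results = [("GH234", 650, "Dark Side of the Moon"),
               ("KK234", 750, "Wish You Were Here")]
    print(len(encode_results_xml(results)), "bytes as XML result lines")
    print(len(encode_results_compact(results)), "bytes in the compact layout")
```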
In the PER cases, the resulting encoding will be almost entirely the
description texts, while in my XML example the description text was smaller
than the XML surrounding it. If we say the descriptions are likely to be 20
bytes long, then we have a loss of 36 bytes of overhead (probably negligible
in the long run) but a reduction in mean per-result size from 60 bytes each
to 30 bytes each, a halving. So we could be servicing twice as many customers
at once from a given Internet link, until the database can't handle all the
keyword searches any more.
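The estimate above can be written out as a small model using only the size figures from this message (37 + 43 per result for XML, 1 + 9 per result for PER) and the assumed 20-byte description. The function names are just for illustration.

```python
DESCRIPTION_BYTES = 20  # assumed typical description length, as above


def xml_size(n_results, description_bytes=DESCRIPTION_BYTES):
    """Estimated XML response size: 37 bytes fixed + 43 bytes per result."""
    return 37 + n_results * (43 + description_bytes)


def per_size(n_results, description_bytes=DESCRIPTION_BYTES):
    """Estimated PER response size: 1 byte fixed + 9 bytes per result."""
    return 1 + n_results * (9 + description_bytes)


if __name__ == "__main__":
    for n in (1, 10, 100):
        x, p = xml_size(n), per_size(n)
        print(n, "results:", x, "bytes XML,", p, "bytes PER,",
              "ratio %.2f" % (p / x))
```

With 20-byte descriptions this comes out at 63 versus 29 bytes per result, which is where the "roughly 60 to 30" figure comes from; longer descriptions shrink the saving, shorter ones grow it.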
Looking at it another way, consider Google's XML interface. If they got a
similar 50% reduction in size from using PER (considering that most of the
search results consist of URLs and descriptions as opposed to the structure
of the listing, ranking scores, etc) then, if their XML interface became
predominantly used, they could halve their bandwidth costs. I'm sure their
search algorithm is more resource-intensive than parsing and producing XML,
but their bandwidth usage must be *astronomical*!
A city is like a large, complex, rabbit