   Re: [xml-dev] Parsing efficiency? - why not 'compile'????

On Wednesday 26 February 2003 14:19, David Megginson wrote:
> Alaric B. Snell writes:
>  > If your system is sitting idling waiting for data over the network, then
>  > a more compact representation would be a winner!
>
> We'll look forward to your test results.

If your system spends lots of its time waiting for networking, do you 
disagree that reducing the bandwidth utilisation would reduce the service 
round trip time and increase the maximum throughput?

Note that, of course, smaller packets won't reduce the latency of the link 
due to speed of light limitations, but they will reduce the latency caused by 
bandwidth limitations. And the maximum number of transfers your network can 
handle in a second is directly related to the message size; if you can halve 
the size of the message, you can fit twice as many through your pipe in unit 
time.
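As a rough sketch of that arithmetic (the 1 Mbit/s link and the message sizes
are made-up illustrative numbers, not measurements, and per-packet overhead
and latency are ignored):

```python
def max_messages_per_second(link_bits_per_second: float, message_bytes: int) -> float:
    """Upper bound on messages per second for a link of the given bandwidth.

    Ignores per-packet overhead and latency; only payload size is counted.
    """
    return link_bits_per_second / (message_bytes * 8)


LINK_BPS = 1_000_000  # hypothetical 1 Mbit/s link

full_size_rate = max_messages_per_second(LINK_BPS, 1000)  # 1000-byte messages
half_size_rate = max_messages_per_second(LINK_BPS, 500)   # same messages, halved

print(full_size_rate, half_size_rate, half_size_rate / full_size_rate)
```

Halving the message size doubles the ceiling, whatever link speed you plug in.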

So to go back to empirical test results...

The ASN.1/XML interop people found that, for data-oriented XML, savings of 
80% are common; i.e., messages being one fifth the size. Per-packet overheads 
aside, that would imply that you can fit about five times as many ASN.1/PER 
encoded messages down a given network connection in a second as you can XML 
messages.

Let's take an example of a stereotypical poster-child Web service... some 
kind of online store.

It has a message you can send it to request a stock search, given keywords 
and a price range, returning a list of stock descriptions.

And it has a message you can send it to place an order, containing delivery 
addresses and one or more invoice lines, returning a basic success / failure 
code.

The latter operation will happen less often than the former, and will 
probably involve more time-consuming operations such as checking availability 
of all the items, checking the credit account, filing the order in the 
database, and making a printer in a warehouse print a packing slip / 
manifest so that dispatch can commence, so let's focus on the former.

The request message only needs to contain a keyword string and two prices; in 
XML that might be:

<search>
 <keywords>pink floyd</keywords>
 <prices min="5.99" max="20.00" />
</search>

Total size = 76 bytes plus the highly variable length of the keyword string.

In PER, that would probably be a byte or two for the length of the keywords 
(going up to two bytes, from memory, if it's more than 127 characters due to 
variable length integer storage? Something like that), then the currency 
values would actually be stored as numbers of pence in the same format - 
probably two or three bytes each.

Total size = 6 bytes plus the highly variable length of the keyword string.
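To make the comparison concrete, here is a minimal sketch that builds both
request forms and counts bytes. The compact form is NOT real PER, just a
hand-rolled stand-in with the layout guessed at above: a one-byte length,
the keyword bytes, then two prices in pence as fixed two-byte integers. The
function names are made up for illustration.

```python
import struct


def xml_request(keywords: str, min_price: str, max_price: str) -> bytes:
    """The XML search request from the example, without a trailing newline."""
    return (
        "<search>\n"
        f" <keywords>{keywords}</keywords>\n"
        f' <prices min="{min_price}" max="{max_price}" />\n'
        "</search>"
    ).encode("utf-8")


def compact_request(keywords: str, min_pence: int, max_pence: int) -> bytes:
    """Hypothetical PER-like layout: 1-byte length, keywords, two 16-bit prices."""
    text = keywords.encode("utf-8")
    if len(text) > 127:
        raise ValueError("this sketch only handles the one-byte length case")
    return struct.pack(">B", len(text)) + text + struct.pack(">HH", min_pence, max_pence)


xml = xml_request("pink floyd", "5.99", "20.00")
compact = compact_request("pink floyd", 599, 2000)
print(len(xml), len(compact))  # 86 and 15 bytes, keyword included
```

Subtracting the 10-byte keyword leaves 76 bytes of XML overhead against 5 for
this layout, in the same ballpark as the 6-byte estimate.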

But the response would look like this in XML:

<search-response>
<result sku="GH234" price="6.50">Dark Side of the Moon</result>
<result sku="KK234" price="7.50">Wish You Were Here</result>
</search-response>

Size: 37 bytes + 43 bytes per result + description text length

I think in PER that would be another variable-length integer for the number 
of results returned (call it one byte if we want fewer than a hundred 
results), then, for each result, five bytes of fixed-length SKU plus two 
bytes of price, one or two bytes of description length, then the description.

Size: 1 byte + 9 bytes per result + description text length.
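The same byte counting can be sketched for the response. Again the compact
form is only a hand-rolled approximation of the layout described above, not
real PER: a one-byte result count, then per result a five-byte SKU, a
two-byte price in pence, a one-byte description length and the description.
It uses the one-byte length case, so it comes out at 8 bytes per result
rather than 9. All names here are made up for illustration.

```python
import struct

# (sku, price as text, price in pence, description)
RESULTS = [
    ("GH234", "6.50", 650, "Dark Side of the Moon"),
    ("KK234", "7.50", 750, "Wish You Were Here"),
]


def xml_response(results) -> bytes:
    """The XML search response from the example, one element per line."""
    lines = ["<search-response>"]
    for sku, price, _pence, desc in results:
        lines.append(f'<result sku="{sku}" price="{price}">{desc}</result>')
    lines.append("</search-response>")
    return ("\n".join(lines) + "\n").encode("utf-8")


def compact_response(results) -> bytes:
    """Hypothetical PER-like layout; assumes 5-character ASCII SKUs."""
    out = struct.pack(">B", len(results))
    for sku, _price, pence, desc in results:
        text = desc.encode("utf-8")
        # ">" disables padding, so this header is exactly 5 + 2 + 1 = 8 bytes.
        out += struct.pack(">5sHB", sku.encode("ascii"), pence, len(text)) + text
    return out


xml = xml_response(RESULTS)
compact = compact_response(RESULTS)
print(len(xml), len(compact))  # 162 and 56 bytes for these two results
```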

In the PER case, the resulting encoding will be almost entirely the 
description texts, while in my XML example the description text was smaller 
than the XML surrounding it. If we say the descriptions are likely to be 20 
bytes long, then we have a saving of 36 bytes of fixed overhead (probably 
negligible in the long run) but a reduction in mean per-result size from 
about 60 bytes each (43 + 20) to about 30 bytes each (9 + 20), a halving. So 
we could be servicing twice as many customers at once from a given Internet 
link, until the database can't handle all the keyword searches any more.
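Plugging the size formulas above into a few lines of code shows how the
ratio behaves as the result count grows. The formulas are the ones from this
post (37 + 43 per result for XML, 1 + 9 per result for PER); the 20-byte
description length is the same assumption as before, and the function names
are made up.

```python
def xml_size(n_results: int, desc_len: int = 20) -> int:
    """XML response size: 37 bytes fixed, 43 bytes of markup per result."""
    return 37 + n_results * (43 + desc_len)


def per_size(n_results: int, desc_len: int = 20) -> int:
    """Estimated PER response size: 1 byte fixed, 9 bytes of framing per result."""
    return 1 + n_results * (9 + desc_len)


for n in (1, 10, 100):
    ratio = xml_size(n) / per_size(n)
    print(n, xml_size(n), per_size(n), round(ratio, 2))
```

For a single result the fixed overhead dominates and the XML is over three
times the size; for long result lists the ratio settles towards 63/29, a bit
better than the halving claimed above.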

Looking at it another way, consider Google's XML interface. If they got a 
similar 50% reduction in size from using PER (considering that most of the 
search results consist of URLs and descriptions as opposed to the structure 
of the listing, ranking scores, etc) then, if their XML interface became 
predominantly used, they could halve their bandwidth costs. I'm sure their 
search algorithm is more resource-intensive than parsing and producing XML, 
but their bandwidth usage must be *astronomical*!
 
> David

ABS

-- 
A city is like a large, complex, rabbit
 - ARP




 
