   Re: Streaming XML and SAX

  • From: Tom Harding <tomh@thinlink.com>
  • To: XML-Dev Mailing list <xml-dev@ic.ac.uk>
  • Date: Sat, 27 Feb 1999 19:28:42 -0800

David Megginson wrote:

> ...As you
> can see in the above excerpt, the character-set discovery heuristics in
> XML are intended for use only in the absence of protocol-specific
> encoding information.

I suspect those lengthy notes were written to explain exactly how developers should deal with
the fact that an external way of declaring the encoding already existed in HTTP, which it
would have been rather unkind to ignore.  Tim Bray's annotations to the spec seem to confirm

But since we're designing a protocol independent of HTTP, we ought to let the XML encoding
declaration do its job.
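
To illustrate the encoding declaration doing its job with no protocol-level hint, here is a minimal sketch, assuming the JAXP parser bundled with a current JDK. The class name EncodingDeclDemo and the helper rootText are hypothetical, made up for this example. The parser is handed raw bytes only, so the only place it can learn the charset is the XML declaration itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of any library.
public class EncodingDeclDemo {

    // Parses raw bytes with no external charset information and
    // returns the text content of the root element.
    public static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers need not handle checked parser exceptions.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The declaration names ISO-8859-1, and the bytes really are ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>caf\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(bytes));
    }
}
```

Without the declaration, the same bytes would be read as UTF-8 by default, and the lone 0xE9 byte would not be valid UTF-8.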

> For example, imagine that I have a Java class
> like this:
>   public class Purchase {
>     public int seqno;
>     public int customerId;
>     public int vendorId;
>     public int invoiceId;
>     public float total;
>   }
> In XML, an instance of this information might look like this:
>   <purchase xmlns="http://www.ecommerce.net/ns/ec/">
>    <seqno>12345678</seqno>
>    <customer-id>87654321</customer-id>
>    <vendor-id>18273645</vendor-id>
>    <invoice-id>81726354</invoice-id>
>    <total>92674.12</total>
>   </purchase>
> Based on my (limited) understanding of the Java VM, the Java version
> of a Purchase object will require 24 bytes of storage each; I'd guess
> that even a heavily-optimised generic DOM implementation would require
> at least 5-10 times as much storage (I'll welcome corrections from any
> DOM implementors on this list).
> In other words, if I go straight from the XML to my own object model,
> I can store 100,000 purchases in 2,400,000 bytes of storage; if I go
> from XML to a generic DOM object model, I will require between
> 12,000,000 and 24,000,000 (or more) bytes to store the same
> information, and then I will *still* have to build my own object model
> afterwards.
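
The "straight from the XML to my own object model" path in the quoted passage can be sketched with a SAX handler. This is only an illustration under assumptions: the names PurchaseSaxDemo, PurchaseHandler and parse are hypothetical, the parser is the non-namespace-aware JAXP default (so element names arrive as plain qNames), and error handling is kept to a minimum.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
public class PurchaseSaxDemo {

    // Same fields as the Purchase class in the quoted message.
    public static class Purchase {
        public int seqno;
        public int customerId;
        public int vendorId;
        public int invoiceId;
        public float total;
    }

    // Fills a Purchase as element events arrive; no intermediate tree is built.
    public static class PurchaseHandler extends DefaultHandler {
        public final List<Purchase> purchases = new ArrayList<>();
        private Purchase current;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
            if (qName.equals("purchase")) current = new Purchase();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called more than once per element
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (current == null) return; // ignore anything outside <purchase>
            String value = text.toString().trim();
            switch (qName) {
                case "seqno":       current.seqno = Integer.parseInt(value); break;
                case "customer-id": current.customerId = Integer.parseInt(value); break;
                case "vendor-id":   current.vendorId = Integer.parseInt(value); break;
                case "invoice-id":  current.invoiceId = Integer.parseInt(value); break;
                case "total":       current.total = Float.parseFloat(value); break;
                case "purchase":    purchases.add(current); current = null; break;
                default: break;
            }
        }
    }

    // Parses a document and returns every Purchase found in it.
    public static List<Purchase> parse(String xml) {
        try {
            PurchaseHandler handler = new PurchaseHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.purchases;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Each Purchase is complete when its end tag arrives, so an application could just as well hand it off at that point instead of collecting it in a list.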

Multiplying your numbers by 100,000 is a little gratuitous, since it would be lousy
application design to force all 100,000 objects to be stored in DOM format at the same time
(say, by cramming them all into some super-document).  I will be the first to admit that it
takes resources to parse XML out to a standard memory representation, but I see no reason why
those resources shouldn't be in line with the work accomplished, which is mostly converting
markup to memory structures.  And actually, you should be comparing it with the storage
required by unparsed XML, not your application object.  That's how you would need to store it
if you chopped up the stream into chunks to be passed off to separate threads or boxes as you

Tom Harding

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

