OASIS Mailing List Archives

 



 


 

   Re: [xml-dev] An unclear point with W3C XML 1.0 Specification


The file would be even smaller if written with attributes, that is, in XML.
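
As a rough check of that claim, here is a minimal sketch that writes the five-field item from the quoted message as a single element with attributes and measures it. The attribute layout is only an assumption about what such a file might look like; it is not taken from any existing schema. Counted the same way as in the quoted message (no indentation or line breaks), it comes to 131 characters per item, against the quoted 133 for the computergrid style and 200 for the element style.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute-based encoding of the five-field inventory item.
# The double quote inside the description is escaped as &quot;, which
# XML 1.0 requires inside a double-quoted attribute value.
item = (
    '<product_item'
    ' part_description="3 1/2&quot; Spanner"'
    ' sell_price="18"'
    ' cost_price="15.50"'
    ' InStock="False"'
    ' Arrival_Date="2005-02-15"'
    '/>'
)

# Parse it with a stock XML parser to confirm it is well-formed and that
# the escaped quote round-trips to the original description.
parsed = ET.fromstring(item)
description = parsed.attrib["part_description"]

# Size per item, counted without indentation or line breaks.
size_per_item = len(item)

print(description)
print(size_per_item)
```

Using single quotes around the description would avoid the `&quot;` escape and save a few more characters, at the cost of mixing quoting styles.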

Bob Foster

david.lyon@computergrid.net wrote:
 > Hi Ron,
 >
 > Yes, it may appear a little strange at first. But there
 > is logic behind it all.
 >
 > What you have shown are two good examples of traditional
 > XML 1.0. No problems with those at all. Bog standard.
 >
 > The answer to the question is that when multiplied out
 > a few tens of thousands of times, every extra character
 > and every parsing ambiguity can dramatically slow down the
 > process of digesting the data.
 >
 > There isn't much difference between:
 >
 >     <product_item>
 >       <part_description>3 1/2" Spanner</part_description>
 >     </product_item>
 >
 > and:
 >
 >     <product_item>
 >       part_description&="3 1/2^" Spanner"
 >     </product_item>
 >
 > Counted, this gives 51 characters on the first part description
 > style, and 35 on the second according to my counting style (which
 > can make mistakes).
 >
 > Add in the two tags to be fair, this gives (51+29)=80 characters, and
 > (35+29)=64 characters per inventory item.
 >
 > But with 40,000 line items, an easily achievable inventory pricelist,
 > we get a saving of 640,000 bytes (16 x 40,000), so the file is about
 > 80% of the size of the XML 1.0 version.
 >
 > But make it more realistic, add in some more fields:
 >
 >     <product_item>
 >       <part_description>3 1/2" Spanner</part_description>
 >       <sell_price>18</sell_price>
 >       <cost_price>15.50</cost_price>
 >       <InStock>False</InStock>
 >       <Arrival_Date>2005-02-15</Arrival_Date>
 >     </product_item>
 >
 > and:
 >
 >     <product_item>
 >       part_description&="3 1/2^" Spanner"
 >       sell_price$=18
 >       cost_price$=15.50
 >       InStock?=False
 >       Arrival_Date@=2005-02-15
 >     </product_item>
 >
 > Style 1, 51+27+30+24+39+(29)=200
 > Style 2, 35+14+17+14+24+(29)=133
 >
 > Once again, multiply by 40,000 items and the respective
 > file sizes are approximately:
 >
 > Style 2 = 5320000 bytes
 > Style 1 = 8000000 bytes
 >
 > In XML 1.0, we have a larger file with no type information,
 > and in the computergrid format, we have type information as
 > well as a saving of about 33.5%, or 2,680,000 bytes.
 >
 > I guess they are the benefits of being strange as best as I
 > can explain it. I will stick to what I have got but thanks
 > for your comments anyway.
 >
 > Best Regards
 >
 > David
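
The per-item counts and file sizes in the quoted message can be re-derived with a short script. This is only a sketch: it follows the quoted counting style (field text plus the two `product_item` tags, ignoring indentation and line breaks), and the names `STYLE1`, `STYLE2` and `item_size` are made up for the illustration.

```python
ITEMS = 40_000

# The opening and closing wrapper tags are counted once per item.
WRAPPER = len("<product_item>") + len("</product_item>")

# Style 1: element-per-field XML 1.0, as quoted.
STYLE1 = [
    '<part_description>3 1/2" Spanner</part_description>',
    "<sell_price>18</sell_price>",
    "<cost_price>15.50</cost_price>",
    "<InStock>False</InStock>",
    "<Arrival_Date>2005-02-15</Arrival_Date>",
]

# Style 2: the computergrid notation with type markers, as quoted.
STYLE2 = [
    'part_description&="3 1/2^" Spanner"',
    "sell_price$=18",
    "cost_price$=15.50",
    "InStock?=False",
    "Arrival_Date@=2005-02-15",
]


def item_size(fields):
    """Characters per item: all field lines plus the wrapper tags."""
    return sum(len(f) for f in fields) + WRAPPER


# Single-field case (description only) and the full five-field case.
one_field = (item_size(STYLE1[:1]), item_size(STYLE2[:1]))
five_fields = (item_size(STYLE1), item_size(STYLE2))

for label, (s1, s2) in (("1 field", one_field), ("5 fields", five_fields)):
    saved = (s1 - s2) * ITEMS
    percent = 100 * (s1 - s2) / s1
    print(f"{label}: style 1 = {s1 * ITEMS} bytes, "
          f"style 2 = {s2 * ITEMS} bytes, "
          f"saved = {saved} bytes ({percent:.1f}%)")
```

The byte figures assume one byte per character, which holds for these ASCII-only examples.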






 
