Re: Icebergs - XML file metrics



Stefan, Charles, Ronald,

Thanks to you all for your responses. I am looking through the
material, though I am a little surprised that no one has turned up a set
of larger test files. Perhaps that's a challenge for us: to create one!

Thanks,
Robin

At 10:59 AM +0200 3/26/01, Stefan Zier wrote:
>http://www-106.ibm.com/developerworks/education/xmljava/xmljava-6-4.html
>
>I think this tool from IBM developerworks is a good basis to start from. It
>collects stats about Document Nodes, Element Nodes, Entity Reference Nodes,
>CDATA Sections, Text Nodes, Processing Instructions in a DOM tree.
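
A rough illustration of the kind of per-node-type count described above, written in Python rather than taken from the IBM tool itself. The function name count_node_types and the NODE_NAMES table are made up for this sketch, and it assumes the document fits in memory as a DOM tree. Python's minidom normally expands entity references while parsing, so that count will probably stay at zero here.

```python
from collections import Counter
from xml.dom import minidom, Node

# Readable labels for the DOM nodeType constants we care about (hypothetical table).
NODE_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "cdata",
    Node.ENTITY_REFERENCE_NODE: "entity_reference",
    Node.PROCESSING_INSTRUCTION_NODE: "processing_instruction",
    Node.COMMENT_NODE: "comment",
}

def count_node_types(xml_text):
    """Walk the whole DOM tree and count nodes by type."""
    doc = minidom.parseString(xml_text)
    counts = Counter()
    stack = [doc]  # iterative walk, so deep nesting cannot hit the recursion limit
    while stack:
        node = stack.pop()
        counts[NODE_NAMES.get(node.nodeType, "other")] += 1
        stack.extend(node.childNodes)
    return counts

if __name__ == "__main__":
    sample = "<a><?pi x?><b>t</b><b/></a>"
    for name, n in sorted(count_node_types(sample).items()):
        print(name, n)
```

Attributes are not children in the DOM, so they are not counted by this walk.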
>
>---------------------------------------
>Stefan Zier
>Software Developer
>Syntion AG - http://www.syntion.com
>Leonrodplatz 2 - 80636 Munich/Germany
>Phone +49 89 52 30 45-0
>Fax +49 89 52 30 45-20
>
>----- Original Message -----
>From: Robin LaFontaine <robin@monsell.co.uk>
>To: <xml-dev@lists.xml.org>
>Sent: Friday, March 23, 2001 6:41 PM
>Subject: Icebergs - XML file metrics
>
>
> > Can anyone help with this: Is there a way of 'profiling' an XML file
> > to indicate its characteristics?
> >
 > > We test our XML comparators on large files, but a 5 MB XML file could
> > have twenty XML tags or 20,000 and it could be deeply nested or flat.
> > So, are there any metrics to help in this characterization?
> >
> > Seems sensible to use ratios as far as possible, so that they are
> > comparable for different file sizes, perhaps:
> >
> > 1. File size (not a ratio)
> >
 > > 2. No. of elements / file size in KB = no. of elements/KB (or MB perhaps?)
> >
> > 3. No. of attributes / no. of elements = no. of attributes/element
> >
> > 4. No. of text nodes / no. of elements = no. of text nodes/element
> >
> > 5. No. of text nodes / no. of unique text nodes = text re-use index
> >
> > 6. No. of attribute values / no. of unique attr. values = attribute
> > value re-use index
> >
> > 7. (sum for each element of no. of ancestors for the element) / no.
> > of elements = Average depth (iceberg factor).
> >
> > Last one indicates nesting depth, e.g.
> > <a> <b/><b/><b/><b/></a> = (0+1+1+1+1)/5 = 0.8
> >
 > > <a> <b><b><b><b></b></b></b></b> </a> = (0+1+2+3+4)/5 = 10/5 = 2
 > >
 > > <a> <b><b><b><b> <b><b><b><b> </b></b></b></b> </b></b></b></b> </a>
 > > = (0+1+2+3+4+5+6+7+8)/9 = 36/9 = 4
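
The metrics listed above could be computed along the following lines. This is only a sketch in Python using the standard ElementTree parser; the function name profile_xml and the dictionary keys are invented for illustration. It makes two assumptions the original proposal does not spell out: whitespace-only text is not counted as a text node, and file size is taken as the UTF-8 byte length of the input string.

```python
import xml.etree.ElementTree as ET

def profile_xml(xml_text):
    """Compute the proposed ratio metrics for one XML document."""
    root = ET.fromstring(xml_text)
    size_kb = len(xml_text.encode("utf-8")) / 1024  # assumption: UTF-8 byte size

    elements = 0
    depth_sum = 0      # sum over all elements of the number of ancestors
    attr_values = []
    texts = []

    # Iterative walk carrying each element's ancestor count (root has 0).
    stack = [(root, 0)]
    while stack:
        elem, depth = stack.pop()
        elements += 1
        depth_sum += depth
        attr_values.extend(elem.attrib.values())
        # ElementTree keeps text before the first child in .text and text
        # following the element's end tag in .tail; skip blank runs.
        for chunk in (elem.text, elem.tail):
            if chunk and chunk.strip():
                texts.append(chunk.strip())
        stack.extend((child, depth + 1) for child in elem)

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "size_kb": size_kb,
        "elements_per_kb": ratio(elements, size_kb),
        "attributes_per_element": ratio(len(attr_values), elements),
        "text_nodes_per_element": ratio(len(texts), elements),
        "text_reuse_index": ratio(len(texts), len(set(texts))),
        "attr_value_reuse_index": ratio(len(attr_values), len(set(attr_values))),
        "average_depth": ratio(depth_sum, elements),
    }

if __name__ == "__main__":
    flat = "<a> <b/><b/><b/><b/></a>"
    nested = "<a> <b><b><b><b></b></b></b></b> </a>"
    print(profile_xml(flat)["average_depth"])    # 0.8
    print(profile_xml(nested)["average_depth"])  # 2.0
```

For multi-megabyte test files a streaming parser would likely be a better fit than loading the whole tree, but the arithmetic stays the same.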
> >
> > Perhaps someone has already developed a different set of metrics.
> >
> > Robin
> > -- -----------------------------------------------------------------
> > Robin La Fontaine, Monsell EDM Ltd
> > (XML file comparison, Engineering data exchange and management using
> > XML, R&D Project Management)
> > Tel: +44 1684 592 144 Fax: +44 1684 594 504
> > Email: robin@monsell.co.uk      http://www.deltaxml.com
> >
> > ------------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org, an initiative of OASIS
> > <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To unsubscribe from this elist send a message with the single word
> > "unsubscribe" in the body to: xml-dev-request@lists.xml.org
> >