Re: Icebergs - XML file metrics
- From: Charles Reitzel <creitzel@mediaone.net>
- To: Robin LaFontaine <robin@monsell.co.uk>
- Date: Sat, 24 Mar 2001 13:00:30 -0500
I'd also include the number of distinct element names and the number of distinct attribute names, perhaps with the number of occurrences of each.
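As a rough sketch of what I mean, assuming Python and its standard xml.etree.ElementTree parser (the function name name_counts and the sample document are made up for illustration), the counts could be gathered like this:

```python
from collections import Counter
import xml.etree.ElementTree as ET

def name_counts(xml_text):
    """Return (element-name counts, attribute-name counts) for an XML string."""
    root = ET.fromstring(xml_text)
    elements = Counter()
    attributes = Counter()
    # iter() walks the root and every descendant element in document order
    for el in root.iter():
        elements[el.tag] += 1
        attributes.update(el.attrib.keys())
    return elements, attributes

# Hypothetical sample document, not taken from any real data set
elements, attributes = name_counts('<a id="1"><b id="2" x="y"/><b/><c x="z"/></a>')
print(len(elements), dict(elements))      # distinct element names, with occurrences
print(len(attributes), dict(attributes))  # distinct attribute names, with occurrences
```

Note that ElementTree reports namespaced names as "{uri}local", so the same local name in two namespaces is counted as two distinct names.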
At 04:41 PM 3/23/01 +0000, Robin LaFontaine wrote:
>Can anyone help with this: Is there a way of 'profiling' an XML file
>to indicate its characteristics?
>
>We test our XML comparators on large files, but a 5Mb XML file could
>have twenty XML tags or 20,000 and it could be deeply nested or flat.
>So, are there any metrics to help in this characterization?
>
>Seems sensible to use ratios as far as possible, so that they are
>comparable for different file sizes, perhaps:
>
>1. File size (not a ratio)
>
>2. No. of elements / file size in KB = no. of elements/KB (or per MB perhaps?)
>
>3. No. of attributes / no. of elements = no. of attributes/element
>
>4. No. of text nodes / no. of elements = no. of text nodes/element
>
>5. No. of text nodes / no. of unique text nodes = text re-use index
>
>6. No. of attribute values / no. of unique attr. values = attribute
>value re-use index
>
>7. (sum for each element of no. of ancestors for the element) / no.
>of elements = Average depth (iceberg factor).
>
>Last one indicates nesting depth, e.g.
><a> <b/><b/><b/><b/></a> = (0+1+1+1+1)/5 = 0.8
>
><a> <b><b><b><b></b></b></b></b> </a> = (0+1+2+3+4)/5 = 10/5 = 2
>
><a> <b><b><b><b> <b><b><b><b> </b></b></b></b> </b></b></b></b> </a>
>= (0+1+2+3+4+5+6+7+8)/9 = 36/9 = 4
>
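The ratios in the list above could be computed in one pass over the tree. This is only a sketch under some assumptions of mine: Python's standard xml.etree.ElementTree, whitespace-only text ignored, and each non-empty .text or .tail string treated as one text node (an approximation of what a DOM would report). The function name profile and the dictionary keys are made up for illustration.

```python
import xml.etree.ElementTree as ET

def profile(xml_text):
    """Compute ratio metrics 2-7 from the list above for an XML string."""
    root = ET.fromstring(xml_text)
    size_kb = len(xml_text.encode("utf-8")) / 1024
    n_elements = 0
    depth_sum = 0
    texts = []        # non-blank text chunks (assumption: blanks are ignored)
    attr_values = []  # one entry per attribute occurrence
    stack = [(root, 0)]  # (element, number of ancestors)
    while stack:
        el, depth = stack.pop()
        n_elements += 1
        depth_sum += depth
        attr_values.extend(el.attrib.values())
        # ElementTree keeps character data in .text (before the first child)
        # and .tail (after the element's end tag)
        for chunk in (el.text, el.tail):
            if chunk and chunk.strip():
                texts.append(chunk.strip())
        stack.extend((child, depth + 1) for child in el)

    def ratio(a, b):
        # None when the denominator is zero, e.g. no text nodes at all
        return a / b if b else None

    return {
        "elements_per_kb": ratio(n_elements, size_kb),
        "attributes_per_element": ratio(len(attr_values), n_elements),
        "text_nodes_per_element": ratio(len(texts), n_elements),
        "text_reuse_index": ratio(len(texts), len(set(texts))),
        "attr_value_reuse_index": ratio(len(attr_values), len(set(attr_values))),
        "average_depth": ratio(depth_sum, n_elements),
    }

flat = "<a> <b/><b/><b/><b/></a>"
nested = "<a> <b><b><b><b></b></b></b></b> </a>"
print(profile(flat)["average_depth"])    # 0.8
print(profile(nested)["average_depth"])  # 2.0
```

For a multi-megabyte file one would probably switch to a streaming parser such as ET.iterparse rather than load the whole tree, but the bookkeeping stays the same.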
>Perhaps someone has already developed a different set of metrics.
>
>Robin
>-- -----------------------------------------------------------------
>Robin La Fontaine, Monsell EDM Ltd
>(XML file comparison, Engineering data exchange and management using
>XML, R&D Project Management)
>Tel: +44 1684 592 144 Fax: +44 1684 594 504
>Email: robin@monsell.co.uk http://www.deltaxml.com
>
>------------------------------------------------------------------
>The xml-dev list is sponsored by XML.org, an initiative of OASIS
><http://www.oasis-open.org>
>
>The list archives are at http://lists.xml.org/archives/xml-dev/
take it easy,
Charles Reitzel