XML.orgXML.org
FOCUS AREAS |XML-DEV |XML.org DAILY NEWSLINK |REGISTRY |RESOURCES |ABOUT
OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
Re: [xml-dev] Why does XML call them "attributes" and notname-value pairs?

On Sun, 16 Jan 2022 20:48:24 +0000, Michael Kay wrote:
>>> 
>>> Here are the results:
>>> 
>>> an XmlDocument: 3176 bytes
>>> 
>>> an XmlElement: 656 bytes
>>> 
>>> an XmlAttribute: 152 bytes.
>> 
>> This is likely to be true for pretty much any XML tree model 
>> implementation. I'd be interested to know how one might avoid it.
> 
> Saxon's TinyTree model doesn't allocate a Java object per node. There 
> are minimally 6 arrays:
> 
> // nodeKind indicates the kind of node, e.g. element, text, or comment
> public byte[] nodeKind;
> 
> // depth is the depth of the node in the hierarchy, i.e. the number 
> of ancestors
> short[] depth;
> 
> // next is the node number of the next sibling
> // - unless it points backwards, in which case it is the node number 
> of the parent
> int[] next;
> 
> // alpha holds a value that depends on the node kind. For text nodes, 
> it is the offset
> // into the text buffer. For comments and processing instructions, it 
> is the offset into
> // the comment buffer. For elements, it is the index of the first 
> attribute node, or -1
> // if this element has no attributes.
> int[] alpha;
> 
> // beta holds a value that depends on the node kind. For text nodes, 
> it is the length
> // of the text. For comments and processing instructions, it is the 
> length of the text.
> // For elements, it is the index of the first namespace node, or -1
> // if this element has no namespaces.
> int[] beta;
> 
> // nameCode holds the name of the node, as an identifier resolved 
> using the name pool
> int[] nameCode;
> each holding one entry per node; which accounts precisely for the 
> measured space occupancy of 19 bytes for an empty element node.
> 
> Incidentally this structure not only has a very small memory 
> footprint, it also (and perhaps more importantly) gives very good CPU 
> caching behaviour. A search of the descendant axis for nodes with a 
> particular name is a tight loop scanning the namecode array testing 
> each integer against a constant.
> 
> Additional arrays are allocated (a) to hold type annotations, if the 
> document is schema-validated, and (b) to hold "preceding" pointers, 
> lazily constructed the first time the application attempts backwards 
> navigation. There are some additional refinements for example to give 
> faster parent access in a tree with very high fan-out (normally, 
> finding the parent involves scanning the "next" pointers until one is 
> found that points backwards.

That's fascinating. A tree model that has no node objects.

It does completely upend the analysis I provided (which should prolly 
be a lesson to me on sticking an oar in). I'm not sure how well your 
example program characterizes average size of attributes versus 
elements, though. In the original problem statement, the proposal was 
to (insofar as it is possible) replace elements with attributes, 
presumably substituting attributes for some significant subset of the 
descendant axis; one would expect to see relatively shallow trees (as 
opposed to element trees that are relatively less shallow, at least a 
depth of three, with most of the content at level three (root, items, 
contents); attribute trees would be root, 
items+contents-as-attributes). However, I'm not certain that the point 
can be sustained at all. I think Dimitre's original objection was based 
on directional acyclic graphs of objects, so if the model proposes this 
radically different tree design, its less likely that the objection 
would hold.

On Sun, 16 Jan 2022 20:30:46 +0000, Michael Kay wrote:
> For the TinyTree:
> 
> * the cost for an additional empty element is (992680 - 800992) / 
> 10000 = 19 bytes
> * the cost for an additional empty element plus empty attribute is 
> (1720944 - 900744) / 10000 = 82 bytes, so the attribute is 63 bytes
> 
> For the Linked Tree:
> 
> * the cost for an additional empty element is (4072008 - 2064384) / 
> 10000 = 200 bytes
> * the cost for an additional empty element plus empty attribute is 
> (8316768 - 4198024) / 10000 = 412 bytes, so the attribute is 212 bytes

[snip: a variant with text nodes and attribute value content]
> For the TinyTree:
> 
> * the cost for an additional single-character element is 38 bytes
> * the cost for an additional single-character attribute is 110 - 19  
> = 91 bytes
> 
> For the Linked Tree:
> 
> * the cost for an additional single-character element is 304 bytes
> * the cost for an additional single-character attribute is 439 - 200 
> = 239 bytes

It is also intriguing that for both TinyTree and LinkedTree, attributes 
are actually *more* expensive for the empty case, rather than (as 
Dimitre and I both suggested, in different fashions) less, or as Kit 
suggested, identical in memory footprint. For TinyTree, they remain 
more expensive when they contain text content; for LinkedTree they 
finally show up as having slightly less size than a (more or less) 
equivalent element.

I suspect (but am too lazy to investigate in detail) that this is 
because the LinkedTree more nearly matches my original analysis, which 
noted that elements are much more strongly linked-from and linked-to 
within an XML tree. For a design composed of a tree of nodes with 
navigational aids, I think the analysis holds (and the impulse to 
remove links to reduce size versus the impulse to add links to increase 
speed), but it clearly does not hold for the compact designs that 
you've described here.

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
There's someone in my head, but it's not me.
                -- Pink Floyd


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]


News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 1993-2007 XML.org. This site is hosted by OASIS