I hadn't realised that you could get such high-precision memory instrumentation in C#.
With SaxonCS, nodes on the TinyTree aren't allocated as individual objects, so we can't measure a single node; instead we build documents containing many nodes and compute the average cost. Here is the test I used:
    using System;
    using System.IO;
    using System.Text;
    using NUnit.Framework;
    using Saxon.Api;

    [TestFixture]
    public class MemoryTests {
        // Builds a document with `count` empty <a/> elements and reports how far
        // the managed heap has grown since the start of the method.
        private void buildDocWithElements(TreeModel model, int count) {
            long mem = GC.GetTotalMemory(true);
            StringBuilder sb = new StringBuilder("<doc>");
            for (int i = 0; i < count; i++) {
                sb.Append("<a/>");
            }
            sb.Append("</doc>");
            Processor proc = new Processor();
            DocumentBuilder db = proc.NewDocumentBuilder();
            db.TreeModel = model;
            XdmNode doc = db.Build(new StringReader(sb.ToString()));
            sb = null;  // drop the source text so it isn't counted
            long used = GC.GetTotalMemory(true) - mem;
            // Keep the tree reachable until after the measurement: in an optimised
            // build the JIT may otherwise treat doc as dead and let the GC reclaim it.
            GC.KeepAlive(doc);
            Console.WriteLine("Memory: " + model + " " + count + " elements = " + used);
        }

        // Same, but each <a> element carries one empty attribute b.
        private void buildDocWithAttributes(TreeModel model, int count) {
            long mem = GC.GetTotalMemory(true);
            StringBuilder sb = new StringBuilder("<doc>");
            for (int i = 0; i < count; i++) {
                sb.Append("<a b=''/>");
            }
            sb.Append("</doc>");
            Processor proc = new Processor();
            DocumentBuilder db = proc.NewDocumentBuilder();
            db.TreeModel = model;
            XdmNode doc = db.Build(new StringReader(sb.ToString()));
            sb = null;  // drop the source text so it isn't counted
            long used = GC.GetTotalMemory(true) - mem;
            GC.KeepAlive(doc);  // keep the tree alive until after the measurement
            Console.WriteLine("Memory: " + model + " " + count + " attributes = " + used);
        }

        [Test]
        public void TestMemoryUsed() {
            buildDocWithElements(TreeModel.TinyTree, 10000);
            buildDocWithElements(TreeModel.TinyTree, 20000);
            buildDocWithAttributes(TreeModel.TinyTree, 10000);
            buildDocWithAttributes(TreeModel.TinyTree, 20000);
            buildDocWithElements(TreeModel.LinkedTree, 10000);
            buildDocWithElements(TreeModel.LinkedTree, 20000);
            buildDocWithAttributes(TreeModel.LinkedTree, 10000);
            buildDocWithAttributes(TreeModel.LinkedTree, 20000);
        }
    }
It produced this output:
    Memory: TinyTree 10000 elements = 800992
    Memory: TinyTree 20000 elements = 992680
    Memory: TinyTree 10000 attributes = 900744
    Memory: TinyTree 20000 attributes = 1720944
    Memory: LinkedTree 10000 elements = 2064384
    Memory: LinkedTree 20000 elements = 4072008
    Memory: LinkedTree 10000 attributes = 4198024
    Memory: LinkedTree 20000 attributes = 8316768
But note that when we add 10000 attributes we are also adding 10000 elements, one to carry each attribute.
My conclusions from this (dividing each difference by the 10000 extra nodes added between the two runs):
For the TinyTree:
* the cost of an additional empty element is (992680 - 800992) / 10000 ≈ 19 bytes
* the cost of an additional empty element plus empty attribute is (1720944 - 900744) / 10000 ≈ 82 bytes, so the attribute accounts for about 63 bytes
For the LinkedTree:
* the cost of an additional empty element is (4072008 - 2064384) / 10000 ≈ 200 bytes
* the cost of an additional empty element plus empty attribute is (8316768 - 4198024) / 10000 ≈ 412 bytes, so the attribute accounts for about 212 bytes
These are close to what I would predict from the design.
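To make the design difference concrete, here is a deliberately simplified C# toy of the two storage styles. It is not Saxon's code: `ArrayTree`, `ObjectNode` and their fields are hypothetical names. In the array style, which follows the spirit of the TinyTree, a node is only an index into a few parallel arrays, so adding one costs a handful of array slots. In the object style, closer to the LinkedTree, every node is a separate heap object with its own object header, references and (in this toy) child list. The toy's arrays also grow by doubling, which shows why per-node cost can only be observed as an average over many nodes.

```csharp
using System;
using System.Collections.Generic;

// Array-of-columns layout (toy, in the spirit of the TinyTree): a node is
// just an index into parallel arrays, so no per-node object is allocated.
public sealed class ArrayTree
{
    private int[] nameCode = new int[4];
    private int[] parent = new int[4];
    public int Count { get; private set; }

    // Appends a node; the arrays double when full, so memory grows in steps.
    public int AddNode(int name, int parentIndex)
    {
        if (Count == nameCode.Length)
        {
            Array.Resize(ref nameCode, Count * 2);
            Array.Resize(ref parent, Count * 2);
        }
        nameCode[Count] = name;
        parent[Count] = parentIndex;
        return Count++;
    }

    public int ParentOf(int node) => parent[node];
    public int NameOf(int node) => nameCode[node];
}

// Object-per-node layout (toy, in the spirit of the LinkedTree): every node
// is a separate heap object carrying its own header and references.
public sealed class ObjectNode
{
    public int NameCode;
    public ObjectNode Parent;
    public List<ObjectNode> Children = new List<ObjectNode>();

    public ObjectNode AddChild(int name)
    {
        var child = new ObjectNode { NameCode = name, Parent = this };
        Children.Add(child);
        return child;
    }
}

public static class LayoutDemo
{
    public static void Main()
    {
        // 10000 children under one root, built both ways.
        var tiny = new ArrayTree();
        int root = tiny.AddNode(0, -1);
        for (int i = 0; i < 10000; i++) tiny.AddNode(1, root);

        var doc = new ObjectNode { NameCode = 0 };
        for (int i = 0; i < 10000; i++) doc.AddChild(1);

        Console.WriteLine($"ArrayTree nodes: {tiny.Count}; ObjectNode children: {doc.Children.Count}");
    }
}
```

In the array version the 10000 children cost a few array slots each. In the object version they cost 10000 separate objects, each with its own (initially empty) list. That contrast matches the roughly tenfold gap between the two trees measured above.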
Measuring empty elements and attributes is a bit artificial. If we give each element and each attribute a value consisting of a single ASCII character, the numbers change to:
    Memory: TinyTree 10000 elements = 994320
    Memory: TinyTree 20000 elements = 1379024
    Memory: TinyTree 10000 attributes = 1176816
    Memory: TinyTree 20000 attributes = 2273088
    Memory: LinkedTree 10000 elements = 3103296
    Memory: LinkedTree 20000 elements = 6148136
    Memory: LinkedTree 10000 attributes = 4478456
    Memory: LinkedTree 20000 attributes = 8868808
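which means:
For the TinyTree:
* the cost of an additional single-character element is (1379024 - 994320) / 10000 ≈ 38 bytes
* the cost of an additional single-character attribute is (2273088 - 1176816) / 10000 - 19 ≈ 110 - 19 = 91 bytes (subtracting the empty element that carries it)
For the LinkedTree:
* the cost of an additional single-character element is (6148136 - 3103296) / 10000 ≈ 304 bytes
* the cost of an additional single-character attribute is (8868808 - 4478456) / 10000 - 200 ≈ 439 - 200 = 239 bytes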
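All of the per-node figures above are differences of differences, and the arithmetic can be checked mechanically. The following is a minimal C# sketch of that calculation, using the totals reported in the two runs. `MemoryArithmetic` and `MarginalCost` are hypothetical names that are not part of the original test. The sketch assumes, as the post does, that any fixed overhead is the same in the 10000-node and 20000-node runs and therefore cancels.

```csharp
using System;

// Hypothetical helper (not part of the original test) that reproduces the
// arithmetic above from the reported GC.GetTotalMemory totals.
public static class MemoryArithmetic
{
    // Going from 10000 to 20000 nodes adds exactly this many nodes.
    const int ExtraNodes = 20000 - 10000;

    // Average bytes per extra node; fixed overheads (assumed equal in both runs) cancel.
    public static double MarginalCost(long totalAt10k, long totalAt20k) =>
        (double)(totalAt20k - totalAt10k) / ExtraNodes;

    public static void Main()
    {
        // Run 1: empty elements, and empty attributes (each on an empty element).
        double tinyElem = MarginalCost(800992, 992680);
        double tinyAttr = MarginalCost(900744, 1720944) - tinyElem;
        double linkedElem = MarginalCost(2064384, 4072008);
        double linkedAttr = MarginalCost(4198024, 8316768) - linkedElem;

        // Run 2: single-character values; the attribute's element is still empty,
        // so the run-1 empty-element cost is subtracted, as in the post.
        double tinyElem1 = MarginalCost(994320, 1379024);
        double tinyAttr1 = MarginalCost(1176816, 2273088) - tinyElem;
        double linkedElem1 = MarginalCost(3103296, 6148136);
        double linkedAttr1 = MarginalCost(4478456, 8868808) - linkedElem;

        Console.WriteLine($"TinyTree   empty:  element {tinyElem:F1}, attribute {tinyAttr:F1}");
        Console.WriteLine($"LinkedTree empty:  element {linkedElem:F1}, attribute {linkedAttr:F1}");
        Console.WriteLine($"TinyTree   1-char: element {tinyElem1:F1}, attribute {tinyAttr1:F1}");
        Console.WriteLine($"LinkedTree 1-char: element {linkedElem1:F1}, attribute {linkedAttr1:F1}");
    }
}
```

Unrounded, the TinyTree empty-node costs come out at about 19.2 and 62.9 bytes, which the figures above round to 19 and 63. Every other value likewise agrees with the rounded figures to within a byte.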
Note: going by the design (I haven't measured this), the size should be independent of the length of the element or attribute name, provided the same names are used repeatedly.
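The note rests on name sharing: a name that is used repeatedly is stored once, and each node records only a fixed-size code for it. A toy C# sketch of that idea follows. `NamePool`, `Allocate` and `GetName` are hypothetical names here, not the SaxonCS API, and the sketch ignores namespaces and prefixes, which a real name pool also has to handle.

```csharp
using System;
using System.Collections.Generic;

// Toy name pool (hypothetical; not SaxonCS's NamePool class or API).
// Each distinct name string is stored once; nodes hold only an int code.
public sealed class NamePool
{
    private readonly Dictionary<string, int> codes = new Dictionary<string, int>();
    private readonly List<string> names = new List<string>();

    // Returns the existing code for a name, or assigns the next free code.
    public int Allocate(string name)
    {
        if (!codes.TryGetValue(name, out int code))
        {
            code = names.Count;
            names.Add(name);
            codes.Add(name, code);
        }
        return code;
    }

    public string GetName(int code) => names[code];

    // Number of distinct names stored, regardless of how many nodes use them.
    public int Count => names.Count;
}

public static class NamePoolDemo
{
    public static void Main()
    {
        var pool = new NamePool();
        // One int per node, however long the name is.
        int[] nameCode = new int[10000];
        for (int i = 0; i < nameCode.Length; i++)
        {
            nameCode[i] = pool.Allocate("a-rather-long-element-name");
        }
        Console.WriteLine($"{nameCode.Length} nodes share {pool.Count} stored name(s)");
    }
}
```

Running it prints `10000 nodes share 1 stored name(s)`. Lengthening the name changes only the single stored string, not the per-node int array.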
Michael Kay
Saxonica