RE: [xml-dev] RFC for XML Object Parsing

Doug,
You said, "a shot in the dark, but since attribute order is important but apparently can't be guaranteed..."
I say, "It can be guaranteed - but not with ALL parsers."

This curious anomaly of XML 1.0/1.1 leaves attribute ordering to the discretion of the all-powerful character catching parser geek in charge of the parser implementation. If the application requires that the attributes be ordered, that is a problem that technically can't be solved directly by XML 1.0/1.1. If an implementation were to remain "parser unspecific", it would be forced to add a numerical ID to each attribute so they could be re-ordered, or the application could maintain a hashmap where each attribute name hashes to the position it should appear in - that app would then write a layer over the XML parser that re-orders them.
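To make the hashmap idea concrete, here is a minimal sketch of such a re-ordering layer. It assumes the parser hands over attributes as name/value pairs in whatever order it likes; `Attribute` and `reorderAttributes` are names made up for this example, not part of any parser API or of the code discussed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical: one attribute as delivered by some parser, in arbitrary order.
using Attribute = std::pair<std::string, std::string>;

// Hypothetical layer over the parser: puts attributes into the order the
// application expects. `wantedPosition` maps attribute name -> position.
// Attributes missing from the map go last and keep their relative order.
std::vector<Attribute> reorderAttributes(
    std::vector<Attribute> attrs,
    const std::unordered_map<std::string, std::size_t>& wantedPosition)
{
    auto rank = [&](const Attribute& a) -> std::size_t {
        auto it = wantedPosition.find(a.first);
        return it == wantedPosition.end()
                   ? std::numeric_limits<std::size_t>::max()
                   : it->second;
    };
    // stable_sort preserves the parser's order among equally ranked attributes.
    std::stable_sort(attrs.begin(), attrs.end(),
                     [&](const Attribute& a, const Attribute& b) {
                         return rank(a) < rank(b);
                     });
    return attrs;
}
```

With a map of {"oid" -> 0, "color" -> 1}, attributes arriving as color, extra, oid come back as oid, color, extra. Note that this layer runs on every element, which is exactly the kind of per-element cost the OID is meant to avoid.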

Google "XML Attribute Order", OMG - we are not the first to discuss this situation.

I reckon that there are many forms of syntax that this OID concept can be implemented with, just as there are many implementation roads that go around this one (small) missing road in XML 1.0/1.1. Considering that performance is one important reason for the design of OID, IMHO the implementation should be "designed" for performance. For example, we could parse all attributes and hunt for the OID - that eliminates the issue altogether. It also means that you pay a tax for every element that does not have an OID, which is 99.9% of them. Let's call that "The IRS design" that the taxman loves because it works with any XML parser.
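A rough sketch of the two roads, working directly on the text of a start tag. Both function names are invented for this example and the scanning is deliberately simplified (no whitespace around '=', no entities, no namespaces); it is not the code from xmlObject.cpp. `oidIfFirst` looks only at the first attribute, while `oidAnywhere` is the "IRS design" that walks every attribute.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// All names below are hypothetical helpers for illustration only.
bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Index of the first character after '<' and the element name.
std::size_t afterName(std::string_view tag) {
    std::size_t i = (!tag.empty() && tag[0] == '<') ? 1 : 0;
    while (i < tag.size() && !isSpace(tag[i]) && tag[i] != '/' && tag[i] != '>') ++i;
    return i;
}

std::size_t skipSpace(std::string_view tag, std::size_t i) {
    while (i < tag.size() && isSpace(tag[i])) ++i;
    return i;
}

// Fast path: returns the oid only if it is the FIRST attribute.
// Nothing after the oid value is examined.
std::optional<std::string_view> oidIfFirst(std::string_view tag) {
    std::size_t i = skipSpace(tag, afterName(tag));
    constexpr std::string_view key = "oid=";
    if (tag.substr(i, key.size()) != key) return std::nullopt;
    i += key.size();
    if (i >= tag.size() || (tag[i] != '"' && tag[i] != '\'')) return std::nullopt;
    const char quote = tag[i++];
    const std::size_t end = tag.find(quote, i);
    if (end == std::string_view::npos) return std::nullopt;
    return tag.substr(i, end - i);
}

// "IRS design": walks every attribute until oid is found or the tag ends,
// so elements without an oid pay for a full scan.
std::optional<std::string_view> oidAnywhere(std::string_view tag) {
    std::size_t i = afterName(tag);
    for (;;) {
        i = skipSpace(tag, i);
        const std::size_t eq = tag.find('=', i);
        if (eq == std::string_view::npos || eq + 1 >= tag.size()) return std::nullopt;
        const char quote = tag[eq + 1];
        if (quote != '"' && quote != '\'') return std::nullopt;
        const std::size_t end = tag.find(quote, eq + 2);
        if (end == std::string_view::npos) return std::nullopt;
        if (tag.substr(i, eq - i) == "oid") return tag.substr(eq + 2, end - (eq + 2));
        i = end + 1;  // move past the closing quote to the next attribute
    }
}
```

For an element with no oid, `oidIfFirst` gives up after one comparison, while `oidAnywhere` has to visit every attribute before it can say "no". That difference is the tax.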

As I said, performance is 1 reason for the OID. Initially that was the only reason, but the OID was exploited to do other things as well. It allows one fragment of XML to refer to a fragment of XML that is shared in memory. Michael Kay - Your comments about scope were needed, and you brought up an important bit that I had not mentioned - please think on this bit as well.

<MyThing oid="1" color="Green"/>
<MyOtherThing size="large">
<MyThing oid='1'/>
</MyOtherThing>

On the NCIS project (where all this stuff was invented) we would say "MyOtherThing contains an 'Object Marker' to MyThing".
We had to invent all kinds of words. The term "Object Marker" was used when an empty element was used to contain a reference to the memory instance of that element which contained all the data. In this case the OID was being used for an entirely new reason - this is not about parsing performance - it's about optimal memory usage. (See: MemberDescriptor.cpp Line 578) (See: xmlObject.cpp Line 1919) (See: xmlObject.cpp Line 1962)
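For readers without the source in front of them, the Object Marker idea can be sketched roughly as follows. This is an illustration only: `Thing`, `ObjectRegistry` and `resolve` are hypothetical names, and the real logic in MemberDescriptor.cpp and xmlObject.cpp may well be organized differently. The assumption is a registry keyed by oid, so the full element and the empty marker element end up pointing at the same instance.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the in-memory instance of a parsed element.
struct Thing {
    std::string oid;
    std::map<std::string, std::string> attributes;
};

// Hypothetical registry: at most one in-memory instance per oid.
class ObjectRegistry {
public:
    // Returns the instance for this oid, creating it on first sight.
    // Any attributes supplied are merged into the shared instance, so
    //   <MyThing oid="1" color="Green"/>  carries the data, and
    //   <MyThing oid='1'/>                is just a marker to it.
    std::shared_ptr<Thing> resolve(
        const std::string& oid,
        const std::map<std::string, std::string>& attrs = {})
    {
        std::shared_ptr<Thing>& slot = instances_[oid];
        if (!slot) {
            slot = std::make_shared<Thing>();
            slot->oid = oid;
        }
        for (const auto& kv : attrs) slot->attributes[kv.first] = kv.second;
        return slot;
    }

private:
    std::map<std::string, std::shared_ptr<Thing>> instances_;
};
```

Resolving oid "1" once with color="Green" and once with no attributes yields two pointers to the same `Thing`, so MyOtherThing holds a reference rather than a second copy of the data.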

Additionally, this concept can be used to page memory instances out to disk. The "OID" is a key. It may resolve to physical memory or disk. This is the interface that was added to implement this 'paging' concept in xmlObject.h

// state attachment/detachment from StateCache
void ReStoreState(const char * oid);
void StoreState();
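The StateCache itself is not shown here, so the following is only a guess at the shape of the idea, not the actual implementation behind StoreState()/ReStoreState(). It assumes the object state is already serialized to a string and that the oid is safe to use as a file name; the class and every method name in it are hypothetical.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache: the oid is the key, and it resolves to memory or disk.
class StateCache {
public:
    explicit StateCache(std::string dir) : dir_(std::move(dir)) {}

    // Keep serialized state in memory under its oid.
    void put(const std::string& oid, std::string state) {
        memory_[oid] = std::move(state);
    }

    // Page one entry out: write it to disk, then drop it from memory.
    bool pageOut(const std::string& oid) {
        auto it = memory_.find(oid);
        if (it == memory_.end()) return false;
        std::ofstream out(pathFor(oid), std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << it->second;
        out.close();
        if (!out) return false;  // write or close failed; keep the memory copy
        memory_.erase(it);
        return true;
    }

    // Resolve an oid: memory first, then disk (paging it back in).
    std::optional<std::string> get(const std::string& oid) {
        if (auto it = memory_.find(oid); it != memory_.end()) return it->second;
        std::ifstream in(pathFor(oid), std::ios::binary);
        if (!in) return std::nullopt;
        std::string state((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
        memory_[oid] = state;
        return state;
    }

    // Forget an oid everywhere (memory and disk).
    void discard(const std::string& oid) {
        memory_.erase(oid);
        std::remove(pathFor(oid).c_str());
    }

    bool inMemory(const std::string& oid) const { return memory_.count(oid) != 0; }

private:
    // Assumption: the oid contains no path separators or other unsafe characters.
    std::string pathFor(const std::string& oid) const { return dir_ + "/" + oid + ".state"; }

    std::string dir_;
    std::unordered_map<std::string, std::string> memory_;
};
```

The caller never needs to know where the state lives: `get("1")` returns the same bytes whether oid 1 is resident or was paged out earlier.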

The OID does all this other stuff that I failed to mention until just now.

Brian

From: doug.duboulay@gmail.com
To: xml-dev@lists.xml.org
Date: Tue, 25 Mar 2014 16:57:30 +0800
Subject: Re: [xml-dev] RFC for XML Object Parsing

Brian,

a shot in the dark, but since attribute order is important but
apparently can't be guaranteed, could you perhaps stuff the oid
and update time into XML Processing Instructions injected as
preceding siblings for each element/object that has an id
you care about?

Doug

On Mon, 24 Mar 2014 01:25:01 PM Brian Aberle wrote:

> Peter said,
> "Basically, it seems you believe that some number of endpoints are
> going to share some deep understanding of the same object model so that you
> can subsequently exploit this shared understanding to enable some
> efficiencies in the XML parsing process?"
> Not exactly, it's not that they intimately share an object model, they only
> share a key to the data. Every Invoice has an invoice Number. EDI 810
> says so. When modeling that in XML set the oid= to the unique key (in this
> case the invoice number). Attribute order matters, oid must be first;
> however, OID IS NEVER REQUIRED. It is as you say - an optimization - it is
> an optional optimization. Nobody will be forced to retrofit into an
> existing design that was depending on the principle that attribute order
> is, was, and always will be insignificant. I can imagine that in some
> existing implementations adding "oid" under the conditions that it be first
> may not be simple, in other cases it's a 1 liner. If "oid" is unknown, the
> data goes through the logic already in place. If "oid" is there then we
> can parse triple fast.