Re: [xml-dev] RFC for XML Object Parsing

So far, you haven't really explained, in terms that anyone seems able to understand, exactly what this new OID attribute contains. You say it "can be used to obtain the memory location ... that the XML will parse directly into". Could you please be more explicit? My guess from your description is that the creator of the XML is doing extra work to reduce the work done by the recipient of the XML; that is, the creator is providing extra redundant information which the parser at the receiving end can use for optimization. Is that a reasonable characterization? If so, I would want to see you compare it with other techniques that put more burden on the creator of the XML in order to improve parsing speed, such as some of the binary XML approaches.

Saxonica

On 23 Mar 2014, at 05:40, Brian Aberle <xmlboss@live.com> wrote:

Hello World,

I need an XML expert to correct me if I have any terminology wrong here. I wrote my first two XML parsers before the W3C finalized XML 1.0, and I wrote my own XSLT, but I don't claim to know it all about XML, even though folks who have studied it less than I have claim to know all about it. Maybe someone here can intelligently comment on this:

Let's start with getting the terminology right. "A protocol" is a set of communication rules. When two parties agree on the specific use of a generic markup language like XML, they have agreed on a protocol. Is everyone with me so far? With this 'definition' of a protocol, your XML parser should be 'unaware' of any specific protocol, as it deals only with the general aspects of XML.

I propose adding a new keyword to XML, and I would like community feedback about it. It would work like this:

The tokenizer recognizes a special keyword attribute "oid" ONLY if it appears as the first attribute (because at that point it is the only token that has been parsed out of that element). The "Object ID" can then be used to obtain the memory location (or application-layer object instance) that the XML will parse directly into, with no temporary memory copy into a tree or DOM structure. This is OVER twice as fast as the more traditional "memory copy design", naturally, because the iterations over the temporary structure are eliminated. It goes beyond 2 times as fast because the tokenizer uses neither SAX nor DOM, but a more efficient alternative to SAX that avoids pushing a variable number of arguments, depending on the token type, through the SAX calls. The non-SAX design only makes calls to getToken(token *p), pulling the data through a one-argument call, whereas SAX pushes the same data through many arguments that compile down to needless pushes and pops. This implementation is about 3 times faster than the very best anyone can do with SAX, which makes it an ideal solution for the massive data sets used in a native BigData XML integration.
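To make the description above concrete, here is a minimal sketch in Python (not the C++ of the actual implementation). It assumes a toy XML subset with no comments, CDATA, entities or namespaces. The names Token, PullTokenizer, get_token, parse_into and Customer are all hypothetical, chosen only to mirror the getToken(token *p) idea, and the nesting logic is deliberately simplistic. It illustrates the shape of the technique and says nothing about its speed.

```python
import re
from dataclasses import dataclass

# Hypothetical token kinds; the real tokenizer's token set is not given in the post.
START, ATTR, TEXT, END, EOF = "start", "attr", "text", "end", "eof"

_TAG_OR_TEXT = re.compile(r'<(/?)([\w.-]+)((?:\s+[\w.-]+="[^"]*")*)\s*(/?)>|([^<]+)')
_ATTR = re.compile(r'([\w.-]+)="([^"]*)"')


@dataclass
class Token:
    kind: str = EOF
    name: str = ""
    value: str = ""


class PullTokenizer:
    """Toy pull tokenizer: the caller asks for one token at a time."""

    def __init__(self, text):
        self._tokens = self._scan(text)

    def _scan(self, text):
        for m in _TAG_OR_TEXT.finditer(text):
            closing, name, attrs, self_closing, chars = m.groups()
            if chars is not None:
                if chars.strip():
                    yield TEXT, "", chars.strip()
            elif closing:
                yield END, name, ""
            else:
                yield START, name, ""
                for a in _ATTR.finditer(attrs):
                    yield ATTR, a.group(1), a.group(2)
                if self_closing:
                    yield END, name, ""

    def get_token(self, tok):
        """Fill the single caller-owned record; return False at end of input."""
        tok.kind, tok.name, tok.value = next(self._tokens, (EOF, "", ""))
        return tok.kind != EOF


class Customer:
    def __init__(self):
        self.name = ""
        self.city = ""


def parse_into(xml, registry):
    """Write element text straight into objects looked up by a leading oid."""
    tokenizer, tok = PullTokenizer(xml), Token()
    target = None        # live object currently being filled
    field = None         # child element whose text is pending
    first_attr = False   # True only immediately after a start tag
    touched = []
    while tokenizer.get_token(tok):
        if tok.kind == START:
            if target is None:
                first_attr = True
            else:
                field = tok.name
        elif tok.kind == ATTR:
            # "oid" counts only when it is the very first attribute.
            if first_attr and tok.name == "oid":
                target = registry.get(tok.value)
                if target is not None:
                    touched.append(tok.value)
            first_attr = False
        elif tok.kind == TEXT:
            if target is not None and field is not None and hasattr(target, field):
                setattr(target, field, tok.value)
        elif tok.kind == END:
            if field is not None:
                field = None
            else:
                target = None
    return touched
```

Given a registry such as {"c1": Customer(), "c2": Customer()}, an element written as <customer oid="c1"> fills the existing c1 object in place, while <customer tier="x" oid="c2"> is ignored because oid is not the first attribute. No intermediate tree is built in either case.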

Since this thing (XML 1.2 or a new protocol) requires an attribute named "oid", it could equally be thought of as a protocol (a protocol that the XML tokenizer is aware of?). There is no other way to implement "the protocol". I have gone to a lot of effort to communicate this clearly, and I developed a simple little example that breaks it all down into numbers that you can see and understand. The examples build on Linux and Windows. Please give me some feedback about standardizing this. I want to know what some smart, internet-savvy people think about it. Am I in the right place?

As explained in the introduction of the article linked below, oid is to XML what ETag is to HTTP. HTTP/1.0 offered only date-based cache validation (Last-Modified and If-Modified-Since); HTTP/1.1 added ETag. That same concept of caching allows XML to enter a whole new dimension of usage. Am I wrong? Look at the two example programs, "TheOIDProtocol" and "ExIndexObjects". The numbers will have the final word.
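One possible reading of the ETag analogy, sketched in Python under stated assumptions: the receiver keeps a cache of live objects keyed by oid, and a later document carrying the same oid updates the cached instance instead of creating a new one. The names Account, apply_update and cache are hypothetical. The sketch uses the standard library's xml.etree.ElementTree.XMLPullParser purely for brevity, which does build element objects, unlike the design proposed here, and it relies on attributes being reported in document order.

```python
import xml.etree.ElementTree as ET


class Account:
    def __init__(self):
        self.balance = None


def apply_update(xml_text, cache):
    """Update cached objects named by a leading oid attribute (illustrative only)."""
    parser = ET.XMLPullParser(events=("start",))
    parser.feed(xml_text)
    parser.close()
    for _event, elem in parser.read_events():
        attrs = list(elem.attrib.items())
        # Honour "oid" only when it is the first attribute of the element.
        if not attrs or attrs[0][0] != "oid":
            continue
        oid = attrs[0][1]
        if oid not in cache:
            cache[oid] = Account()
        obj = cache[oid]
        # Remaining attributes are copied onto matching fields of the live object.
        for key, value in attrs[1:]:
            if hasattr(obj, key):
                setattr(obj, key, value)
```

Calling apply_update twice with <account oid="a1" balance="10"/> and then <account oid="a1" balance="25"/> leaves a single Account in the cache, the same instance both times, with its balance updated in place.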


Polished Source:
https://onedrive.live.com/redir?resid=D7EC275E76D295CF!923&authkey=!AAnvh0CKDY4nuho&ithint=file%2c.zip
A rough (and rogue) draft article about this (open source) technology:
http://www.codeproject.com/Articles/37850/XMLFoundation


Brian Aberle