RE: [xml-dev] RFC for XML Object Parsing

Peter,

Good input, here is some brief output.

You said, "Basically, it seems you believe that some number of endpoints are going to share some deep understanding of the same object model so that you can subsequently exploit this shared understanding to enable some efficiencies in the XML parsing process?"

Not exactly. It is not that they intimately share an object model; they only share a key to the data. Every invoice has an invoice number. EDI 810 says so. When modeling that in XML, set oid= to the unique key (in this case the invoice number). Attribute order matters: oid must be first. However, OID IS NEVER REQUIRED. It is, as you say, an optimization, and an optional one. Nobody will be forced to retrofit it into an existing design that depends on the principle that attribute order is, was, and always will be insignificant. I can imagine that in some existing implementations adding "oid" under the condition that it be first may not be simple; in other cases it's a one-liner. If "oid" is unknown, the data goes through the logic already in place. If "oid" is there, then we can parse triple fast.
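
To make the "optional optimization" point concrete, here is a minimal C++ sketch of the routing decision. It is not code from XMLFoundation: Invoice, ObjectIndex and resolveTarget are hypothetical names made up for this mail, and it assumes the tokenizer has already handed over the first attribute's name and value.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical application object, keyed by its invoice number.
struct Invoice {
    std::string number;
    double total = 0.0;
};

// Hypothetical index from oid value to a live object instance.
using ObjectIndex = std::unordered_map<std::string, Invoice>;

// For a start tag such as <Invoice oid="810-0042" total="19.95">, the caller
// passes the FIRST attribute only. Returns the existing object to parse into,
// or nullptr when "oid" is not first or its key is not indexed. On nullptr the
// caller falls back to the logic it already has in place.
Invoice* resolveTarget(ObjectIndex& index,
                       std::string_view firstAttrName,
                       std::string_view firstAttrValue) {
    if (firstAttrName != "oid") return nullptr;       // no oid: nothing changes
    auto it = index.find(std::string(firstAttrValue));
    return it == index.end() ? nullptr : &it->second; // unknown oid: fall back
}
```

A document without "oid", or with "oid" anywhere but first, takes the nullptr branch and is handled exactly as before.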

You said "your analogy to HTTP caching is, at best not applicable, and at worst, possibly completely flawed:". I can accept that this analogy largely misrepresents the case at hand. I will remove it. I believe it added confusion yesterday.

You said, "Why not use some more specialized data interchange that is completely optimized for your data exchange problems?"
I resisted XML a little when it was new because of the overhead. A faster format was certainly considered; however, because of the powerful integration aspect of XML, the tradeoff was considered to be worth it. Most commonly, transactional XML parsing is only a latency, not a bottleneck. It is a latency to be avoided like all the others. The acceptable use cases are somewhat defined by its performance. If XML can be made faster, it can be used where previously a proprietary or faster format was the only option.

Brian

On Sun, Mar 23, 2014 at 12:40 AM, Brian Aberle <xmlboss@live.com> wrote:

Hello World,

I need an XML expert to correct me if I have any terminology wrong here. I wrote my first two XML parsers before W3C finalized XML 1.0 and I wrote my own XSLT - but I don't claim to know it all about XML even though folks with lesser study than me claim to know all about XML. Maybe someone here can intelligently comment on this:

Let's start with getting terminology right. "A Protocol" is a set of communication rules. When two parties agree on the specific use of a generic markup language like XML, they have agreed on a protocol. Is everyone with me so far? With this 'definition' of a protocol, your XML parser should be 'unaware' of any specific protocol, as it deals only with the general aspects of XML.

I propose adding a new keyword to XML, and I would like community feedback about it. It would work like this:

The tokenizer recognizes a special keyword attribute "oid" ONLY if it appears as the first attribute (because that is the only token we have parsed out of that element so far). Now the "Object ID" can be used to obtain the memory location (or application layer object instance) that the XML will parse directly into, with no temporary memory copy into a tree or DOM structure. It's OVER twice as fast as the more traditional "memory copy design", naturally, because the iterations over the temporary structure are eliminated. It goes beyond 2 times as fast because the tokenizer uses neither SAX nor DOM, but a more efficient alternative to SAX that avoids pushing a variable number of arguments, depending on the token type, via the SAX calls. The non-SAX design only makes calls to getToken(token *p) to pull the data over a 1 argument call stack. SAX would push the same data through calls with too many arguments, which compile down to needless pushes and pops. This implementation is about 3 times faster than the very best anyone can do with SAX, which makes it the ideal solution for the massive data sets used in a native BigData XML integration.
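
The pull design described above can be sketched roughly as follows. This is an illustration only, not the XMLFoundation tokenizer: Token, TokenType and PullTokenizer are hypothetical names, and the sketch assumes a tiny XML subset (start tags, quoted attributes, text, end tags; no comments, entities, CDATA or processing instructions). The point it shows is that every token type comes back through the same one-argument call.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

enum class TokenType { StartTag, Attribute, Text, EndTag, End };

struct Token {                      // hypothetical token record
    TokenType type = TokenType::End;
    std::string_view name;          // element or attribute name
    std::string_view value;         // attribute value or text content
};

class PullTokenizer {               // hypothetical; views into the input, no copies
public:
    explicit PullTokenizer(std::string_view xml) : in_(xml) {}

    // The caller pulls one token per call; returns false at end of input
    // or on input outside the supported subset.
    bool getToken(Token* out) {
        assert(out != nullptr);
        *out = Token{};
        if (inTag_) return attributeOrTagEnd(out);
        if (pos_ >= in_.size()) return false;
        if (in_[pos_] != '<') {                      // character data
            std::size_t start = pos_;
            while (pos_ < in_.size() && in_[pos_] != '<') ++pos_;
            out->type = TokenType::Text;
            out->value = in_.substr(start, pos_ - start);
            return true;
        }
        ++pos_;                                      // consume '<'
        bool closing = pos_ < in_.size() && in_[pos_] == '/';
        if (closing) ++pos_;
        out->name = takeName();
        if (closing) {
            while (pos_ < in_.size() && in_[pos_] != '>') ++pos_;
            if (pos_ < in_.size()) ++pos_;           // consume '>'
            out->type = TokenType::EndTag;
        } else {
            inTag_ = true;                           // attributes may follow
            out->type = TokenType::StartTag;
        }
        return true;
    }

private:
    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    void skipSpace() { while (pos_ < in_.size() && isSpace(in_[pos_])) ++pos_; }
    bool fail(Token* out) { *out = Token{}; pos_ = in_.size(); inTag_ = false; return false; }

    std::string_view takeName() {
        std::size_t start = pos_;
        while (pos_ < in_.size() && !isSpace(in_[pos_]) && in_[pos_] != '=' &&
               in_[pos_] != '>' && in_[pos_] != '/') ++pos_;
        return in_.substr(start, pos_ - start);
    }

    bool attributeOrTagEnd(Token* out) {
        skipSpace();
        if (pos_ < in_.size() && in_[pos_] == '/') ++pos_;  // "/>" tolerated; no EndTag emitted
        if (pos_ >= in_.size()) return fail(out);
        if (in_[pos_] == '>') { ++pos_; inTag_ = false; return getToken(out); }
        std::string_view name = takeName();
        skipSpace();
        if (pos_ >= in_.size() || in_[pos_] != '=') return fail(out);
        ++pos_;
        skipSpace();
        if (pos_ >= in_.size() || (in_[pos_] != '"' && in_[pos_] != '\'')) return fail(out);
        char quote = in_[pos_++];
        std::size_t end = in_.find(quote, pos_);
        if (end == std::string_view::npos) return fail(out);
        out->type = TokenType::Attribute;
        out->name = name;
        out->value = in_.substr(pos_, end - pos_);
        pos_ = end + 1;
        return true;
    }

    std::string_view in_;
    std::size_t pos_ = 0;
    bool inTag_ = false;
};
```

With this shape, the "oid" check is simply a test on the first Attribute token pulled after a StartTag; a SAX-style startElement callback, by contrast, delivers the element name and the whole attribute list in one multi-argument call.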

Since this thing (XML 1.2 or a new protocol) has a requirement of an attribute named "oid", it could equally be thought of as a protocol (a protocol that the XML tokenizer is aware of?). There is no other way to implement "the protocol". I have gone to much effort to try to communicate this clearly, and I developed a simple little example that breaks it all down into numbers that you can see and understand. The examples build on Linux and Windows. Please give me some feedback about standardizing this. I want to know what some smart, internet-savvy people think about this. Am I in the right place?

As explained in the introduction of the article linked below, oid is to XML what ETag is to HTTP. HTTP 1.0 offered only date-based cache validation (Last-Modified and If-Modified-Since). HTTP 1.1 added ETag. That same concept of caching allows XML to enter a whole new dimension of usage. Am I wrong? Look at the two example programs "TheOIDProtocol" and "ExIndexObjects". The numbers will have the final word.


Polished Source:
https://onedrive.live.com/redir?resid=D7EC275E76D295CF!923&authkey=!AAnvh0CKDY4nuho&ithint=file%2c.zip
A Rough (and Rogue) Draft article about this (open source) technology:
http://www.codeproject.com/Articles/37850/XMLFoundation


Brian Aberle