Re: [xml-dev] RFC for XML Object Parsing
- From: Michael Kay <mike@saxonica.com>
- To: Brian Aberle <xmlboss@live.com>
- Date: Sun, 23 Mar 2014 23:50:56 +0000
Sorry, I'm going to be brutal. If I were on the programme committee for an XML conference and this was a submission for a conference paper, I would vote for rejection. Not because your idea is necessarily bad, but because your explanation of the idea is incoherent. You may have an idea here that's worthy of adoption, but it's impossible to tell unless you can write it up properly.
In fact I would really encourage you to sit down and write a conference paper. Start with an introduction that says what problem you are trying to solve, then give background about other work in the same area and explain why you think the problem remains unsolved. Then explain your approach, defining your terms carefully. If you want to compare your approach in terms of delivered performance, that's great, but don't expect anyone to accept the paper if you simply shout that "it's three times faster" without saying what you are measuring and what you are comparing against.
So far, you haven't really explained in terms that anyone seems able to understand exactly what this new OID attribute contains. You say it "can be used to obtain the memory location that the XML will parse directly in to". Could you please be more explicit? My guess from your description is that the creator of the XML is doing extra work to reduce the work done by the recipient of the XML, that is, the creator is providing extra redundant information which can be used by the parser at the receiving end for optimization. Is that a reasonable characterization? If so, I would want to see you compare it with other techniques that put more burden on the creator of the XML in order to improve parsing speed, such as some of the binary XML approaches.
Submitting this idea to IETF for standardization before you have even published a peer-reviewed conference paper on it is just crazy. That's not the way things work.
Michael Kay
Saxonica
I need an XML expert to correct me if I have any terminology wrong here. I wrote my first two XML parsers before the W3C finalized XML 1.0, and I wrote my own XSLT, but I don't claim to know it all about XML, even though folks who have studied it less than I have claim to know all about it. Maybe someone here can intelligently comment on this:
Let's start by getting the terminology right. "A protocol" is a set of communication rules. When two parties agree on the specific use of a generic markup language like XML, they have agreed on a protocol. Is everyone with me so far? Under this 'definition' of a protocol, your XML parser should be 'unaware' of any specific protocol, as it deals only with the general aspects of XML. I propose adding a new keyword to XML, and I would like community feedback about it. It would work like this:
The tokenizer recognizes a special keyword attribute "oid" ONLY if it appears as the first attribute (because that is the only token we have parsed out of that element so far). The "Object ID" can then be used to obtain the memory location (or application-layer object instance) that the XML will parse directly into, with no temporary memory copy into a tree or DOM structure. It's OVER twice as fast as the more traditional "memory copy design", naturally, because the iterations over the temporary structure are eliminated. It goes beyond 2 times as fast because the tokenizer uses neither SAX nor DOM, but a more efficient alternative to SAX that avoids pushing a variable number of arguments, depending on the token type, through the SAX calls. The non-SAX design only makes calls to getToken(token *p) to pull the data over a 1-argument call stack, data that SAX would push via too many arguments that compile down to needless pushes and pops. This implementation is about 3 times faster than the very best anyone can do with SAX, which makes it the ideal solution for the massive data sets used in a native BigData XML integration.
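To make the paragraph above concrete, here is a minimal sketch of the idea as described, not the XMLFoundation implementation. It is written in Python for brevity, although the getToken(token *p) signature suggests the real code is C or C++. Everything named here is an assumption made for illustration: the Customer class, the registry dictionary that maps an oid to a live object, the get_tokens and parse_into functions, and the regular-expression tokenizer, which handles only a tiny subset of XML (no self-closing tags, entities, comments, or namespaces).

```python
import re
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical application object that XML is parsed directly into."""
    name: str = ""
    city: str = ""


# Toy grammar: start/end tags with double-quoted attributes, or text.
TOKEN_RE = re.compile(r'<(/?)(\w+)((?:\s+\w+="[^"]*")*)\s*>|([^<]+)')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def get_tokens(xml):
    """Pull-style tokenizer: the caller asks for one token at a time."""
    for m in TOKEN_RE.finditer(xml):
        closing, name, attrs, text = m.groups()
        if text is not None:
            if text.strip():
                yield ("text", text.strip(), [])
        elif closing:
            yield ("end", name, [])
        else:
            # Attributes are kept in document order, so attrs[0] is the first.
            yield ("start", name, ATTR_RE.findall(attrs))


def parse_into(xml, registry):
    """Fill objects found via a leading oid attribute; build no tree."""
    target = target_tag = field = None
    for kind, value, attrs in get_tokens(xml):
        if kind == "start":
            # Assumption: oid counts only when it is the FIRST attribute.
            if target is None and attrs and attrs[0][0] == "oid":
                # An unknown oid raises KeyError in this sketch.
                target, target_tag = registry[attrs[0][1]], value
            elif target is not None and hasattr(target, value):
                field = value
        elif kind == "text" and field is not None:
            setattr(target, field, value)  # write straight into the object
        elif kind == "end":
            if value == field:
                field = None
            elif value == target_tag:
                target = target_tag = None


existing = Customer()
parse_into(
    '<Customers><Customer oid="42"><name>Ada</name>'
    '<city>London</city></Customer></Customers>',
    {"42": existing},
)
print(existing)  # Customer(name='Ada', city='London')
```

The sketch shows only the control flow: tokens are pulled one at a time, and element content is written into an object that already exists, so no intermediate tree is built. It says nothing about the performance figures quoted above, which would have to be measured against the real implementation.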
Since this thing (XML 1.2 or a new protocol) has a requirement of an attribute named "oid", it could equally be thought of as a protocol (a protocol that the XML tokenizer is aware of?). There is no other way to implement "the protocol". I have gone to much effort to try to communicate this clearly, and I developed a simple little example that breaks it all down into numbers that you can see and understand. The examples build on Linux and Windows. Please give me some feedback about standardizing this. I want to know what some smart, internet-savvy people think about it. Am I in the right place?
As explained in the introduction of the article linked below, oid is to XML what ETag is to HTTP. HTTP 1.0 did not standardize any way to cache web pages. HTTP 1.1 added ETag. That same concept of caching allows XML to enter a whole new dimension of usage. Am I wrong? Look at the two example programs, "TheOIDProtocol" and "ExIndexObjects". The numbers will have the final word.
Polished Source:
https://onedrive.live.com/redir?resid=D7EC275E76D295CF!923&authkey=!AAnvh0CKDY4nuho&ithint=file%2c.zip
A rough (and rogue) draft article about this (open source) technology:
http://www.codeproject.com/Articles/37850/XMLFoundation
Brian Aberle