OASIS Mailing List Archives

"oid" - the Object ID in XML

Hello World @ [xml-dev],
I got great feedback yesterday, and I have summarized it into a few paragraphs here for a more organized thread.  This will be the starting point (gotta start somewhere) for the "Internet Draft" or formal description.  I am asking you all, world, for input that will help to clearly communicate and present the issue at hand.  Based on yesterday's input, this is a draft overview.  World - help me refine it.
Let's start with getting terminology right.  "A protocol" is a set of communication rules.  When two parties agree on the specific use of a generic markup language like XML, they have agreed on a protocol.  Is everyone with me so far?  With this 'definition' of a protocol, your XML parser should be 'unaware' of any specific protocol, as it deals only with the general aspects of XML.
I propose adding a new keyword to XML, and I would like community feedback about it.  It would work like this: 
The tokenizer recognizes a special keyword attribute "oid" ONLY if it appears as the first attribute (because that is the only token we have parsed out of that element so far).  The "oid" (aka "Object ID") can then be used to obtain the memory location (or application-layer object instance) that the XML will parse directly into, with no temporary memory copy into a tree or DOM structure.  This is not the syntax that my implementation uses, but it seems to convey the concept very well:  (<foo @oid="123"/>)  The syntax is up for debate and easy to change.
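As a rough illustration of the idea only (this is not the actual implementation, and the names `Customer`, `registry`, and `parse_into` are invented for the example), the lookup could be sketched in Python as below. It assumes a restricted start tag with simple quoted attributes and the plain `oid="..."` spelling:

```python
import re

# Hypothetical application object; the field names are invented for this sketch.
class Customer:
    def __init__(self):
        self.name = None
        self.city = None

# Hypothetical registry mapping an oid to an already-allocated object.
registry = {"123": Customer()}

# Matches name="value" or name='value' inside a start tag (restricted subset).
ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def _value(match):
    """Return the attribute value, whichever quote style was used."""
    return match.group(2) if match.group(2) is not None else match.group(3)

def parse_into(start_tag, registry):
    """Write the attributes of one start tag straight into a registered object.

    Returns the object that received the data, or None when the first
    attribute is not "oid" or the oid is unknown (the element is then
    treated as ordinary XML by whatever code calls this sketch).
    """
    attrs = ATTR.finditer(start_tag)
    first = next(attrs, None)
    if first is None or first.group(1) != "oid":
        return None
    target = registry.get(_value(first))
    if target is None:
        return None
    for match in attrs:
        # Each value goes directly onto the object, with no tree in between.
        setattr(target, match.group(1), _value(match))
    return target
```

Calling `parse_into('<Customer oid="123" name="Ann" city="Oslo"/>', registry)` fills the existing `Customer` object, while a tag whose first attribute is not "oid" is left alone.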
It is OVER twice as fast as the more traditional "memory copy design", naturally, because the iterations over the temporary structure are eliminated.  It goes beyond 2 times as fast because the tokenizer uses neither SAX nor DOM, but a more efficient alternative to SAX that avoids pushing a variable number of arguments, depending on the token type, through the SAX calls.  The non-SAX design only makes calls to getToken(token *p), pulling the data through a 1-argument call.  SAX would push the same data through many arguments that compile down to needless pushes and pops.  This implementation is about 3 times faster than the very best anyone can do with SAX, which makes it an ideal solution for the massive data sets used in a native BigData XML integration, both for the speed and for the 50% memory reduction that comes from eliminating the temporary data structure during parsing.
The first priority is to understand this concept clearly.  Once it is clearly understood, it becomes clear that you could call this A) a protocol,  B) an application implementation, or  C) an extension to XML, along the same line of thought that added ETag to HTTP 1.1.
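The pull style can be sketched as follows. This is a minimal Python illustration of a getToken-like call that fills one reusable token record, not the actual tokenizer; the names `Token`, `PullTokenizer`, and the token kinds are assumptions, and it only handles a small XML subset (no comments, CDATA, entities, or single-quoted values):

```python
import re

class Token:
    """One reusable token record, filled in place by get_token()."""
    __slots__ = ("kind", "text")

    def __init__(self):
        self.kind = ""
        self.text = ""

NAME = r"[A-Za-z_][\w.-]*"

# Patterns tried between tags.
_OUTSIDE = [
    ("end", re.compile(r"</(" + NAME + r")\s*>")),
    ("start", re.compile(r"<(" + NAME + r")")),
    ("text", re.compile(r"([^<]+)")),
]

# Patterns tried after a start tag name, until the tag is closed.
_IN_TAG = [
    ("attr_name", re.compile(r"\s*(" + NAME + r")\s*=")),
    ("attr_value", re.compile(r'\s*"([^"]*)"')),
    ("tag_close", re.compile(r"\s*(/?>)")),
]

class PullTokenizer:
    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.in_tag = False

    def get_token(self, tok):
        """Fill tok with the next token; return False at end of input
        (or at the first construct this sketch does not understand)."""
        for kind, pattern in (_IN_TAG if self.in_tag else _OUTSIDE):
            match = pattern.match(self.source, self.pos)
            if match:
                self.pos = match.end()
                if kind == "start":
                    self.in_tag = True
                elif kind == "tag_close":
                    self.in_tag = False
                tok.kind, tok.text = kind, match.group(1)
                return True
        return False

# Usage: the caller owns a single Token and pulls until the input is exhausted.
tok = Token()
tokenizer = PullTokenizer('<foo oid="1" a="x">hi</foo>')
while tokenizer.get_token(tok):
    print(tok.kind, tok.text)
```

Every call has the same one-argument shape regardless of the token type, which is the contrast with SAX-style callbacks whose argument lists differ per event.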
An interesting point of the analogy "oid" == "ETag" is that there is a second keyword used with "oid".  This is the ONLY other keyword besides "oid": it is "UpdateTime", and it is analogous to HTTP's "If-Modified-Since".  This HTTP analogy may help some with the concept we are attempting to summarize.  I should note some specific parsing rules here as well: if the first attribute token is not named "oid", then "UpdateTime" will not be expected or looked for.  If it were to exist, it would be a normal XML attribute of no special significance.
Since this concept requires an [attribute || special XML keyword (depending on your view)] named "oid", it could equally be called A) a protocol.  (A protocol that the XML tokenizer is aware of?  That's odd.)  There is no other way to implement "the protocol".  If we call it B) an implementation, now we have two XMLs: a fast one and a slow one.  A new one and an old one.  The new XML is 100% backwards compatible.  Again - this is all a matter of view.  The important thing is to understand the concept.
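A small Python sketch of this parsing rule follows. Several things here are assumptions and not part of the proposal as stated: that "UpdateTime" must directly follow "oid", that its value is an ISO 8601 timestamp, and that an element is applied only when it is newer than the cached copy (mirroring If-Modified-Since). The function names are made up for the example, and the input is an ordered list of (name, value) pairs as a tokenizer might produce them:

```python
from datetime import datetime

def split_keywords(attrs):
    """Split ordered (name, value) pairs into (oid, update_time, data).

    "UpdateTime" is only looked for when the first attribute is "oid";
    otherwise every attribute, including one named "UpdateTime", is data.
    """
    if not attrs or attrs[0][0] != "oid":
        return None, None, list(attrs)
    oid = attrs[0][1]
    rest = list(attrs[1:])
    # Assumption: the second keyword sits directly after "oid".
    if rest and rest[0][0] == "UpdateTime":
        return oid, rest[0][1], rest[1:]
    return oid, None, rest

def should_apply(cached_time, update_time):
    """True when the incoming element is newer than the cached object.

    Both arguments are ISO 8601 strings (an assumed format) or None.
    With no timestamp on either side, the update is always applied.
    """
    if cached_time is None or update_time is None:
        return True
    return datetime.fromisoformat(update_time) > datetime.fromisoformat(cached_time)
```

For example, `split_keywords([("name", "Ann"), ("UpdateTime", "2013-05-01T12:00:00")])` reports no keywords at all, because the first attribute is not "oid".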
<Customer oid='1'>
Normally we would call "oid" an attribute.  In this case we will call it a keyword; therefore the value "1" is NOT data in the XML document per se.  Note: if an attribute named "oid" contains any upper case, or the attribute is not in the 1st position, then it IS data.  Otherwise it is a key to the destination memory for the parsing results.
The "oid" == the DBMS index.  Just as a DBMS index can be keyed over multiple columns, the "oid" can be a concatenation of several columns that together make a unique Object ID.  An application using a ConfigFile.xml would most likely never use an "oid".  It's not designed to be used in all applications.  Many XML documents, unless they are indexed by a DBMS, would not be candidates for "oid" use.
Finally, suppose you have a list of "foos".  Suppose that list is very, very long: 1 terabyte.  Suppose that you get an updated XML source of that list every 15 minutes.  In the huge list, some "foos" are new and some "foos" have been updated.  With an "oid" we can parse this entire update and make no new memory allocations except for the new "foos".  The XML parser never needs to allocate any memory storage if it can make a connection between the cached instance and an "oid".
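The keyword-versus-data rule in the note can be written down in a few lines. This is only a sketch in Python with invented function names, taking the attributes of one element as an ordered list of (name, value) pairs:

```python
def is_oid_keyword(name, position):
    """The keyword is the exact lower-case name "oid" in the 1st position."""
    return position == 0 and name == "oid"

def split_element(attrs):
    """Return (oid, data) for one element's ordered (name, value) pairs.

    oid is None when the element carries no keyword; in that case an
    attribute that merely looks like "oid" stays in the data.
    """
    if attrs and is_oid_keyword(attrs[0][0], 0):
        return attrs[0][1], dict(attrs[1:])
    return None, dict(attrs)
```

So `<Customer oid='1' name='Ann'>` yields the key "1" plus the data `{"name": "Ann"}`, while `<Customer Oid='1'>` and `<Customer name='Ann' oid='1'>` yield no key and keep every attribute as data.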
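As a sketch of the multi-column case, a composite "oid" could be built as below. The separator character and the function name `make_oid` are assumptions for illustration; the only thing taken from the text is that the key is a concatenation of several columns. A separator is used because plain concatenation can make different keys collide:

```python
def make_oid(*columns, sep="|"):
    """Concatenate key column values into one oid string.

    The separator keeps ("ab", "c") distinct from ("a", "bc"); values
    that already contain the separator are rejected so keys stay unique.
    """
    parts = [str(column) for column in columns]
    for part in parts:
        if sep in part:
            raise ValueError("column value contains the separator: %r" % part)
    return sep.join(parts)
```

For example, `make_oid("US", 1042)` gives "US|1042", which could then appear in the document as `<Customer oid='US|1042'>`.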
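The update scenario can be sketched in Python as follows. The class `Foo`, the dictionary used as the cache, and the (oid, value) record shape are assumptions made for the example; a real tokenizer would hand over the fields of each element instead of a prepared list:

```python
class Foo:
    """Hypothetical cached application object, keyed by its oid."""
    def __init__(self, oid):
        self.oid = oid
        self.value = None

def apply_update(cache, records):
    """Apply (oid, value) records to the cache in place.

    Existing objects are found through their oid and overwritten;
    a new Foo is allocated only for an oid the cache has not seen.
    Returns the number of newly allocated objects.
    """
    created = 0
    for oid, value in records:
        foo = cache.get(oid)
        if foo is None:
            foo = cache[oid] = Foo(oid)
            created += 1
        foo.value = value
    return created
```

With a cache that already holds oid "1", applying the records `[("1", "updated"), ("2", "new")]` reuses the existing object for "1" and allocates exactly one new object, for "2".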
Polished Source:
A Rough (and Rogue) Draft article/blog about this (open source) technology
Brian Aberle
