XML.org: OASIS Mailing List Archives
How to design XML in a way that focuses on the strengths of the computer?

Hi Folks,

	Computers have always had a bit of a tenuous
	relationship with text. Although we tend to think
	of text processing as a central task for computer
	hardware and software (indeed, the 8-bit byte is
	the standard design element in modern computers,
	in large part, due to how well suited it is for
	encoding Western character sets), the truth of the
	matter is that the human concept of text is really
	alien to a computer.

	... handling text is not a computer's strength. It is
	a necessary evil best kept to a minimum. [Barski]

What is fundamental to computers? Answer: Among other things, memory addressing is fundamental. 

Can we design XML in a way that it focuses on the strengths of the computer?

Let's take an example. Suppose we want to retrieve "west". Consider this XML design:

	<edge>garden west door</edge>

That XML design represents "west" as text. "west" can be retrieved using string manipulation:

	substring-before(substring-after(., ' '), ' ')

You would be shocked at the huge number of machine instructions needed to implement that trivial XPath expression. Hundreds or thousands of machine instructions are needed. 
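
To make the string work concrete, here is a minimal Python sketch that mirrors the XPath expression step by step. It assumes Python's standard xml.etree.ElementTree module as a stand-in for an XML processor; the helper name second_token is made up for this illustration. Each partition call has to scan the text character by character looking for a space, which is where the instruction count comes from.

```python
import xml.etree.ElementTree as ET

# Text-based design: all three values live in one text node.
edge = ET.fromstring("<edge>garden west door</edge>")


def second_token(text: str) -> str:
    """Mirror substring-before(substring-after(., ' '), ' ') (hypothetical helper)."""
    # substring-after(., ' '): everything after the first space.
    _, _, after_first_space = text.partition(" ")
    # substring-before(..., ' '): everything before the next space.
    before_next_space, sep, _ = after_first_space.partition(" ")
    # XPath's substring-before yields "" when the separator is absent.
    return before_next_space if sep else ""


direction = second_token(edge.text)
print(direction)  # west
```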

In an ideal world I should be able to retrieve "west" in a single machine instruction (or, a handful of machine instructions).

Here is an alternate XML design which avoids the use of text:

	<edge>
		<garden/>
		<west/>
		<door/>
	</edge>

Node access is easy and fundamental to the XML language. Now "west" can be retrieved using this simple element reference (XPath 2.0 syntax; in XPath 1.0 the equivalent is name(*[2])):

	*[2]/name()

It seems to me that this should involve a simple memory address look-up, and the number of machine instructions required should be one (or a few). Alas, I discovered that it is highly dependent on the XPath processor (XML processor). In fact, I did some timing tests and, with the XPath engine that I used, there was no time difference between the above two XPath expressions. Bummer.
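
A rough way to repeat this kind of timing test is sketched below. It is only an approximation of the experiment described above: it uses Python's standard xml.etree.ElementTree and timeit modules rather than an XPath engine, and the function names from_text and from_nodes are invented for the sketch. The timings it prints will vary by machine and interpreter, so treat them as indicative at best.

```python
import timeit
import xml.etree.ElementTree as ET

text_edge = ET.fromstring("<edge>garden west door</edge>")
node_edge = ET.fromstring("<edge><garden/><west/><door/></edge>")


def from_text() -> str:
    # String scan: split on whitespace and take the second token.
    return text_edge.text.split()[1]


def from_nodes() -> str:
    # Positional child access, the counterpart of *[2]/name().
    # XPath positions start at 1; Python indexes start at 0.
    return node_edge[1].tag


# Both designs must yield the same answer before timing means anything.
assert from_text() == from_nodes() == "west"

for fn in (from_text, from_nodes):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.4f} s per 100,000 calls")
```

Note that ElementTree lets an element be indexed like a list, which is why node_edge[1] works here. How that indexing is backed in memory is an implementation detail of the library.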

The XML specification is silent on how XML parsers should represent XML. Consequently, a parser might implement this: 

	<edge>
		<garden/>
		<west/>
		<door/>
	</edge>

as a linked list, and therefore, with *[2]/name(), the XPath engine must traverse the linked list to obtain the second child element.
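
The traversal cost can be sketched as follows. The LinkedNode class and nth_child function are hypothetical, written only to illustrate a first-child / next-sibling layout; they are not taken from any real parser. Reaching the second child takes one hop along the sibling chain, the third takes two, and so on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedNode:
    """Hypothetical first-child / next-sibling node, not any real parser's type."""
    name: str
    first_child: Optional["LinkedNode"] = None
    next_sibling: Optional["LinkedNode"] = None


def nth_child(parent: LinkedNode, position: int) -> Optional[LinkedNode]:
    """Return the child at a 1-based position by walking the sibling links."""
    if position < 1:
        return None
    node = parent.first_child
    hops = 1
    while node is not None and hops < position:
        node = node.next_sibling  # one pointer dereference per hop
        hops += 1
    return node


# <edge><garden/><west/><door/></edge>
door = LinkedNode("door")
west = LinkedNode("west", next_sibling=door)
garden = LinkedNode("garden", next_sibling=west)
edge = LinkedNode("edge", first_child=garden)

print(nth_child(edge, 2).name)  # west
```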

Conversely, if an XML parser were to represent child elements using an array:

	      edge
	    +-------+
	  0 |   ------> garden
	    +-------+
	  1 |   ------> west
	    +-------+
	  2 |   ------> door
	    +-------+

then "west" is just a single memory reference away.
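
For comparison, an array-backed layout might look like the sketch below. ArrayNode and nth_child are again hypothetical names. A Python list is a dynamic array of references, so the index operation is constant time, but in Python it is still far more than a single machine instruction; the sketch shows the shape of the data structure, not the instruction count.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArrayNode:
    """Hypothetical node that keeps its children in an indexable sequence."""
    name: str
    children: List["ArrayNode"] = field(default_factory=list)


def nth_child(parent: ArrayNode, position: int) -> Optional[ArrayNode]:
    """Return the child at a 1-based position with one index operation."""
    if 1 <= position <= len(parent.children):
        return parent.children[position - 1]  # no walk over earlier siblings
    return None


# <edge><garden/><west/><door/></edge>
edge = ArrayNode("edge", [ArrayNode("garden"), ArrayNode("west"), ArrayNode("door")])

print(nth_child(edge, 2).name)  # west
```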

Are there any XML parsers that represent XML using arrays?

Is there any way to design XML to take advantage of a computer's strengths?

/Roger

[Barski] "Land of Lisp" by Dr. Conrad Barski

