How to design XML in a way that focuses on the strengths of the computer?


From: "Costello, Roger L." <costello@mitre.org>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Sat, 21 Nov 2015 14:14:07 +0000

Hi Folks,

	Computers have always had a bit of a tenuous
	relationship with text. Although we tend to think
	of text processing as a central task for computer
	hardware and software (indeed, the 8-bit byte is
	the standard design element in modern computers,
	in large part, due to how well suited it is for
	encoding Western character sets), the truth of the
	matter is that the human concept of text is really
	alien to a computer.

	... handling text is not a computer's strength. It is
	a necessary evil best kept to a minimum. [Barski]

What is fundamental to computers? Answer: Among other things, memory addressing is fundamental. 

Can we design XML in a way that focuses on the strengths of the computer?

Let's take an example. Suppose we want to retrieve "west". Consider this XML design:

	<edge>garden west door</edge>

That XML design represents "west" as text. "west" can be retrieved using string manipulation:

	substring-before(substring-after(., ' '), ' ')

You would be shocked at how many machine instructions it takes to implement that trivial XPath expression: hundreds, possibly thousands.
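To make the work hidden in that expression visible, here is a minimal Python sketch of what the two XPath string functions do. The helper names substring_after and substring_before are my own stand-ins, not part of any XPath library, and the sketch does not handle an empty separator. Each call has to scan the string for the separator and then copy out a piece of it.

```python
def substring_after(s: str, sep: str) -> str:
    """Mimic XPath substring-after: text following the first sep, or '' if sep is absent."""
    _, found, tail = s.partition(sep)  # scans s for the first occurrence of sep
    return tail if found else ""


def substring_before(s: str, sep: str) -> str:
    """Mimic XPath substring-before: text preceding the first sep, or '' if sep is absent."""
    head, found, _ = s.partition(sep)  # scans s again from the start
    return head if found else ""


edge_text = "garden west door"

# Same shape as substring-before(substring-after(., ' '), ' ')
direction = substring_before(substring_after(edge_text, " "), " ")
print(direction)
```

Two scans and two string copies for one word, and that is before counting whatever the XPath engine does to evaluate the expression itself.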

In an ideal world I should be able to retrieve "west" in a single machine instruction (or a handful of them).

Here is an alternate XML design which avoids the use of text:

	<edge>
		<garden/>
		<west/>
		<door/>
	</edge>

Node access is easy and fundamental to the XML language. Now "west" can be retrieved using this simple element reference:

	*[2]/name()

It seems to me that this should involve a simple memory address look-up, and the number of machine instructions required should be one (or a few). Alas, I discovered that it is highly dependent on the XPath processor (XML processor). In fact, I did some timing tests and, with the XPath engine that I used, there was no time difference between the above two XPath expressions. Bummer.
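For anyone who wants to repeat the comparison, here is a rough sketch using Python's standard xml.etree.ElementTree and timeit modules. It is not the XPath engine I used, and it is not XPath at all: the text design is handled with a plain string split and the element design with child indexing. The function names are made up for illustration, and the timings will vary by machine and Python version, so treat them as indicative only.

```python
import timeit
import xml.etree.ElementTree as ET

text_design = ET.fromstring("<edge>garden west door</edge>")
element_design = ET.fromstring("<edge><garden/><west/><door/></edge>")


def west_from_text(edge):
    # Design 1: split the text content on spaces and take the second token.
    return edge.text.split(" ")[1]


def west_from_elements(edge):
    # Design 2: take the second child element (index 1) and read its name.
    return edge[1].tag


print(west_from_text(text_design), west_from_elements(element_design))

# Time each approach; absolute numbers depend on the machine and interpreter.
for fn, doc in ((west_from_text, text_design), (west_from_elements, element_design)):
    seconds = timeit.timeit(lambda: fn(doc), number=100_000)
    print(f"{fn.__name__}: {seconds:.4f} s per 100,000 calls")
```

Note that ElementTree counts children from 0 while XPath counts from 1, so edge[1] corresponds to *[2].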

The XML specification is silent on how XML parsers should represent XML. Consequently, a parser might implement this: 

	<edge>
		<garden/>
		<west/>
		<door/>
	</edge>

as a linked list, and therefore, with *[2]/name(), the XPath engine must traverse the linked list to obtain the second child element.

Conversely, if an XML parser were to represent child elements using an array:

     edge
   ---------
0 |    -------> garden
   ---------
1 |    -------> west
   ---------
2 |    -------> door
   ---------

then "west" is just a single memory reference away.
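The difference between the two representations can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular XML parser is implemented; the class LinkedChild and both lookup functions are hypothetical names. Positions are 1-based to match the XPath expression *[2].

```python
class LinkedChild:
    """A child node that knows only its name and its next sibling."""

    def __init__(self, name, next_sibling=None):
        self.name = name
        self.next_sibling = next_sibling


def nth_child_linked(first_child, position):
    """Walk the sibling links: the cost grows with position."""
    if position < 1:
        raise IndexError("position is 1-based")
    node = first_child
    for _ in range(position - 1):
        if node is None:
            break
        node = node.next_sibling
    if node is None:
        raise IndexError("no child at that position")
    return node.name


def nth_child_array(children, position):
    """Index directly into the array: the cost does not depend on position."""
    if position < 1:
        raise IndexError("position is 1-based")
    return children[position - 1]


# Linked-list representation of <edge><garden/><west/><door/></edge>
door = LinkedChild("door")
west = LinkedChild("west", door)
garden = LinkedChild("garden", west)

# Array representation of the same children
children = ["garden", "west", "door"]

print(nth_child_linked(garden, 2), nth_child_array(children, 2))
```

With three children the difference is negligible, of course. It starts to matter when an element has many children and the lookup is for one near the end.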

Are there any XML parsers that represent XML using arrays?

Is there any way to design XML to take advantage of a computer's strengths?

/Roger

[Barski] "Land of Lisp" by Dr. Conrad Barski
