xml-dev - Re: [xml-dev] XML as "passive data" (Re: [xml-dev] The Browser Wars are


To: Jonathan Robie <jonathan.robie@datadirect-technologies.com>,Mike Champion <mc@xegesis.org>,xml-dev <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] XML as "passive data" (Re: [xml-dev] The Browser Wars are Dead! Long Live the Browser Wars!)
From: "Alaric B. Snell" <alaric@alaric-snell.com>
Date: Tue, 22 Oct 2002 18:55:13 +0100
In-reply-to: <5.1.0.14.0.20021022131720.041e4cd0@ncmail.datadirect-technologies.com>
References: <JDKJKHK75CICHGVR1ZYS4Z1T0482UQ.3db5804b@MChamp> <5.1.0.14.0.20021022131720.041e4cd0@ncmail.datadirect-technologies.com>

On Tuesday 22 October 2002 18:21, Jonathan Robie wrote:
> At 06:07 PM 10/22/2002 +0100, Alaric B. Snell wrote:
> >I think that's the whole problem with XML, though :-)
> >
> >Nobody's yet given a good reason why data encapsulation is bad.
>
> I didn't say that data encapsulation is bad - it's fundamentally useful in
> object oriented programming languages. Java and C++ would be much less
> useful without data encapsulation.
>
> XML is a plain-text data format, not a programming language. It was never
> designed for data hiding, which seems like a strange thing to impose on it.
> And even in binary representation, how would you want the DOM or SAX to
> handle data hiding? How would you want data hiding to affect queries on
> documents?

Are you saying that data hiding is not useful in 'mobile data'?

PostScript is a plain-text data format with data hiding in it...

As for DOM and SAX - well, the encapsulated-XML code would either output a 
DOM tree or a SAX event stream, and the 'parser' would automatically convert 
one to t'other if the user's code requests what the document doesn't provide.
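
For the tree-to-events direction, stock JAXP can already do the conversion: an
identity transform from a DOMSource to a SAXResult replays an in-memory tree as
SAX events. A minimal sketch of that, assuming the document supplied a DOM tree
and the user's code wants events; the names DomToSax, NameCollector,
elementNames and sample are made up for illustration, and the encapsulated-XML
layer itself is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: replays a DOM tree as a SAX event stream.
final class DomToSax {
    // SAX handler that records element names in the order events arrive.
    static final class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            names.add(qName);
        }
    }

    // Feeds the tree through an identity transform into a SAX handler.
    static List<String> elementNames(Document doc) {
        NameCollector handler = new NameCollector();
        try {
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(handler));
        } catch (Exception e) {
            throw new IllegalStateException("DOM to SAX replay failed", e);
        }
        return handler.names;
    }

    // Builds a tiny in-memory tree: <places><place/></places>
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("places");
            root.appendChild(doc.createElement("place"));
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build sample tree", e);
        }
    }
}
```

Calling DomToSax.elementNames(DomToSax.sample()) walks the tree and hands each
element to the handler as if it had come from a SAX parser. The opposite
direction would use a SAXSource and a DOMResult in the same way.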

As for good reasons to go there:

1) Different access patterns. There are multiple different document 
structures that encode the same information in different ways, and are 
efficient for different types of queries. For a huge GIS document of many 
gigabytes, one might want it fundamentally arranged by postcode, lat/long, or 
place name - where postcodes and place names are irregular overlapping 
regions containing a number of lat/long points. If the document were
architected as a huge list of points, with the enclosing place name and/or
postcode as attributes, queries on place names would be harder than if there
were an element per place name, each containing its points. And if you say
"Ahah! A neat XPath implementation might index the attributes!" then I'd like
to ask which neat XPath implementation would recognise the lat/long
attributes as Cartesian coordinates and construct an R-tree or grid file to
perform polygon containment tests with :-)

The point being, one could define three different XML document formats - one 
postcode based, one placename based, and one that's basically an R-tree for 
fast lookup of lat/long. And they'd need different code to look in each (the 
R-tree would probably be beyond the wit of XPath to query for points within a 
polygon).

But if you could define a single interface, like so:

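// One interface over all three document formats (the name is illustrative)
interface GisDocument {
    void addPoints (Point[] points);
    Point[] getPointsInPlace (String placeName);
    Point[] getPointsInPostcode (String postcode);
    Point[] getPointsInPolygon (Point[] corners);
    Point[] getAllPoints ();
}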

...and access all three document formats with the same interface, then 
algorithms to plot all the points on a map could be reused between documents 
from the three different applications. And it would be easy to convert 
between formats when communicating between applications, to get the 
appropriate structure for each app: "new RTreeDocument ().addPoints 
(postcodeDocument.getAllPoints ())".
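
A rough sketch of how that could look in Java. Everything here is an
assumption made for illustration: the wrapper class GisSketch, the interface
name GisDocument, the Point fields, and the simplification that each point
carries exactly one place name and one postcode. The queries default to a
linear scan over getAllPoints (), and each format overrides only the query it
is arranged for; a real R-tree document would override getPointsInPolygon (),
which is left as the plain scan here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GisSketch {
    // Hypothetical point type: coordinates plus the enclosing regions.
    static final class Point {
        final double x, y;            // longitude, latitude
        final String place, postcode;
        Point(double x, double y, String place, String postcode) {
            this.x = x; this.y = y; this.place = place; this.postcode = postcode;
        }
    }

    interface GisDocument {
        void addPoints(Point[] points);
        Point[] getAllPoints();

        // Default queries scan every point; implementations may do better.
        default Point[] getPointsInPlace(String placeName) {
            List<Point> out = new ArrayList<>();
            for (Point p : getAllPoints()) if (placeName.equals(p.place)) out.add(p);
            return out.toArray(new Point[0]);
        }
        default Point[] getPointsInPostcode(String postcode) {
            List<Point> out = new ArrayList<>();
            for (Point p : getAllPoints()) if (postcode.equals(p.postcode)) out.add(p);
            return out.toArray(new Point[0]);
        }
        default Point[] getPointsInPolygon(Point[] corners) {
            List<Point> out = new ArrayList<>();
            for (Point p : getAllPoints()) if (contains(corners, p)) out.add(p);
            return out.toArray(new Point[0]);
        }
    }

    // Even-odd ray casting: counts polygon edges crossed by a ray from p.
    static boolean contains(Point[] c, Point p) {
        boolean inside = false;
        for (int i = 0, j = c.length - 1; i < c.length; j = i++) {
            boolean straddles = (c[i].y > p.y) != (c[j].y > p.y);
            if (straddles && p.x < (c[j].x - c[i].x) * (p.y - c[i].y)
                                   / (c[j].y - c[i].y) + c[i].x) {
                inside = !inside;
            }
        }
        return inside;
    }

    // Format 1: a flat list of points, nothing indexed.
    static final class FlatListDocument implements GisDocument {
        private final List<Point> points = new ArrayList<>();
        public void addPoints(Point[] ps) { points.addAll(Arrays.asList(ps)); }
        public Point[] getAllPoints() { return points.toArray(new Point[0]); }
    }

    // Format 2: points grouped under their place name.
    static final class PlaceNameDocument implements GisDocument {
        private final Map<String, List<Point>> byPlace = new LinkedHashMap<>();
        public void addPoints(Point[] ps) {
            for (Point p : ps) {
                byPlace.computeIfAbsent(p.place, k -> new ArrayList<>()).add(p);
            }
        }
        public Point[] getAllPoints() {
            List<Point> all = new ArrayList<>();
            for (List<Point> group : byPlace.values()) all.addAll(group);
            return all.toArray(new Point[0]);
        }
        @Override
        public Point[] getPointsInPlace(String placeName) {
            // Direct lookup instead of a scan.
            List<Point> group = byPlace.getOrDefault(placeName, new ArrayList<>());
            return group.toArray(new Point[0]);
        }
    }
}
```

Code that only knows GisDocument works against either class, and converting
is the one-liner from above: create a PlaceNameDocument and pass it the flat
document's getAllPoints ().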

2) Different algorithms. It's limiting to be stuck with 'passive' data. XML
isn't all that passive anyway once you add XInclude, since it can query HTTP
servers to provide up-to-date data in place and generate stuff on the fly (so
anyone looking for a passive data format had better disable XInclude :-).
Sometimes we will want to base our GIS documents on some static data, as
above, and sometimes the GIS document will "contain" nothing other than
details of a shared database server which handles the queries dynamically.
This pattern crops up in XML anyway because DOM is just an interface - it can 
be implemented as an in-memory data structure, a parse-on-demand mixture of 
in-memory data and as yet unparsed data in a buffer somewhere, a connection 
to a remote XML database server, or an algorithm that generates the tree on 
demand like a fractal.
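
To illustrate the "just an interface" point without implementing the whole of
org.w3c.dom, here is a sketch with a deliberately tiny, made-up tree interface
(TreeNode is not part of any real API). One implementation stores its children
in memory; the other computes them each time they are asked for, fractal
fashion, so the tree is unbounded and never exists as a whole. The client code
cannot tell the difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LazyTreeSketch {
    // Hypothetical stand-in for the DOM: a name and some children.
    interface TreeNode {
        String name();
        List<TreeNode> children();
    }

    // Ordinary in-memory node.
    static final class StoredNode implements TreeNode {
        private final String name;
        private final List<TreeNode> children;
        StoredNode(String name, TreeNode... children) {
            this.name = name;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
        public String name() { return name; }
        public List<TreeNode> children() { return children; }
    }

    // Node whose two children are generated on every request.
    static final class FractalNode implements TreeNode {
        private final String name;
        FractalNode(String name) { this.name = name; }
        public String name() { return name; }
        public List<TreeNode> children() {
            List<TreeNode> kids = new ArrayList<>();
            kids.add(new FractalNode(name + "0"));
            kids.add(new FractalNode(name + "1"));
            return kids;
        }
    }

    // Client code: counts nodes down to a given depth, for any TreeNode.
    static int countToDepth(TreeNode node, int depth) {
        int total = 1;
        if (depth > 0) {
            for (TreeNode child : node.children()) {
                total += countToDepth(child, depth - 1);
            }
        }
        return total;
    }
}
```

A database-backed implementation would follow the same shape, with children ()
issuing a query instead of building nodes locally.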

3) Because sometimes it's behaviour that matters anyway. Our GIS document 
interface above has an addPoints method. One might implement that in such a 
way as to provide a rollback facility; a rollback() method might undo the 
most recent addPoints, perhaps with unlimited undo. One might write an 
implementation of DOM that does that, sure, but there might be a really 
efficient algorithm for doing so at the representation level.
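
One possible representation-level approach, sketched under assumptions: if
the points live in an append-only list, then remembering the list size before
each addPoints is enough to undo it by truncation, with unlimited undo for
the cost of one integer per call. The class name UndoableDocumentSketch is
hypothetical, and points are reduced to {x, y} arrays to keep it short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class UndoableDocumentSketch {
    private final List<double[]> points = new ArrayList<>();
    // Size of the point list just before each addPoints call.
    private final Deque<Integer> checkpoints = new ArrayDeque<>();

    void addPoints(double[][] newPoints) {
        checkpoints.push(points.size());
        points.addAll(Arrays.asList(newPoints));
    }

    // Undoes the most recent addPoints; returns false if nothing to undo.
    boolean rollback() {
        if (checkpoints.isEmpty()) {
            return false;
        }
        int sizeBefore = checkpoints.pop();
        points.subList(sizeBefore, points.size()).clear();
        return true;
    }

    int size() {
        return points.size();
    }
}
```

A generic DOM implementation could offer the same facility, but it would have
to log arbitrary tree mutations rather than a single integer per operation.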

4) Any of the other reasons why data encapsulation is good, minus any that 
somehow don't apply when dealing with information communicated between 
platforms in distributed systems.

> Jonathan

ABS

-- 
A city is like a large, complex, rabbit
 - ARP
