On Tuesday 22 October 2002 18:21, Jonathan Robie wrote:
> At 06:07 PM 10/22/2002 +0100, Alaric B. Snell wrote:
> >I think that's the whole problem with XML, though :-)
> >Nobody's yet given a good reason why data encapsulation is bad.
> I didn't say that data encapsulation is bad - it's fundamentally useful in
> object oriented programming languages. Java and C++ would be much less
> useful without data encapsulation.
> XML is a plain-text data format, not a programming language. It was never
> designed for data hiding, which seems like a strange thing to impose on it.
> And even in binary representation, how would you want the DOM or SAX to
> handle data hiding? How would you want data hiding to affect queries on
Are you saying that data hiding is not useful in 'mobile data'?
PostScript is a plain-text data format with data hiding in it...
As for DOM and SAX - well, the encapsulated-XML code would either output a
DOM tree or a SAX event stream, and the 'parser' would automatically convert
one to t'other if the user's code requests what the document doesn't provide.
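To make the "convert one to t'other" part concrete, here is a minimal sketch of bridging DOM and SAX with the standard JAXP identity transform. The class and method names (DomSaxBridge, domToSax, saxToDom, countElements) are made up for illustration; only the javax.xml and org.xml.sax calls are real library API. This is one way it could be done, not a description of any existing encapsulated-XML parser.

```java
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: the name and shape are assumptions for this sketch.
final class DomSaxBridge {
    // DOM -> SAX: replay an in-memory tree as SAX events to the caller's handler.
    static void domToSax(Document doc, DefaultHandler handler) throws Exception {
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new SAXResult(handler));
    }

    // SAX -> DOM: let the identity transform build a tree from a SAX event source.
    static Document saxToDom(InputSource in) throws Exception {
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer()
            .transform(new SAXSource(in), result);
        return (Document) result.getNode();
    }

    // Example consumer that only speaks SAX, fed from a DOM tree.
    static int countElements(Document doc) throws Exception {
        AtomicInteger count = new AtomicInteger();
        domToSax(doc, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count.incrementAndGet();
            }
        });
        return count.get();
    }
}
```

The user's code asks for whichever view it wants; if the document only provides the other one, a wrapper like this does the conversion behind the scenes.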
As for good reasons to go there:
1) Different access patterns. There are multiple different document
structures that encode the same information in different ways, and are
efficient for different types of queries. For a huge GIS document of many
gigabytes, one might want it fundamentally arranged by postcode, lat/long, or
place name - where postcodes and place names are irregular overlapping
regions containing a number of lat/long points. If your document was
architected as a huge list of points with the enclosing place name and/or
postcode as attributes, one might have a harder time doing queries on place
names than if there was an element per place name, each place then containing
the contained points. And if you say "Ahah! A neat XPath implementation might
index the attributes!" then I'd like to ask which neat XPath implementation
would recognise the lat/long attributes as Cartesian coordinates and
construct an R-tree or grid file to perform polygon containment tests with :-)
The point being, one could define three different XML document formats - one
postcode based, one placename based, and one that's basically an R-tree for
fast lookup of lat/long. And they'd need different code to look in each (the
R-tree would probably be beyond the wit of XPath to query for points within a
But if you could define a single interface, like so:
void addPoints (Point[] points);
Point[] getPointsInPlace (String placeName);
Point[] getPointsInPostcode (String postcode);
Point[] getPointsInPolygon (Point[] corners);
Point[] getAllPoints ();
...and access all three document formats with the same interface, then
algorithms to plot all the points on a map could be reused between documents
from the three different applications. And it would be easy to convert
between formats when communicating between applications, to get the
appropriate structure for each app: "new RTreeDocument ().addPoints
2) Different algorithms. It's limiting to be stuck with 'passive' data.
Although XML isn't all that passive if you add XInclude, since it can query
HTTP servers to provide up-to-date data in place and generate stuff on the
fly (so anyone looking for a passive data format had better disable XInclude
:-), we will sometimes want to base our GIS documents on some static data, as
above, and sometimes the GIS document will "contain" nothing other than
details of a shared database server which handles the queries dynamically.
This pattern crops up in XML anyway because DOM is just an interface - it can
be implemented as an in-memory data structure, a parse-on-demand mixture of
in-memory data and as yet unparsed data in a buffer somewhere, a connection
to a remote XML database server, or an algorithm that generates the tree on
demand like a fractal.
3) Because sometimes it's behaviour that matters anyway. Our GIS document
interface above has an addPoints method. One might implement that in such a
way as to provide a rollback facility; a rollback() method might undo the
most recent addPoints, perhaps with unlimited undo. One might write an
implementation of DOM that does that, sure, but there might be a really
efficient algorithm for doing so at the representation level.
4) Any of the other reasons why data encapsulation is good, minus any that
somehow don't apply when dealing with information communicated between
platforms in distributed systems.
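As a rough sketch of points 1 and 3, here is a cut-down version of the interface above with two interchangeable representations, plus a representation-level undo. Everything in it is an assumption made for illustration: the Point fields, the GisDocument, FlatDocument and PlaceIndexedDocument names, and the use of List<Point> for "several points". It leaves out the postcode and polygon queries to stay short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical point type; the interface above never defines Point.
final class Point {
    final double lat, lon;
    final String place;
    Point(double lat, double lon, String place) {
        this.lat = lat; this.lon = lon; this.place = place;
    }
}

// Subset of the interface above (assumed name).
interface GisDocument {
    void addPoints(List<Point> points);
    List<Point> getPointsInPlace(String placeName);
    List<Point> getAllPoints();
}

// Representation 1: a flat list of points, scanned on every place query.
final class FlatDocument implements GisDocument {
    private final List<Point> points = new ArrayList<>();
    private final Deque<Integer> batchSizes = new ArrayDeque<>();

    public void addPoints(List<Point> batch) {
        points.addAll(batch);
        batchSizes.push(batch.size()); // remember the batch size for rollback
    }
    public List<Point> getPointsInPlace(String placeName) {
        List<Point> hits = new ArrayList<>();
        for (Point p : points) if (p.place.equals(placeName)) hits.add(p);
        return hits;
    }
    public List<Point> getAllPoints() { return new ArrayList<>(points); }

    // Representation-level undo: an append-only list only needs truncating.
    public void rollback() {
        if (batchSizes.isEmpty()) return;
        int n = batchSizes.pop();
        points.subList(points.size() - n, points.size()).clear();
    }
}

// Representation 2: points grouped per place name, so a place query is a lookup.
final class PlaceIndexedDocument implements GisDocument {
    private final Map<String, List<Point>> byPlace = new LinkedHashMap<>();

    public void addPoints(List<Point> batch) {
        for (Point p : batch)
            byPlace.computeIfAbsent(p.place, k -> new ArrayList<>()).add(p);
    }
    public List<Point> getPointsInPlace(String placeName) {
        return new ArrayList<>(byPlace.getOrDefault(placeName, List.of()));
    }
    public List<Point> getAllPoints() {
        List<Point> all = new ArrayList<>();
        for (List<Point> group : byPlace.values()) all.addAll(group);
        return all;
    }
}
```

Code written against GisDocument doesn't care which representation it is handed, and copying from one to the other is just a matter of passing one document's getAllPoints() result to the other's addPoints.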