RE: [xml-dev] XPath and a continuous, uniform information space

This discussion is very interesting ... and joins well with what I have been thinking, discussing and learning over the last few months.

XML, HyperMedia, RDF, Web, InfoSpace

Trying to be concise ... (ha!)

One thing that makes me a bit uneasy or maybe excited ...

In the XML specs and toolchains, for the most part but not entirely, the concept of document identity, location, and resolution is very carefully skirted.

Take the XPath functions doc() and collection() ... how to resolve a document given a string is entirely implementation defined.

We generally take it for granted that doc( url ) will resolve the url using URL resource location (http, file ...) and return the document if it's found, but that is not fully defined, and for good reason. Same thing (I think) with XML external entities, schemaLocation, XSD includes ...

(let's not worry about XQuery modules or XSLT includes yet) ...

There is this fuzzy space where XML doesn't exactly define how to resolve references to documents, nor does it supply a unique document ID.

(I am not quite sure of the latter ... is doc("x") == doc("x") ?)
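
To make that identity question concrete, here is a minimal sketch in Python of a resolver that hands back the same object whenever it is given the same (absolutized) URI. The `DocResolver` class, its `loader` argument and the `fake_loader` stand-in are all made up for illustration; this is not any real XPath processor's API, just one way an implementation could choose to behave.

```python
from urllib.parse import urljoin


class DocResolver:
    """Hypothetical resolver: same absolute URI -> same document object."""

    def __init__(self, base_uri, loader):
        self.base_uri = base_uri
        self.loader = loader  # assumed: callable(absolute_uri) -> document
        self._cache = {}

    def doc(self, uri):
        # Identity is keyed on the absolute URI, not on the raw string.
        absolute = urljoin(self.base_uri, uri)
        if absolute not in self._cache:
            self._cache[absolute] = self.loader(absolute)
        return self._cache[absolute]


def fake_loader(absolute_uri):
    # Stand-in for a real fetch: builds a fresh object on every call.
    return {"uri": absolute_uri}


resolver = DocResolver("http://example.org/base/", fake_loader)

# With the cache, two calls give the very same object.
same_identity = resolver.doc("x") is resolver.doc("x")

# Without it, each call produces a distinct (if equal-looking) document.
uncached_identity = fake_loader("x") is fake_loader("x")
```

Without something like the cache, "the same document" only means "whatever came back this time", which is exactly the fuzzy part.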

Then we step into the Webish world of HyperMedia, HTML, RDF etc ...

In some places URIs are supposed to be URLs, and they are assumed to be resolvable by standard URL rules.

In other places (say RDF) they are simply identifiers ... but often overloaded to be resolvable entities.

Mixing up the rules for when a string is really resolvable, and what that means exactly, is tedious ...

Many times we want the URL to be guaranteed to be resolvable or throw an application level error ...

Otherwise writing, say, XPath code that traverses documents would mean try/catch all over the place.

But on the other hand, the Web is based soundly on failure. Assuming that URLs can always resolve is not a great idea.

So how do we have our cake and eat it too? How do we define a loose coupling of URIs or document identifiers ...

We *want* them to resolve but we also want our app to be tolerant of failure.
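
One way to picture that loose coupling: let the caller choose, per reference, between "tell me what happened" and "fail hard". A rough Python sketch follows; `try_doc`, `strict_doc`, `Resolution` and the `fake_fetch` stand-in are names I made up for illustration, not part of any XML toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resolution:
    uri: str
    document: Optional[str]  # None when the fetch failed
    error: Optional[str]     # None when the fetch succeeded

    @property
    def ok(self):
        return self.error is None


def try_doc(uri, fetch):
    """Tolerant form: report failure as data instead of raising."""
    try:
        return Resolution(uri, fetch(uri), None)
    except Exception as exc:
        return Resolution(uri, None, f"{type(exc).__name__}: {exc}")


def strict_doc(uri, fetch):
    """Strict form: same fetch, but the caller opted in to a hard error."""
    result = try_doc(uri, fetch)
    if not result.ok:
        raise LookupError(f"cannot resolve {uri}: {result.error}")
    return result.document


def fake_fetch(uri):
    # Stand-in fetcher (assumption): only one URI "exists".
    if uri == "http://example.org/a.xml":
        return "<a/>"
    raise OSError("unreachable")


good = try_doc("http://example.org/a.xml", fake_fetch)
bad = try_doc("http://example.org/missing.xml", fake_fetch)
```

The traversal code then only needs a try/catch where it actually asked for the strict form; everywhere else it can check `ok` and carry on.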

I have been reading about RDF and federated queries. If I have this right ... (I may not) ... to do a query across multiple sources of RDF graphs you have to first query the full (or sub) graph from each source, then merge them together locally ... that is how you can finally do an RDF query and make sure it works, because it is all local ... Very similar to closed XML databases, or local filesystems. You load your content into the database / filesystem and only then can you feel all warm and fuzzy about document resolution always working.
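
A toy sketch of that "pull everything local, then query" pattern, in Python with plain sets of triples. This is only my reading of the approach, not how any particular RDF engine does it; the `ex:` triples, `merge_graphs` and `match` are invented for the example.

```python
# Each "source" stands in for a remote graph that would really be fetched.
source_a = {("ex:alice", "ex:knows", "ex:bob")}
source_b = {("ex:bob", "ex:worksAt", "ex:acme")}


def merge_graphs(*graphs):
    """Union the triples from every source into one local graph."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


local = merge_graphs(source_a, source_b)

# "Where do the people alice knows work?" needs triples from both sources,
# so it can only be answered once they sit in the same local graph.
employers = [
    employer
    for (_, _, friend) in match(local, s="ex:alice", p="ex:knows")
    for (_, _, employer) in match(local, s=friend, p="ex:worksAt")
]
```

Neither source can answer the question alone, and the query never touches the network once the merge is done, which is where the warm and fuzzy feeling comes from.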

But this doesn't sound very scalable ... it really means you need a copy of your universe before you can query it reliably.

(I am fairly sure Google does ...) But for mortals ... how does a truly distributed web-based info space really work?

How would Web-based RDF or XML resources scale? Until the internet is truly fault tolerant AND accessible (let's not worry yet about consistency), how do you build a scalable infoset? If you extend the languages to make document traversal easy ... but the infrastructure is not sound ... then your apps will just crash.

A shorthand notation for /doc1/doc2/doc3/doc4 ... doesn't make the resolution and retrieval any more resilient.

How do we make a Web based InfoSpace that is reliable ... without having to have a local copy of it all ?

I suggest the web works today because of *humans*. It is us humans that clickityclick the links and handle failover ... "Hmm google is down try yahoo", "Hmm that youtube is showing the spinner try hitting refresh, aw screw it check my email". We humans are tolerant of failure when it occurs directly as a result of our actions ... but when a system doing a million actions fails in the middle ... not so tolerant.

What's your thesis printout going to look like if a few paragraphs are on inaccessible nodes right when you want to print it ... Is it all or nothing? Or maybe just some 404s printed on page 84 ... but the app "succeeded".
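
Those two outcomes are really two policies, and the app has to pick one. A small Python sketch of the choice; `assemble`, the policy names and the `fake_fetch` stand-in with its chapter names are all hypothetical.

```python
def assemble(parts, fetch, policy="strict"):
    """Build a document from parts; fetch is an assumed callable(uri) -> text."""
    pages, failures = [], []
    for uri in parts:
        try:
            pages.append(fetch(uri))
        except Exception:
            if policy == "strict":
                raise  # all or nothing: one missing part kills the job
            failures.append(uri)
            pages.append(f"[missing: {uri}]")  # the "404 on page 84" case
    return pages, failures


def fake_fetch(uri):
    # Stand-in store (assumption): chapter 2 lives on an unreachable node.
    store = {"ch1": "Chapter 1", "ch3": "Chapter 3"}
    if uri not in store:
        raise OSError(f"404 for {uri}")
    return store[uri]


pages, failures = assemble(["ch1", "ch2", "ch3"], fake_fetch, policy="placeholder")
```

Under the placeholder policy the job "succeeds" with a hole in it, and the only honest part is the `failures` list that somebody has to go and read.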

How about that air traffic control app ... couldn't get the updated wind speed from windspeed.com so WTF ... let's just halt the planes for a few minutes.

----------------------------------------

David A. Lee

dlee@calldei.com

http://www.xmlsh.org