Re: [xml-dev] Reality check needed ....

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Reality check needed ....
From: Mike Champion <mc@xegesis.org>
Date: Wed, 07 Aug 2002 09:40:28 -0400
In-reply-to: <000801c23e10$ea313f10$fe193044@tbp1>

8/7/2002 8:49:48 AM, "Thomas B. Passin" <tpassin@comcast.net> wrote:

>But file extensions are __very helpful__ for humans, just like
>human-readable element names are.

Back to the original question, "why might Microsoft think that
XML database technologies could help people find things on their
personal hard drives more effectively," this discussion of filenames
suggests a few things.

- Filesystems are hierarchical, and XPath was both designed to
  query hierarchies and modelled on the filesystem paradigm.
  Consider a query such as "the HTML file that is in a directory
  called 'samples' somewhere under 'Program Files' that I modified
  in June 2002."  XPath could handle the "'samples' directory
  somewhere under 'Program Files'" part much better than SQL
  could, AFAIK.

- File content is becoming more XML-like.   If the system indexed the
  HTML after parsing it into a well-formed tree, you could use XPath
  to find content within tables, or div tags, or other structuring
  mechanisms, which would be difficult with SQL or full-text searches.

- "Real" XML is becoming more pervasive.  Presumably the XML formats
   of OpenOffice/StarOffice and (maybe) Office 11 files 
   could be exploited to find "the section labelled
   'Afghanistan' within the section labelled 'Wars' containing
   the phrase 'helicopter crash'" or whatever.  
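
The first bullet can be sketched in Python with the standard library's
ElementTree, which supports only a limited XPath subset.  The <dir> and
<file> element names, the name/ext attributes, and both helper functions
are hypothetical, invented for this illustration.  The "modified in
June 2002" part of the query is left out here, since that XPath subset
has no date comparisons.

```python
import os
import xml.etree.ElementTree as ET


def filesystem_to_xml(root_path):
    """Mirror a directory tree as nested <dir> and <file> elements.

    The element and attribute names are an assumed, made-up schema.
    """
    name = os.path.basename(os.path.abspath(root_path))
    node = ET.Element("dir", name=name)
    for entry in sorted(os.scandir(root_path), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            node.append(filesystem_to_xml(entry.path))
        elif entry.is_file(follow_symlinks=False):
            ext = os.path.splitext(entry.name)[1].lower()
            ET.SubElement(node, "file", name=entry.name, ext=ext)
    return node


def html_files_under_samples(tree):
    """Names of .html files at any depth below any directory named 'samples'."""
    # "//" is the "somewhere under" step that is awkward to express in plain SQL.
    query = ".//dir[@name='samples']//file[@ext='.html']"
    return sorted(f.get("name") for f in tree.findall(query))
```

Calling filesystem_to_xml() on a 'Program Files' directory and passing
the result to html_files_under_samples() would list the HTML files in
any 'samples' directory beneath it, however deeply nested.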

Back to the file extensions: the OS could keep track of metadata to 
"know" that a particular file is XHTML, or SVG, or XSLT, irrespective
of the extension (for example, by keeping track of which application
edited the file last).  3rd-party indexers could do the same thing by 
sniffing for namespaces or validating or whatever.
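
As a rough sketch of what "sniffing for namespaces" might look like,
the Python below reads only as far as the root element and maps its
namespace URI to a type label.  The three namespace URIs are the
standard ones for XHTML, SVG and XSLT; the function name, the lookup
table and the fallback labels are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Standard namespace URIs; the labels on the right are arbitrary.
KNOWN_NAMESPACES = {
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.w3.org/2000/svg": "SVG",
    "http://www.w3.org/1999/XSL/Transform": "XSLT",
}


def sniff_xml_type(source):
    """Guess a file's XML vocabulary from its root element, ignoring the extension.

    `source` is a filename or a binary file object.  Returns None when the
    input cannot be parsed as XML at all.
    """
    try:
        for _event, elem in ET.iterparse(source, events=("start",)):
            tag = elem.tag  # the first "start" event is the root element
            break
        else:
            return None
    except ET.ParseError:
        return None
    if tag.startswith("{"):
        # ElementTree spells qualified names as "{namespace-uri}localname".
        namespace = tag[1:].split("}", 1)[0]
        return KNOWN_NAMESPACES.get(namespace, "unknown XML vocabulary")
    return "XML without a namespace"
```

Because it stops at the root element, an indexer using something like
this would not have to parse a whole file just to classify it.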

Anyway, this is starting to make sense ... the OS or a 3rd-party filesystem
indexer has a combination of information about a file's metadata (mod
date, size, owner), its content (type inferred somehow, possibly its
hierarchical internal structure), and its position in the filesystem
hierarchy.  Querying all that hierarchical data and metadata simultaneously
DOES sound like a job for XQuery, or SQL+XPath, or XPath+a join mechanism,
or whatever.
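
To make the combined query concrete, here is a minimal Python sketch
of the "XPath plus something else" option.  It assumes a hypothetical
index in which each <file> element carries its metadata as attributes
and its parsed content as children; the schema, the sample data and
the find_files() helper are all invented for illustration.  XPath
handles position and content, and ordinary Python handles the date
range.

```python
import xml.etree.ElementTree as ET
from datetime import date

# A made-up index: filesystem position, metadata, and parsed content in one tree.
INDEX = ET.fromstring("""
<dir name="Program Files">
  <dir name="app">
    <dir name="samples">
      <file name="report.html" modified="2002-06-14" size="2048">
        <html><body><table><tr><td>helicopter</td></tr></table></body></html>
      </file>
      <file name="old.html" modified="2001-01-02" size="512">
        <html><body><table><tr><td>helicopter</td></tr></table></body></html>
      </file>
      <file name="notes.html" modified="2002-06-20" size="100">
        <html><body><p>helicopter</p></body></html>
      </file>
    </dir>
  </dir>
</dir>
""")


def find_files(index, directory, cell_text, start, end):
    """Files under `directory` with a table cell equal to `cell_text`,
    modified between `start` and `end` inclusive.

    Only safe for simple values without quote characters, since the
    values are pasted straight into the XPath expressions.
    """
    hits = []
    # Position: any <file> somewhere below a directory with the given name.
    for f in index.findall(f".//dir[@name='{directory}']//file"):
        # Content: a <td> inside a <table> whose full text matches exactly.
        if f.find(f".//table//td[.='{cell_text}']") is None:
            continue
        # Metadata: ElementTree's XPath subset cannot compare dates,
        # so the range check is done in Python.
        modified = date.fromisoformat(f.get("modified"))
        if start <= modified <= end:
            hits.append(f.get("name"))
    return hits
```

With the sample index above, asking for files under 'samples' with a
table cell reading 'helicopter' and a June 2002 modification date
selects report.html only: old.html fails the date test and notes.html
has no table.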

References:
- Re: [xml-dev] Reality check needed ....
  - From: "Thomas B. Passin" <tpassin@comcast.net>
