8/7/2002 8:49:48 AM, "Thomas B. Passin" <tpassin@comcast.net> wrote:
>But file extensions are __very helpful__ for humans, just like
>human-readable element names are.
To get back to the original question, "why might Microsoft think that
XML database technologies could help people find things on their
personal hard drives more effectively?", this discussion of filenames
suggests a few things:
- Filesystems are hierarchical, and XPath was both designed to
query hierarchies and modelled on the filesystem paradigm.
Take a query such as "the HTML file in a directory called 'samples'
somewhere under 'Program Files' that I modified in June 2002".
XPath could handle the "'samples' directory somewhere under
'Program Files'" part much better than SQL could, AFAIK.
- File content is becoming more XML-like. If the system indexed the
HTML after parsing it into a well-formed tree, you could use XPath
to find content within tables, div tags, or other structuring
mechanisms, which would be difficult to express with SQL or full-text search.
- "Real" XML is becoming more pervasive. Presumably the XML formats
of OpenOffice/StarOffice and (maybe) Office 11 files
could be exploited to find "the section labelled
'Afghanistan' within the section labelled 'Wars' containing
the word 'helicopter crash'" or whatever.
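To make the first point concrete, here is a minimal Python sketch, assuming a hypothetical indexer that mirrors the directory tree as nested `<dir>`/`<file>` elements (the schema and the names `fs_to_xml` and `find_samples_html` are made up for illustration). The "somewhere under 'Program Files'" part is a single `//` step. Python's `xml.etree.ElementTree` supports only a small subset of XPath 1.0, so the extension and date tests are done in Python rather than in the path expression.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime


def fs_to_xml(path):
    """Mirror a directory tree as nested <dir>/<file> elements.

    Element and attribute names are a hypothetical schema, not a standard.
    """
    elem = ET.Element("dir", name=os.path.basename(path) or path)
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                elem.append(fs_to_xml(entry.path))
            elif entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                modified = datetime.fromtimestamp(st.st_mtime).strftime("%Y-%m-%d")
                ET.SubElement(elem, "file", name=entry.name, path=entry.path,
                              size=str(st.st_size), modified=modified)
    return elem


def find_samples_html(tree, year_month):
    """Paths of HTML files directly inside any 'samples' directory
    at any depth below a 'Program Files' directory."""
    # The hierarchical part of the query is plain XPath (ElementTree subset).
    xpath = ".//dir[@name='Program Files']//dir[@name='samples']/file"
    hits, seen = [], set()
    for f in tree.findall(xpath):
        if id(f) in seen:
            continue  # nested matches can yield the same element twice
        seen.add(id(f))
        # ElementTree's subset has no starts-with()/ends-with(), so the
        # extension and modification-date filters are applied in Python.
        name = f.get("name", "").lower()
        if name.endswith(".html") and f.get("modified", "").startswith(year_month):
            hits.append(f.get("path"))
    return hits
```

Calling `find_samples_html(fs_to_xml(root), "2002-06")` returns the matching paths. A real indexer would presumably maintain this tree incrementally instead of walking the disk on every query.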
As for file extensions, the OS could keep track of metadata to
"know" that a particular file is XHTML, SVG, or XSLT irrespective
of its extension, e.g. by recording which application last edited
the file. 3rd-party indexers could do the same thing by
sniffing for namespaces, validating, or whatever.
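Namespace sniffing can be sketched in a few lines of Python: parse only as far as the root element and look up its namespace URI. The three URIs below are the real W3C namespaces for XHTML, SVG, and XSLT; the function name `sniff_xml_type` and the returned labels are hypothetical, and a real indexer would want a much larger table.

```python
import io
import xml.etree.ElementTree as ET

# Real W3C namespace URIs; the labels are arbitrary.
KNOWN_NAMESPACES = {
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.w3.org/2000/svg": "SVG",
    "http://www.w3.org/1999/XSL/Transform": "XSLT",
}


def sniff_xml_type(source):
    """Guess a document's type from its root element's namespace,
    ignoring the filename. `source` is a filename or binary file object."""
    try:
        for _event, elem in ET.iterparse(source, events=("start",)):
            # The first "start" event is the root; its tag looks like "{uri}local".
            if elem.tag.startswith("{"):
                uri = elem.tag[1:].split("}", 1)[0]
                return KNOWN_NAMESPACES.get(uri, "XML ({})".format(uri))
            return "XML (no namespace)"
    except ET.ParseError:
        return None  # not well-formed XML; fall back to other heuristics
    return None
```

Because it stops at the first start tag, this reads only the beginning of each file, which matters when indexing a whole disk.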
Anyway, this is starting to make sense: the OS or a 3rd-party filesystem
indexer has a combination of information about a file's metadata (modification
date, size, owner), its content (type inferred somehow, possibly its
hierarchical internal structure), and its position in the filesystem
hierarchy. Querying all that hierarchical data and metadata simultaneously
DOES sound like a job for XQuery, or SQL+XPath, or XPath+a join mechanism,
or whatever.
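As a rough illustration of the "XPath + a join mechanism" option, here is a minimal Python sketch, assuming a hypothetical index in which each `<file>` record carries metadata attributes and, for XML documents, the parsed content tree under a `<content>` child. The schema (`file`, `content`, `section`, `title`) and the function names are invented for the example. XPath handles the nested-section structure, while a plain Python loop does the "join" against the modification date and the phrase test; an XQuery engine would presumably express all of this as one query.

```python
import xml.etree.ElementTree as ET


def index_entry(path, modified, size, content_xml=None):
    """Build one hypothetical index record: metadata plus parsed content."""
    entry = ET.Element("file", path=path, modified=modified, size=str(size))
    if content_xml is not None:
        ET.SubElement(entry, "content").append(ET.fromstring(content_xml))
    return entry


def find_matches(index, year, outer, inner, phrase):
    """Paths of files modified in `year` whose content has a section titled
    `inner` nested inside one titled `outer` and containing `phrase`."""
    # Structural part, in ElementTree's XPath subset.
    # (Titles containing an apostrophe would break this simple predicate.)
    pattern = ".//section[@title='{}']//section[@title='{}']".format(outer, inner)
    results = []
    for entry in index.findall("file"):
        if not entry.get("modified", "").startswith(year):
            continue  # metadata filter on the modification date
        content = entry.find("content")
        if content is None:
            continue  # no parsed structure for this file (e.g. plain text)
        if any(phrase in "".join(s.itertext()) for s in content.findall(pattern)):
            results.append(entry.get("path"))
    return results
```

For example, `find_matches(index, "2002", "Wars", "Afghanistan", "helicopter crash")` returns the paths of files modified in 2002 that contain the phrase inside that nested section.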