- From: "Borden, Jonathan" <email@example.com>
- To: <firstname.lastname@example.org>
- Date: Sun, 31 Jan 1999 13:33:49 -0500
Paul Prescod wrote:
> "Borden, Jonathan" wrote:
> > I like your criteria. Does the Internet fulfill this, with say HTML and
> > search engines?
> Search engines only do full-text search.
Err, no. Some do, others don't. For example, some engines index meta tags,
others index full text, and others can index arbitrary tags. Some engines
might require a DTD and/or schema, others not. This is the heart of the
question: what is the best way to index XML?
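To make the distinction concrete, here is a minimal sketch of a full-text index next to a tag-scoped one. Everything in it is an assumption for illustration: the two sample documents, the tag names and the `build_indexes` helper are hypothetical and do not describe how any real search engine works. It also only looks at the text directly inside each element.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(docs):
    """Build two toy inverted indexes over a dict of {doc_id: xml_text}.

    full_text maps word -> set of doc ids (what a plain full-text engine sees).
    by_tag maps (tag, word) -> set of doc ids (an index over arbitrary tags).
    """
    full_text = defaultdict(set)
    by_tag = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            # Only the element's own leading text is indexed in this sketch.
            for word in re.findall(r"\w+", (elem.text or "").lower()):
                full_text[word].add(doc_id)
                by_tag[(elem.tag, word)].add(doc_id)
    return full_text, by_tag


# Hypothetical sample documents.
docs = {
    "a": "<doc><title>XML indexing</title><body>notes on storage</body></doc>",
    "b": "<doc><title>Storage</title><body>XML in flat files</body></doc>",
}
full_text, by_tag = build_indexes(docs)

print(sorted(full_text["xml"]))          # both documents mention "xml" somewhere
print(sorted(by_tag[("title", "xml")]))  # only one has it in <title>
```

The full-text index cannot tell a title hit from a body hit; the tag-scoped one can, at the cost of a larger index.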
> When I can upload an XQL, XML-QL
> or OQL query to AltaVista and get back the results in seconds then I will
> be impressed both with its scalability and flexibility. Right now I am only
> impressed with its scalability.
> > The point being that a flatfile system with appropriate
> > indexing, caching and distribution can handle all sorts of
> > information needs.
> Fine. But does anyone know how to do the "appropriate indexing, caching and
> distribution" for the kind of thing I described above?
This is the billion dollar question. The point that I am trying to make on
the "Storing lots of Fiddly bits" thread is that the issue is not *how* the
data is stored (e.g. flatfile, object or relational database) but rather
*what* you wish to do with it. If we define the problem as the ability to run
an XQL query against an index of XML documents and return a result in a
matter of seconds, this imposes specific requirements on a system. Let's go
further and assume for the moment that we have already converted all HTML
documents on the Web into Voyager (i.e. HTML as XML). Now suppose that we
want to be able to run an XQL query against the entire Web. Has anyone done
this? Clearly no. Is anyone capable of doing this today? Probably not. Are
there people working on this? Yes. So let's hear what strategies are best
used in this situation. My personal opinion is that it is a hybrid
relational and 'object' database approach.
The problem with a straight relational approach is that we need to model
containment and hierarchies... in SQL terms this means joins, perhaps
multilevel joins when documents are deep.
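A rough sketch of what those joins look like, assuming the simplest possible relational mapping: one row per element with a pointer to its parent. The `node` table, its columns, the `load` helper and the sample document are all hypothetical, chosen only to illustrate the point; this is not a proposed schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = "<html><body><div><p>fiddly bits</p></div><p>top level</p></body></html>"


def load(conn, xml_text):
    """Store each element as one row: (id, parent id, tag, text)."""
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )
    counter = 0

    def walk(elem, parent_id):
        nonlocal counter
        counter += 1
        node_id = counter
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent_id, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)


conn = sqlite3.connect(":memory:")
load(conn, DOC)

# The path body/div/p needs one self-join per level of containment.
rows = conn.execute(
    """
    SELECT p.text
    FROM node AS b
    JOIN node AS d ON d.parent = b.id
    JOIN node AS p ON p.parent = d.id
    WHERE b.tag = 'body' AND d.tag = 'div' AND p.tag = 'p'
    """
).fetchall()
print(rows)  # [('fiddly bits',)]
```

A path three elements deep already needs two self-joins; every extra level of nesting in the query adds another, which is where deep documents start to hurt.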
xml-dev: A list for W3C XML Developers. To post, mailto:email@example.com
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:firstname.lastname@example.org)