   Re: Data warehousing and XML

  • From: Graydon Hoare <gray@interlog.com>
  • To: xml-dev <xml-dev@ic.ac.uk>
  • Date: Tue, 2 Dec 1997 10:26:54 -0500 (EST)

On Tue, 2 Dec 1997, Paul Prescod wrote:

> neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.

er, you're missing something. The whole point of data mining is admitting
that all the schemas you will ever establish are in some way flawed, no
matter what you do. There would be no need for such tools if we were
simply able to see the future, and know that it's terribly important to
maintain a count of how many sticks of gum get shipped to Guam on Tuesdays
in December.

This is precisely why text retrieval is so hard -- the "schema" that all
documents are written in is a human written language, and nobody knows how
to machine-process that. You can chunk it up all you like into logical
blocks, but you're always going to be missing certain substantive
information relating to the text. In fact, if you want to get really
finicky about it, plain vanilla transcribed text loses useful information
conveyed in spoken language, and requires an expert "document engineer"
to produce (compare a literate adult's writing to that of a child).

-graydon <graydon@pobox.com>


