- From: Paul Prescod <email@example.com>
- To: xml-dev <firstname.lastname@example.org>
- Date: Tue, 02 Dec 1997 11:16:00 -0500
Graydon Hoare wrote:
> er, you're missing something. The whole point of data mining is admitting
> that all the schemas you will ever establish are in some way flawed, no
> matter what you do. There would be no need for such tools if we were
> simply able to see the future, and know that it's terribly important to
> maintain a count of how many sticks of gum get shipped to guam on tuesdays
> in december.
Right, but Len's question was about having a "schema of *some kind*". The
closer your schema is to explicitly recognizing the information you want
to discover, the easier it is to discover the information. If you have
no schema then you are Very Far Away from that goal.
> This is precisely why text retrieval is so hard -- the "schema" that all
> documents are written in is a human written language, and nobody knows how
> to machine-process that. You can chunk it up all you like into logical
> blocks, but you're always going to be missing certain substantive
> information relating to the text.
Certainly, but those who actually do this processing still chunk it up
into logical blocks according to some schema, because that is the way
to get closest to achieving the goal. So in answer to Len's
question I still say that having a schema is better than not having one,
despite the fact that having the schema does not "solve" the problem. It
gets you closer to solving the problem.
xml-dev: A list for W3C XML Developers. To post, mailto:email@example.com
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:firstname.lastname@example.org)