8/19/2002 11:55:33 PM, "Thomas B. Passin" <email@example.com> wrote:
>Actually, I think we agree on just about everything except perhaps how
>possible it would be to have the computer end figure things out from
>context, which I still see as fairly hard.
I may well be over-optimistic; I'm trying to put together some code
to explore the issue. For what it's worth, my suspicion that there
*is* a lot one could do with fairly simple heuristics was strengthened
by reading http://www.paulgraham.com/spam.html (a discussion of
"A few simple rules will take a big bite out of your incoming spam.
Merely looking for the word "click" will catch 79.7% of the emails in
my spam corpus, with only 1.2% false positives."
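
To make the quoted rule concrete, here is a minimal sketch of how one might measure a single-word heuristic against labeled mail. The toy corpora and the function names are made up for illustration; the percentages in the quote come from the article's own corpus, not from this code.

```python
import re

def contains_word(text, word):
    """True if `word` appears as a whole word in `text`, ignoring case."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None

def hit_rate(messages, word):
    """Fraction of `messages` that contain `word`."""
    if not messages:
        return 0.0
    return sum(contains_word(m, word) for m in messages) / len(messages)

# Hypothetical toy corpora, for illustration only.
spam = ["Click here to claim your prize", "Cheap loans, click now", "You have won"]
ham = ["Meeting moved to 3pm", "Please click Save before closing the form"]

catch = hit_rate(spam, "click")            # share of spam the rule flags
false_positives = hit_rate(ham, "click")   # share of legitimate mail it flags
```

The two numbers correspond to the "catch" and "false positive" rates in the quote; a rule is only interesting when the first is high and the second is low.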
Also check out Eugene Kuznetzov's article in XML Journal on
XML-aware network equipment http://www.sys-con.com/xml/articleprint.cfm?id=459
In discussing the challenge of recognizing a specific XML
vocabulary and routing messages in that vocabulary to a specialized
processor, he says "the same device could send messages in a particular
XML vocabulary to the server capable of processing them, or it could
send separate XML-RPC and SOAP messages. The routing rules are specified
using either proprietary pattern-matching languages or a limited subset of XPath."
This must be done under severe performance constraints: "Because enterprise
network equipment is expected to function at wirespeed (at least
Fast Ethernet or 100 megabits per second), the same is required of
the XML processing core embedded in the device."
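
As a rough illustration of that kind of vocabulary-based routing, here is a sketch that picks a destination by looking only at the root element of a message. It is not how any particular device works: the routing table, the server names, and the "urn:example:orders" namespace are assumptions, and a plain root-element check stands in for the pattern-matching languages or XPath subset the article mentions.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical routing table: namespace URI of the root element -> backend name.
ROUTES = {
    "http://schemas.xmlsoap.org/soap/envelope/": "soap-server",
    "urn:example:orders": "order-server",
}

def route(message_bytes, default="default-server"):
    """Pick a backend by looking only at the root element of the message."""
    try:
        # The first "start" event is the root element; stop there and
        # ignore whatever follows it.
        for _event, elem in ET.iterparse(io.BytesIO(message_bytes), events=("start",)):
            tag = elem.tag
            break
        else:
            return default
    except ET.ParseError:
        return default
    if tag == "methodCall":              # XML-RPC requests use no namespace
        return "xmlrpc-server"
    if tag.startswith("{"):              # ElementTree spells tags as "{uri}local"
        namespace = tag[1:].split("}", 1)[0]
        return ROUTES.get(namespace, default)
    return default
```

Nothing here validates the message or resolves a schema; the decision rests on one cheap check, which is the point of the heuristic.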
So, it looks to me as though it is quite possible to use pattern matching
and/or XPath "queries" to usefully perform tasks with heuristics
that "logically" require much more complex namespace processing and schema
Also, I really hate to mention this :-) but think of the "wonderful" job
that browsers do in making sense out of hideously invalid HTML. Is there
any reason to think that that level of creative hackery can't or won't
be applied to the challenge of making sense out of business messages
in XML, some of which will come from buggy software, some of which will be
human edited, some of which will come from organizations that support
newer versions of some spec than the receiver does, some of which will be
generated by software that interprets the ambiguities in the spec differently
from the receiver, some of which will come from software that "embraces and
extends" the spec ... ad nauseam? A "draconian" error handling policy
just won't be any more viable than it would have been in Netscape 1.0.
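
For what it's worth, a non-draconian receiver could be as simple as the sketch below: try a strict parse first, and if the message is not well-formed, fall back to crude pattern matching to pull out the one field that is needed. The function name and the invoice messages are hypothetical, and the regular expression is deliberately naive.

```python
import re
import xml.etree.ElementTree as ET

def extract_field(message, local_name):
    """Return the text of the first element named `local_name`, or None.

    Tries a strict parse first; if the message is not well-formed, falls
    back to a crude regular-expression match that ignores prefixes.
    """
    try:
        root = ET.fromstring(message)
        for elem in root.iter():
            # Compare local names only, so namespace differences are tolerated.
            if elem.tag.rsplit("}", 1)[-1] == local_name:
                return (elem.text or "").strip()
        return None
    except ET.ParseError:
        pattern = r"<(?:[\w.-]+:)?%s\b[^>]*>([^<]*)" % re.escape(local_name)
        match = re.search(pattern, message)
        return match.group(1).strip() if match else None

# Hypothetical messages: one well-formed with a newer namespace, one broken.
well_formed = '<inv:Invoice xmlns:inv="urn:example:v2"><inv:Total>42.00</inv:Total></inv:Invoice>'
broken = '<Invoice><Total>42.00</Total><Note>unclosed</Invoice>'
```

Both messages yield the same Total, even though the second would be rejected outright by a conforming XML parser.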
[I don't want to argue that any of this is a Good Thing ... just that it
is technically possible, and the business geeks will probably think it