OASIS Mailing List Archives

   Re: [xml-dev] patterns vs. identifiers

8/19/2002 11:55:33 PM, "Thomas B. Passin" <tpassin@comcast.net> wrote:

>Actually, I think we agree on just about everything except perhaps how
>possible it would be to have the computer end figure things out from
>context, which I still see as fairly hard.  

I may well be over-optimistic; I'm trying to put together some code
to explore the issue.  For what it's worth, my suspicion that there
*is* a lot one could do with fairly simple heuristics was strengthened
by reading http://www.paulgraham.com/spam.html  (a discussion of 
spam filtering):

"A few simple rules will take a big bite out of your incoming spam. 
Merely looking for the word "click" will catch 79.7% of the emails in 
my spam corpus, with only 1.2% false positives."

Also check out Eugene Kuznetsov's article in XML Journal on 
XML-aware network equipment: http://www.sys-con.com/xml/articleprint.cfm?id=459
In discussing the challenge of recognizing a specific XML
vocabulary and routing messages in that vocabulary to a specialized
processor, he says "the same device could send messages in a particular
XML vocabulary to the server capable of processing them, or it could 
send separate XML-RPC and SOAP messages. The routing rules are specified 
using either proprietary pattern-matching languages or a limited subset of XPath."
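
To make that concrete, here is a minimal sketch of vocabulary-based
routing using the limited XPath subset in Python's standard
xml.etree.ElementTree. The backend names and the route() function are
my own hypothetical choices, not anything from the article, and a real
device would obviously not be written this way:

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def route(message: str) -> str:
    """Pick a (hypothetical) backend name from the shape of the message."""
    root = ET.fromstring(message)

    # XML-RPC calls have a <methodCall> root with a <methodName> child.
    if root.tag == "methodCall" and root.find("./methodName") is not None:
        return "xmlrpc-backend"

    # SOAP messages have a namespaced <Envelope> root with a <Body> child.
    # ElementTree spells a qualified name as "{namespace-uri}local-name".
    if root.tag == f"{{{SOAP_ENV}}}Envelope":
        if root.find("./soap:Body", {"soap": SOAP_ENV}) is not None:
            return "soap-backend"

    # Anything unrecognized goes to a catch-all.
    return "default-backend"
```

Note that this version still needs well-formed input, since it parses
the whole message before applying the rules.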

This must be done under severe performance constraints: "Because enterprise 
network equipment is expected to function at wirespeed (at least 
Fast Ethernet or 100 megabits per second), the same is required of 
the XML processing core embedded in the device."

So it looks to me as though it is quite possible to use pattern matching
and/or XPath "queries" as heuristics to usefully perform tasks
that "logically" require much more complex namespace processing and schema
type validation. 
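
As a rough illustration of the kind of heuristic I mean, the sketch
below guesses the vocabulary from the first start tag alone, with no
parsing, no namespace resolution and no validation. The function name,
the labels it returns and the 512-character window are assumptions of
mine, and the regular expression is deliberately naive (a start tag
inside a leading comment would fool it):

```python
import re

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# First thing that looks like a start tag: '<', an optional "prefix:",
# then a local name. '<?xml', '<!--' and '<!DOCTYPE' do not match because
# the character after '<' must be a letter or underscore.
FIRST_TAG = re.compile(r"<([A-Za-z_][\w.\-]*:)?([A-Za-z_][\w.\-]*)")


def guess_vocabulary(text: str, window: int = 512) -> str:
    """Guess the vocabulary from the first `window` characters only."""
    head = text[:window]
    match = FIRST_TAG.search(head)
    if match is None:
        return "unknown"

    local_name = match.group(2)
    # Textual check: the namespace URI merely has to appear nearby;
    # the prefix binding is never actually resolved.
    if local_name == "Envelope" and SOAP_NS in head:
        return "soap"
    if local_name in ("methodCall", "methodResponse"):
        return "xml-rpc"
    return "unknown"
```

Because it never builds a tree, this also gives an answer for truncated
or otherwise ill-formed messages, which is both the attraction and the
danger.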

Also, I really hate to mention this :-) but think of the "wonderful" job
that browsers do in making sense out of hideously invalid HTML. Is there 
any reason to think that that level of creative hackery can't or won't
be applied to the challenge of making sense out of business messages
in XML, some of which will come from buggy software, some of which will be
human edited, some of which will come from organizations that support
newer versions of some spec than the receiver does, some of which will be 
generated by software that interprets the ambiguities in the spec differently
from the receiver, some of which will come from software that "embraces and
extends" the spec .... ad nauseam?  A "draconian" error handling policy 
just won't be any more viable than it would have been in Netscape 1.0.
[I don't want to argue that any of this is a Good Thing ... just that it
is technically possible, and the business geeks will probably think it 

