xml-dev - Re: [xml-dev] English sentences, was: Re: [xml-dev] Announce: XMLSchema,


To: "Eric van der Vlist" <vdv@dyomedea.com>
Subject: Re: [xml-dev] English sentences, was: Re: [xml-dev] Announce: XMLSchema,
From: "Jonathan Borden" <jborden@attbi.com>
Date: Thu, 27 Jun 2002 08:33:51 -0400
Cc: "John Cowan" <jcowan@reutershealth.com>,"Thomas B. Passin" <tpassin@comcast.net>,"'xml-dev'" <xml-dev@lists.xml.org>
References: <200206270331.XAA07306@mail.reutershealth.com> <007401c21dce$0c1c1270$0201a8c0@ne.mediaone.net> <1025179636.2775.33.camel@ibook>

Eric van der Vlist wrote:
> On Thu, 2002-06-27 at 13:30, Jonathan Borden wrote:
>
> > Recognizing and processing natural language is something that's been
> > done for a couple of decades -- albeit imperfectly -- and as I am sure
> > you are aware, the grammar(s) are complicated -- what is generally
> > needed is some notion of the intended semantics of the sentences. In
> > any case, this example isn't a good use case for XML schema languages
> > and 'validity'.
>
> No, but it is a good use case for extensibility in XML schema languages.
>
> If you are happy with the result of the unix "file" command to determine
> the type of a text and see if it's more likely a Java source code, a
> snippet of Python or an English text, you may want to validate the
> document using its result instead of the code.

I presume that both Java and Python can be unambiguously recognized via EBNF
or perhaps plain ol' regular expressions, and that sort of endeavour is a
good use case for schema extensibility -- err, though I was led to
believe that the _whole point_ of XML is that such structural information
would be explicitly labelled. It's just that _reliable_ detection and
classification of human languages is a bit more difficult. It has been done
for a long, long time (certain government agencies tend to spend unlimited
amounts of funds on such projects) and its problems are relatively well
characterized. As a _start_ in that direction, take a look at _ontologies_
etc.
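
To make the regular-expression idea concrete, here is a rough sketch in
Python of a "file"-style guess. The patterns and the guess_language name are
illustrative assumptions only (they are not the rules "file" uses), and a
serious recognizer would want a real grammar rather than a handful of
regexes.

```python
import re

# Hypothetical heuristic patterns, chosen for illustration only.
# Each language gets a few regexes that match typical surface syntax.
PATTERNS = {
    "java": [
        # method declaration such as: public static void main(
        re.compile(r"^\s*(public|private|protected)\s+(static\s+)?[\w<>\[\]]+\s+\w+\s*\(", re.M),
        # import statement terminated by a semicolon
        re.compile(r"^\s*import\s+[\w.]+(\.\*)?;", re.M),
        # class declaration followed by an opening brace
        re.compile(r"\bclass\s+\w+[^{:\n]*\{"),
    ],
    "python": [
        # function definition ending in a colon
        re.compile(r"^\s*def\s+\w+\s*\(.*\)\s*(->\s*[^:]+)?:", re.M),
        # import statement with no trailing semicolon
        re.compile(r"^\s*(from\s+[\w.]+\s+)?import\s+[\w., ]+$", re.M),
        # class declaration ending in a colon
        re.compile(r"^\s*class\s+\w+(\(.*\))?:", re.M),
    ],
}


def guess_language(text):
    """Return the language whose patterns match most often, or 'unknown'."""
    scores = {
        lang: sum(1 for pattern in patterns if pattern.search(text))
        for lang, patterns in PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


JAVA_SAMPLE = """import java.util.List;
public class Hello {
    public static void main(String[] args) {
        System.out.println("hi");
    }
}
"""

PYTHON_SAMPLE = """import os

def main(argv):
    print(os.getcwd())
"""

ENGLISH_SAMPLE = "Recognizing and processing natural language has been done for decades."

if __name__ == "__main__":
    for sample in (JAVA_SAMPLE, PYTHON_SAMPLE, ENGLISH_SAMPLE):
        print(guess_language(sample))
```

Note that the English sentence simply falls through to "unknown": nothing
this shallow says anything useful about human language, which is the harder
problem.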

Jonathan

