Eric van der Vlist wrote:
> On Thu, 2002-06-27 at 13:30, Jonathan Borden wrote:
> > Recognizing and processing natural language is something that's been done
> > for a couple of decades -- albeit imperfectly -- and as I am sure you are
> > aware, the grammar(s) are complicated -- what is generally needed is some
> > notion of the intended semantics of the sentences. In any case, this
> > isn't a good use case for XML schema languages and 'validity'.
> No, but it is a good use case for extensibility in XML schema languages.
> If you are happy with the result of the unix "file" command to determine
> the type of a text and see if it's more likely Java source code, a
> snippet of Python or an English text, you may want to validate the
> document using its result instead of the code.
I presume that both Java and Python can be unambiguously determined via EBNF
or perhaps plain ol' regular expressions, and that sort of endeavour is a
good use case for schema extensibility -- err, though I was brought to
believe that the _whole point_ of XML is that such structural information
would be explicitly labelled. It's just that _reliable_ detection and
classification of human languages is a bit more difficult. It has been done
for a long, long time (certain government agencies tend to spend unlimited
amounts of funds on such projects) and its problems are relatively well
characterized. As a _start_ in that direction take a look at _ontologies_