John Cowan wrote:
Jonathan Borden scripsit:
> > Sure, but that doesn't make façade *wrong*. Are you going to say that
> > <text>His attitude is a mere façade</text> must not validate against
a datatype library with a "text-en" datatype?
> I suppose that if I went to the trouble to specify "text-en" that I
> wouldn't want that to validate.
But why not? It is unquestionably an English sentence, even if one of
the words in it has an unusual orthography.
It all depends on what exactly you want or intend the validator to do. What
you are saying, in essence, is that an "English sentence" is not defined as
a sequence of characters that conform to "text-en", and that is quite true.
Indeed, to reliably detect an English sentence, the 'recognizer' needs to
understand how to form words from characters and sentences from words. This
is way outside the capabilities of the XML schema definition languages we
have been discussing.
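To make the point concrete, here is a minimal Python sketch of a purely character-level check of the kind a schema datatype can express. The "text-en" pattern and the `is_text_en` helper are hypothetical, invented only for illustration; they do not come from any real datatype library. A check like this rejects the spelling "façade" (because 'ç' is outside the allowed characters) while happily accepting a scrambled word sequence that is not an English sentence at all.

```python
import re

# Hypothetical "text-en" datatype, assumed for illustration only: it accepts
# ASCII letters, digits, spaces and common punctuation, roughly what a regex
# pattern facet could express. It knows nothing about words or grammar.
TEXT_EN_PATTERN = re.compile(r"[A-Za-z0-9 .,;:'!?()-]*")


def is_text_en(value: str) -> bool:
    """Return True if every character of value matches the character-level pattern."""
    return TEXT_EN_PATTERN.fullmatch(value) is not None


examples = [
    "His attitude is a mere facade",  # English, ASCII spelling: accepted
    "His attitude is a mere façade",  # English, but 'ç' is not allowed: rejected
    "mere a facade attitude His is",  # all ASCII, not a sentence: still accepted
]

for text in examples:
    print(f"{is_text_en(text)!s:5}  {text}")
```

A character pattern is therefore neither necessary nor sufficient for "is an English sentence"; deciding the latter needs tokenization and grammar, which is the recognizer's job rather than the schema language's.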
Recognizing and processing natural language is something that has been done
for a couple of decades, albeit imperfectly, and as I am sure you are aware,
the grammars are complicated; what is generally needed is some notion of the
intended semantics of the sentences. In any case, this example isn't a good
use case for XML schema languages and 'validity'.