xml-dev - Re: English sentences, was: Re: [xml-dev] Announce: XML Schema,


To: jborden@attbi.com (Jonathan Borden)
Subject: Re: English sentences, was: Re: [xml-dev] Announce: XML Schema,
From: John Cowan <jcowan@reutershealth.com>
Date: Thu, 27 Jun 2002 08:06:15 -0400 (EDT)
Cc: jcowan@reutershealth.com (John Cowan), tpassin@comcast.net (Thomas B. Passin), xml-dev@lists.xml.org ('xml-dev')
In-reply-to: <007401c21dce$0c1c1270$0201a8c0@ne.mediaone.net> from "Jonathan Borden" at Jun 27, 2002 07:30:32 AM

Jonathan Borden scripsit:

> It all depends on what exactly you want, or intend the validator to do. What
> you are saying, in essence, is that an "English sentence" is not defined as
> a sequence of characters which conform to "text-en" and this is most true.

The original point seems to have gotten lost.

The publisher's use case was for a datatype representing those letters,
and only those letters, used in writing the English language.  Formally, of
course, that's easy: it's an xsd:string type with a pattern facet
consisting of "[ a-zA-Z...]+".  The question is, just what are those
other letters represented by the ellipsis in any given case?

I used the examples of "façade" and "coöperate" and "naïve" to
illustrate that this problem may or may not have a clear-cut answer.  These
are not foreign words; they are standard spellings (though not the only
standard spellings) of standard English words.
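
To make the ambiguity concrete, here is a minimal sketch in Python rather
than XSD.  The two character classes are hypothetical candidates, not a
claim about what the ellipsis above should contain, and the names
ASCII_ONLY, WITH_DIACRITICS and conforms are invented for illustration.
The NFC normalization step is a convenience of the sketch; a pattern
facet does not normalize its input.

```python
import re
import unicodedata

# Two hypothetical candidate repertoires. Neither is "the" English one;
# choosing between them is exactly the definitional problem.
ASCII_ONLY = "[ a-zA-Z]+"
# Assumed additions, for illustration only: c-cedilla, o-diaeresis, i-diaeresis.
WITH_DIACRITICS = "[ a-zA-Z\u00e7\u00f6\u00ef]+"


def conforms(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the pattern.

    A pattern facet must match the entire value, so fullmatch is used.
    The value is normalized to NFC first so that precomposed and
    decomposed spellings of the same letter are treated alike.
    """
    value = unicodedata.normalize("NFC", value)
    return re.fullmatch(pattern, value) is not None


WORDS = ["facade", "fa\u00e7ade", "co\u00f6perate", "na\u00efve"]

# Map each word to (accepted by ASCII_ONLY, accepted by WITH_DIACRITICS).
results = {
    word: (conforms(word, ASCII_ONLY), conforms(word, WITH_DIACRITICS))
    for word in WORDS
}

if __name__ == "__main__":
    for word, (ascii_ok, wide_ok) in results.items():
        print(f"{word}: ascii_only={ascii_ok} with_diacritics={wide_ok}")
```

Both patterns are equally easy to write down; which one is correct is
not something the schema language can settle.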

It's perfectly true that a sentence like "Al-Musa said, '<insert
Arabic here>'." is also an English sentence even if the Arabic text
is expressed in the Arabic script.  But that isn't my point.

> Indeed to reliably detect an English sentence the 'recognizer' needs to
> understand how to form words from characters and sentences from words. This
> is way outside the capabilities of the XML schema definition languages we
> have been discussing.

Of course, of course.  But even at the level of characters, there is
a *definitional* (not implementation) problem in saying just what
the character repertoire of <insert language here> is.
Many have run up against this rock and foundered on it.

-- 
John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_
