- From: Robin Cover <robin@isogen.com>
- To: "Henry S. Thompson" <ht@cogsci.ed.ac.uk>
- Date: Thu, 09 Nov 2000 05:03:05 -0600 (CST)
The reference for Michael's article, as published in MLTP:
See: http://www.oasis-open.org/cover/mltpTOC14.html
Sperberg-McQueen, C. Michael. "Regular expressions for dates." [SQUIB]
Markup Languages: Theory & Practice 1/4 (Fall 1999) 20-26. ISSN:
1099-6622 [MIT Press]. Author's affiliation: World Wide Web
Consortium/MIT Laboratory for Computer Science; Email: cmsmcq@acm.org.
"[One may hear:] 'Validation of dates, therefore, must (so goes the
argument) be left to application-specific code.' While I agree that
date checking is probably not best done using SGML content models, I
feel compelled to point out that the claim just paraphrased, as
stated, is false. It is possible to write a regular expression which
recognizes dates. At the Markup Technologies '98 conference, I
exhibited a deterministic finite-state automaton for recognizing
Gregorian dates in the form yyyy-mm-dd (as specified by ISO 8601),
which can be represented as lex code thus... The editors challenge our
readers to find shorter expressions for recognizing dates, which
accepts all valid dates, and only valid dates; in particular, the
expression should accept a 29th day in February only in leap years,
using the Gregorian rules for leap years. In particular, we are
interested in (a) the shortest regular expression and (b) the shortest
such expression which is unambiguous in the sense of SGML (or,
synonymously, deterministic in the sense of the XML
specification). For concreteness, we will specify that the expression
should use the syntax of lex, and need only accept four-digit
years. Variant date formats specified by ISO 8601 need not be
accepted. The shortest correct expressions received by the editors
before 1 July 2000 will be published in a later issue of this
journal. Judgement of a panel of peer reviewers as to the correctness
and length of the submissions is final."
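
To make the challenge concrete, here is a minimal sketch of a regular expression that accepts yyyy-mm-dd dates with the Gregorian leap-year rule. It is not the expression from the article: it uses Python's re syntax rather than lex, makes no attempt to be short or deterministic, and the names ISO_DATE and is_valid_iso_date are my own. It also assumes that year 0000 counts as a leap year (divisible by 400), which the challenge text does not address.

```python
import re

# Two-digit values divisible by 4, excluding "00".
_DIV4 = r"(?:0[48]|[2468][048]|[13579][26])"

# Leap years: divisible by 4 but not by 100, or divisible by 400.
# Assumption: year 0000 is treated as a leap year.
_LEAP_YEAR = rf"(?:[0-9]{{2}}{_DIV4}|(?:00|{_DIV4})00)"

ISO_DATE = re.compile(
    r"(?:"
    r"[0-9]{4}-(?:"
    r"(?:0[13578]|1[02])-(?:0[1-9]|[12][0-9]|3[01])"  # 31-day months
    r"|(?:0[469]|11)-(?:0[1-9]|[12][0-9]|30)"          # 30-day months
    r"|02-(?:0[1-9]|1[0-9]|2[0-8])"                    # February, days 01-28
    r")"
    rf"|{_LEAP_YEAR}-02-29"                            # 29 February, leap years only
    r")"
)


def is_valid_iso_date(text: str) -> bool:
    """Return True if text is a valid Gregorian date written as yyyy-mm-dd."""
    return ISO_DATE.fullmatch(text) is not None
```

The leap-year case is kept as a separate top-level alternative, so the ordinary branch never has to look at the year at all.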
Best wishes,
Robin Cover
------------------------------------------------------------------------------
On Thu, 9 Nov 2000, Henry S. Thompson wrote:
> Lars Marius Garshol <larsga@garshol.priv.no> writes:
>
> > I have recently come across a case where one wants to make an XML
> > Schema for the interchange syntax for an existing standard based on
> > relational databases.
> >
> > This existing standard already has a lexical representation for its
> > dates, of the form CCYYMMDD. The question is, how can one make an XML
> > Schema data type for dates that have this lexical representation?
> >
> > I see that XML Schema allows dates to have a pattern constraining
> > facet, the question is just how this will be interpreted by tools.
> > If the pattern were to be defined as
> >
> > '[12][0-9][0-9][0-9][01][0-9][0-3][0-9]'
> >
> > what would this mean? Would tools accept it?
>
> It means what you think, but it shouldn't be accepted. For better or
> worse, XML Schema v.1 operates a "sender must make right" approach to
> data representation: we define the lexical form, you must conform to
> it if you want the specified semantics. In the absence of
> micro-syntax<-->semantics mapping rules, this is the only way to
> guarantee applications will know what date is actually meant. This is
> why Rick Jelliffe says XML Schema puts the needs of data-orientated
> applications, which are presumed to care about semantics, ahead of the
> needs of document-orientated applications, which are presumed to just
> want to validate character sequences, and are only making semantic
> assertions for human-consumption/legacy-application-documentation, not
> for real interoperability.
>
> You can get the effect you desire with a type definition such as
>
> <xs:simpleType name="myDate">
>   <xs:restriction base="xs:token">
>     <xs:pattern value="[12][0-9][0-9][0-9](0[1-9]|1[0-2])([0-2][0-9]|3[01])"/>
>   </xs:restriction>
> </xs:simpleType>
>
> which is of course still not quite right, because it allows
> e.g. 19990230. See Michael Sperberg-McQueen's article in Markup [ref
> not immediately available] for a regexp which does much better.
>
> ht
> --
> Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
> W3C Fellow 1999--2001, part-time member of W3C Team
> 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
> Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
> URL: http://www.ltg.ed.ac.uk/~ht/
>
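
The limitation Henry mentions can be checked with a short sketch. It tests the pattern from the quoted xs:pattern facet against a few CCYYMMDD strings, then adds a calendar check with Python's datetime module. The function names are hypothetical, and Python's re module stands in for an XML Schema processor, which should be equivalent for a pattern this simple. Note that the pattern as written also seems to admit a day of 00, for example 19990100.

```python
import re
from datetime import datetime

# Pattern copied from the quoted xs:pattern facet. XML Schema patterns are
# implicitly anchored at both ends, so fullmatch() is used here.
MY_DATE = re.compile(r"[12][0-9][0-9][0-9](0[1-9]|1[0-2])([0-2][0-9]|3[01])")


def matches_pattern(text: str) -> bool:
    """Lexical check only: does text fit the CCYYMMDD pattern?"""
    return MY_DATE.fullmatch(text) is not None


def is_real_date(text: str) -> bool:
    """Lexical check plus a calendar check via datetime.strptime."""
    if not matches_pattern(text):
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
    except ValueError:
        # Raised for impossible dates such as 30 February or day 00.
        return False
    return True
```

In this sketch matches_pattern("19990230") is true while is_real_date("19990230") is false, which is the gap between validating a character sequence and validating a date.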