atoms, molecules
- From: "Simon St.Laurent" <simonstl@simonstl.com>
- To: xml-dev@lists.xml.org
- Date: Tue, 17 Apr 2001 12:21:49 -0400
Every time I've read XML Schema Part 2: Datatypes, I've been unhappy with
the wide variety of compound types that are considered 'primitives' by the
specification. Leaving aside the issue of primitive types that could be
derived from other types, we've still got compounds like:
* duration
* dateTime
* time
* date
* gYearMonth
* gMonthDay
* QName
There's been prior discussion of internationalization (i18n) problems with
the date and time formats, and I think it's fair to say that dates and
times are the most contentious area on the data typing side.
I'm wondering if there's a better way to handle these things, especially in
contexts (RELAX, TREX, Schematron, Examplotron, no schema at all) where we
aren't necessarily using XML Schema anyway.
It seems as if regular expressions could be used not just for validation of
typed content, but for fragmentation of typed molecules into smaller
atoms. Instead of binding users to a particular (ISO 8601) date format,
this approach would let users provide their own rules for fragmenting date
strings into the parts we need for processing - year, month, day, etc.
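
As a rough illustration of the idea (not a proposal for any particular schema language), the sketch below uses Python regular expressions with named groups to break a date string into atoms. The rule names `iso8601` and `us_slash`, the `DATE_RULES` table, and the `fragment` function are all hypothetical, invented for this example; the point is only that the fragmentation rule is supplied by the user rather than fixed by the type system.

```python
import re

# Hypothetical user-supplied rules: each maps a format label to a regular
# expression whose named groups are the "atoms" of the date "molecule".
DATE_RULES = {
    "iso8601": re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"),
    "us_slash": re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})"),
}


def fragment(text, rule):
    """Return the named atoms of text as a dict, or None if it does not match.

    fullmatch() means the rule doubles as a validity check on the whole string.
    """
    match = rule.fullmatch(text)
    return match.groupdict() if match else None


print(fragment("2001-04-17", DATE_RULES["iso8601"]))
# {'year': '2001', 'month': '04', 'day': '17'}
print(fragment("4/17/2001", DATE_RULES["us_slash"]))
# {'month': '4', 'day': '17', 'year': '2001'}
```

Both calls yield the same set of named parts from differently formatted strings, so later processing can work with year, month, and day without caring which lexical form the document used. The atoms are still strings here; typing each one would be a separate step.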
It would also open up the prospect of treating other compounds - like the
CSS style attribute, some of the path information in SVG, and various other
places where the principle of one chunk, one string has been violated - as
a set of atoms which could themselves be validated and/or transformed
and/or typed.
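
The same technique can be sketched for a CSS style attribute. This assumes a deliberately simplified grammar, "name: value" declarations separated by semicolons; real CSS also allows quoted strings, comments, and escapes, which this pattern does not handle. The names `DECLARATION` and `style_atoms` are made up for the example.

```python
import re

# Assumed, simplified grammar: "name: value" declarations separated by ";".
# Quoted strings, comments, and escapes in real CSS are NOT handled here.
DECLARATION = re.compile(
    r"\s*(?P<property>[-\w]+)\s*:\s*(?P<value>[^;]+?)\s*(?:;|$)"
)


def style_atoms(style):
    """Split a style attribute value into an ordered list of (property, value) atoms."""
    return [
        (m.group("property"), m.group("value"))
        for m in DECLARATION.finditer(style)
    ]


print(style_atoms("color: red; font-size: 12pt"))
# [('color', 'red'), ('font-size', '12pt')]
```

The result is an ordered list rather than a dictionary, since the order of declarations can matter and a property may appear more than once.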
This leads to another kind of post-processing infoset, where the atoms are
available as an ordered set of child nodes, but it seems like a promising road.
Simon St.Laurent - Associate Editor, O'Reilly and Associates
XML Elements of Style / XML: A Primer, 2nd Ed.
XHTML: Migrating Toward XML
http://www.simonstl.com - XML essays and books