OASIS Mailing List Archives

 



atoms, molecules



Every time I've read XML Schema Part 2: Datatypes, I've been unhappy with 
the wide variety of compound types that are considered 'primitives' by the 
specification.  Leaving aside the issue of primitive types that could be 
derived from other types, we've still got compounds like:

* duration
* dateTime
* time
* date
* gYearMonth
* gMonthDay
* QName

There's been prior discussion of internationalization (i18n) problems with 
the date and time formats, and I think it's fair to say that dates and 
times are the most contentious area on the data typing side.

I'm wondering if there's a better way to handle these things, especially in 
contexts (RELAX, TREX, Schematron, Examplotron, no schema at all) where we 
aren't necessarily using XML Schema anyway.

It seems as if regular expressions could be used not just for validation of 
typed content, but for fragmentation of typed molecules into smaller 
atoms.  Instead of binding users to a particular (ISO 8601) date format, 
this approach would let users provide their own rules for fragmenting date 
strings into the parts we need for processing - year, month, day, etc.
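
As a rough sketch of what I mean (Python here only for illustration; the 
pattern names and the fragment() helper are hypothetical, not part of any 
specification), named groups in a regular expression can both validate a 
date string and hand back its atoms:

```python
import re

# Hypothetical user-supplied rules: each one maps a date notation onto the
# same named atoms (year, month, day). Neither pattern checks calendar
# validity (e.g. month <= 12); they only fragment the string.
ISO_DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
US_DATE = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})")


def fragment(pattern, text):
    """Return the named atoms of text as a dict, or None if it does not match."""
    match = pattern.fullmatch(text)
    return match.groupdict() if match else None


iso_atoms = fragment(ISO_DATE, "2001-05-14")
us_atoms = fragment(US_DATE, "5/14/2001")
```

Both calls yield a year, a month, and a day, so downstream processing no 
longer needs to care which notation the document author chose.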

It would also open up the prospect of treating other compounds - like the 
CSS style attribute, some of the path information in SVG, and various other 
places where the principle of one chunk, one string has been violated - as 
a set of atoms which could themselves be validated and/or transformed 
and/or typed.
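
The same idea might look like this for a CSS style attribute. This is only 
a sketch under simplifying assumptions: the DECLARATION pattern and the 
style_atoms() helper are hypothetical names, and the pattern ignores cases 
such as semicolons inside quoted strings or url() values.

```python
import re

# Simplified rule for one "property: value" declaration. It assumes values
# contain no semicolons, which real CSS does not guarantee.
DECLARATION = re.compile(
    r"\s*(?P<property>[-\w]+)\s*:\s*(?P<value>[^;]+?)\s*(?:;|$)"
)


def style_atoms(style):
    """Split a style attribute value into ordered (property, value) atoms."""
    return [
        (m.group("property"), m.group("value"))
        for m in DECLARATION.finditer(style)
    ]


atoms = style_atoms("color: red; border: 1px solid black;")
```

Each resulting pair could then be validated or typed on its own, for 
instance by applying a further pattern to the value of a particular property.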

This leads to another kind of post-processing infoset, where the atoms are 
available as an ordered set of child nodes, but it seems like a promising road.
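
One way such an infoset could be approximated, purely as an illustration: 
the sketch below uses Python's xml.etree.ElementTree and a hypothetical 
expand_atoms() helper to replace an element's text with one child element 
per atom, in the order the groups appear in the pattern. The element and 
group names are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET


def expand_atoms(element, pattern):
    """Replace element's text with ordered child elements, one per named group.

    Returns False and leaves the element untouched if the text does not match.
    """
    match = pattern.fullmatch(element.text or "")
    if match is None:
        return False
    element.text = None
    # Sort group names by group number so children follow pattern order.
    for name, _ in sorted(pattern.groupindex.items(), key=lambda kv: kv[1]):
        child = ET.SubElement(element, name)
        child.text = match.group(name)
    return True


DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
elem = ET.fromstring("<date>2001-05-14</date>")
matched = expand_atoms(elem, DATE)
serialized = ET.tostring(elem, encoding="unicode")
```

After the call, the date element carries year, month, and day children that 
later validation or transformation steps can address individually.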

Simon St.Laurent - Associate Editor, O'Reilly and Associates
XML Elements of Style / XML: A Primer, 2nd Ed.
XHTML: Migrating Toward XML
http://www.simonstl.com - XML essays and books