Re: atoms, molecules

"Simon St.Laurent" <simonstl@simonstl.com> writes:

> Every time I've read XML Schema Part 2: Datatypes, I've been unhappy with 
> the wide variety of compound types that are considered 'primitives' by the 
> specification.  Leaving aside the issue of primitive types that could be 
> derived from other types, we've still got compounds like:
> * duration
> * dateTime
> * time
> * date
> * gYearMonth
> * gMonthDay
> * QName
> There's been prior discussion of internationalization (i18n) problems with 
> the date and time formats, and I think it's fair to say that dates and 
> times are the most contentious area on the data typing side.
> I'm wondering if there's a better way to handle these things, especially in 
> contexts (RELAX, TREX, Schematron, Examplotron, no schema at all) where we 
> aren't necessarily using XML Schema anyway.
> It seems as if regular expressions could be used not just for validation of 
> typed content, but for fragmentation of typed molecules into smaller 
> atoms.  Instead of binding users to a particular (ISO 8601) date format, 
> this approach would let users provide their own rules for fragmenting date 
> strings into the parts we need for processing - year, month, day, etc.
> It would also open up the prospect of treating other compounds - like the 
> CSS style attribute, some of the path information in SVG, and various other 
> places where the principle of one chunk, one string has been violated - as 
> a set of atoms which could themselves be validated and/or transformed 
> and/or typed.
> This leads to another kind of post-processing infoset, where the atoms are 
> available as an ordered set of child nodes, but it seems like a promising road.
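
To make the quoted proposal concrete, here is a minimal sketch of regex-based fragmentation, assuming Python's standard `re` module. The rule `US_DATE` and the helper `fragment` are hypothetical names invented for illustration; nothing like them appears in any of the specifications mentioned above.

```python
import re

# Hypothetical user-supplied rule: a month/day/year date string.
# Each named group is one "atom" of the compound value.
US_DATE = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})")

def fragment(rule, text):
    """Split a compound string into an ordered list of (name, value) atoms."""
    match = rule.fullmatch(text)
    if match is None:
        raise ValueError(f"{text!r} does not match the rule")
    # Order the atoms by group position, i.e. by where they occur in the rule.
    names = sorted(rule.groupindex, key=rule.groupindex.get)
    return [(name, match.group(name)) for name in names]

atoms = fragment(US_DATE, "3/14/2001")
print(atoms)
```

The same rule both validates the string (a non-matching value raises an error) and yields the ordered atoms, which is the dual role the proposal describes.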

Strong disagreement (speaking personally).  We have a way in XML to
express compound objects -- it's called elements-and-attributes.  The
mistake, in my opinion, was giving in to the SQL people and having
_any_ kind of date or time as simple types -- they should _all_ have
gone into the type library as complex types.
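
For contrast, a minimal sketch of the elements-and-attributes alternative, assuming Python's standard `xml.etree.ElementTree`. The element names (`date`, `year`, `month`, `day`) and the helper `date_element` are assumptions made for illustration, not names taken from any type library.

```python
import xml.etree.ElementTree as ET

def date_element(year, month, day):
    """Build a date as a compound element with one child per component."""
    date = ET.Element("date")  # hypothetical element name
    for name, value in (("year", year), ("month", month), ("day", day)):
        child = ET.SubElement(date, name)
        child.text = str(value)
    return date

elem = date_element(2001, 3, 14)
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

Here the components are already separate nodes in the document, so no string parsing is needed to reach the year, month, or day.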

  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2001, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/