OASIS Mailing List Archives





   Re: [xml-dev] datatype functionality I'd like to see


Hi Bry,

bry@itnisk.com wrote:

>I was just thinking about this, having switched over from coffee to tea this
>morning. It struck me as sort of weird. 
I can assure you, it's not weird. Many people, including myself, 
occasionally switch from coffee to tea and back : )

>Let us suppose that one had a datatype defined for social security number, this
>would essentially be a regular expression specifying how a SSN should be
So far so good. To fix an example, let's say D is a regexp for digits, A 
is a regexp for some class of letters, B is a regexp for other letters, 
and the simple SSN regexp was


>now a ssn has more information implicit in it than just the SSN. There is
>information that one could use to find out what state issued the number for
>example, in the prefix.
Maybe the DD-AA part, after the first dash.

>Let us suppose then that we had a definition for a structure which splits the
>SSN into two elements. The combined strings of both elements matches the
>datatype, this is something I would like to be able to specify. 
>Why not specify just a type that says a SSN type has two elements? 
>because in some formats that would be overkill. 
The classical example is the expiry date of a credit card, 31.12.9999 
(German notation) fitting in a couple of bits, compared to  
<date><day><month><year> madness. So people specify a regexp for the 
whole and leave the work of decomposing into days/months/years to 
application developers.
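
To make that decomposition work concrete, here is a minimal sketch in Scala of what the application developer ends up writing. The object name DateSplit, the function splitDate and the exact regexp are my own assumptions for illustration; only the dd.mm.yyyy layout comes from the example above.

```scala
import scala.util.matching.Regex

object DateSplit {
  // Assumed pattern for a "dd.mm.yyyy" date: three capture groups, dots escaped.
  private val GermanDate: Regex = """(\d{2})\.(\d{2})\.(\d{4})""".r

  // Returns (day, month, year) if the whole string matches, None otherwise.
  // Only the shape is checked, not whether the date exists in the calendar.
  def splitDate(s: String): Option[(Int, Int, Int)] = s match {
    case GermanDate(d, m, y) => Some((d.toInt, m.toInt, y.toInt))
    case _                   => None
  }
}
```

The schema only sees the string as a whole; the three groups exist nowhere but in this application code, which is exactly the duplication being complained about.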

>Perhaps I am uninformed however, can anyone think of any particular schema
>language one can do this in, and if you are the person who knows of such a
>language can you give me an example if possible. (not that it's something I
>to do, just something I thought would be extremely useful to be able to do at
>some point)
I certainly do not know any schema language that deals with "segmented" 
regular expressions.

However, in research languages, regular pattern matching is getting 
quite popular. In Scala, for instance, your SSN could be decomposed like this:

def decomp(ssn:Seq[Char]):Pair[Seq[Char],Seq[Char]] = ssn match {
  case (part1 @ DD-AA)(part2 @ BDD-DDDD) => Pair(part1, part2)
  case _ => error("invalid ssn")
}

(This is sloppy syntax, of course: D, A and B have to be real regular 
expressions, and Scala, for instance, does not have POSIX character 
classes, so it is very tedious to actually write. But you get the idea: 
the "variable @ regexp" construction binds whatever matched regexp to 
variable, in a much nicer way than the typical Perl / Java regexp madness.)
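
For comparison, a runnable approximation of the same decomposition can be sketched with the standard library's scala.util.matching.Regex, where capture groups play the role of the "variable @ regexp" binders. The concrete character classes chosen for A and B, and the object name SsnSplit, are assumptions made up for this sketch; the mail does not say which letters they cover.

```scala
object SsnSplit {
  // Assumed stand-ins for the classes in the mail:
  // D = a digit, A = letters A-M, B = letters N-Z.
  private val D = "[0-9]"
  private val A = "[A-M]"
  private val B = "[N-Z]"

  // Two capture groups mirror "part1 @ DD-AA" and "part2 @ BDD-DDDD".
  private val Ssn = s"($D$D-$A$A)($B$D$D-$D$D$D$D)".r

  // Splits a well-formed SSN into its two parts; throws on anything else.
  def decomp(ssn: String): (String, String) = ssn match {
    case Ssn(part1, part2) => (part1, part2)
    case _ => throw new IllegalArgumentException(s"invalid ssn: $ssn")
  }
}
```

Under these assumptions, SsnSplit.decomp("12-ABX34-5678") yields the pair ("12-AB", "X34-5678"). Note that this is ordinary regexp matching on strings with positional groups, not the typed regular pattern matching sketched above.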

Many others (XDuce, CDuce, XEN/C_omega) offer similar mechanisms, which 
often also work with sequences of XML nodes, or numbers, or whatever. In 
other words, this seems to be something of general use, beyond XML.
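
As a small illustration of matching on sequences other than characters, here is a sketch using Scala's built-in sequence patterns on a sequence of numbers. It is far weaker than the regular expression patterns of the languages named above (no alternation or repetition inside the pattern), and the names SeqSplit and headPair are made up for the example.

```scala
object SeqSplit {
  // Binds the first two elements and the remainder of a sequence of numbers.
  // "rest @ _*" binds whatever is left after the two leading elements.
  def headPair(xs: Seq[Int]): Option[((Int, Int), Seq[Int])] = xs match {
    case Seq(a, b, rest @ _*) => Some(((a, b), rest))
    case _                    => None
  }
}
```

The same binder notation is used here as in the SSN example: a name, then @, then the pattern whose match it captures.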

I once had the idea of adding this feature of pattern matching to (yet 
another) type system for objects and XML (turning sequences of nodes into 
something like a record). Glad there seems to be a use for it.




Copyright 2001 XML.org. This site is hosted by OASIS