Re: [xml-dev] Proposal: create a repository of reusable regular expressions using XML ENTITY declarations

On 22/10/2012 15:39, Costello, Roger L. wrote:
> Hi Folks,
>
> Suppose that you wish to record the date and time of the launch of a new product. You could format the data using the xs:dateTime data type, but you desire a slightly different format:
>
> <launch>
>     <date-time>22 Oct 2012 07:52:00 -0004</date-time>
> </launch>
>
> The value of <date-time> can be constrained to the desired format by using an XML Schema pattern facet. Recall that the value of a pattern facet is a regular expression (regex).
>
> Recently I learned from Michael Sperberg-McQueen a fantastic way of creating reusable regular expressions: create XML ENTITY declarations that express the regexes.
>
> Recall that an XML ENTITY declaration has a name followed by its replacement text:
>
> <!ENTITY name "replacement text">
>
> So, we could create a date-time ENTITY with a regex as its replacement text:
>
> <!ENTITY date-time "... regex ...">
>
> Okay, I did just that.
>
> I created a file containing the ENTITY declarations for the date-time format. I created the date-time format systematically through a series of ENTITY declarations:
>
> ---------------------------------------------------
>             	regex-repository.ent
> ---------------------------------------------------
> <!--
>       *********************************
>           Regex for date-time
>
>       Here is an example string that
>       conforms to the date-time regex:
>       22 Oct 2012 07:52:00 -0004
>       *********************************
>   -->
> <!ENTITY date-time      	"(&day-of-week;)?&date;&time;">
> <!ENTITY day-of-week 	"(&WSP;)?&day-name;">
> <!ENTITY day-name      	"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)">
> <!ENTITY date                	"&day;&month;&year;">
> <!ENTITY day                 	"(&WSP;)?[&DIGIT;]{1,2}&WSP;">
> <!ENTITY month            	"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)">
> <!ENTITY year                	"&WSP;[&DIGIT;]{4,}&WSP;">
> <!ENTITY time                	"&time-of-day;&zone;">
> <!ENTITY time-of-day  	"&hour;&COLON;&minute;(&COLON;&second;)?">
> <!ENTITY hour               	"[&DIGIT;]{2,2}">
> <!ENTITY minute           	"[&DIGIT;]{2,2}">
> <!ENTITY second           	"[&DIGIT;]{2,2}">
> <!ENTITY zone               	"&WSP;[+-][&DIGIT;]{4,4}">
>
> <!ENTITY SP             	"&#32;">            	<!-- Space -->
> <!ENTITY HTAB           	"&#9;">             		<!-- Horizontal tab -->
> <!ENTITY WSP            	"(&HTAB;|&SP;)">    	<!-- Whitespace -->
>
> <!ENTITY COLON 	"&#58;">
>
> <!ENTITY DIGIT 	"0-9">
>
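
One quick way to sanity-check the declarations above is to expand them by hand and try the result on the example string. This is a minimal Python sketch, assuming plain textual substitution is close enough to what an XML parser does here. The names `ENTITIES`, `expand` and `is_date_time` are invented for this illustration, and Python's `re` module stands in for an XSD regex engine (the constructs used here look the same in both, as far as I can tell).

```python
import re

# Entity table transcribed from regex-repository.ent above.
# "&#32;", "&#9;" and "&#58;" are written as the characters they denote.
ENTITIES = {
    "date-time": "(&day-of-week;)?&date;&time;",
    "day-of-week": "(&WSP;)?&day-name;",
    "day-name": "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)",
    "date": "&day;&month;&year;",
    "day": "(&WSP;)?[&DIGIT;]{1,2}&WSP;",
    "month": "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
    "year": "&WSP;[&DIGIT;]{4,}&WSP;",
    "time": "&time-of-day;&zone;",
    "time-of-day": "&hour;&COLON;&minute;(&COLON;&second;)?",
    "hour": "[&DIGIT;]{2,2}",
    "minute": "[&DIGIT;]{2,2}",
    "second": "[&DIGIT;]{2,2}",
    "zone": "&WSP;[+-][&DIGIT;]{4,4}",
    "SP": " ",
    "HTAB": "\t",
    "WSP": "(&HTAB;|&SP;)",
    "COLON": ":",
    "DIGIT": "0-9",
}

# Matches a general entity reference such as &day-of-week;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z-]*);")


def expand(text):
    """Replace entity references repeatedly until none remain."""
    while ENTITY_REF.search(text):
        text = ENTITY_REF.sub(lambda m: ENTITIES[m.group(1)], text)
    return text


# The fully expanded regex that the pattern facet would end up holding.
DATE_TIME = expand("&date-time;")


def is_date_time(value):
    # An XSD pattern must match the whole value, hence fullmatch.
    return re.fullmatch(DATE_TIME, value) is not None
```

With this table the example string `22 Oct 2012 07:52:00 -0004` is accepted, while a two-digit year is rejected because `year` asks for at least four digits.
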
> The regex for date-time can be reused by any XML Schema. More precisely, the regex can be reused by referencing, in a pattern facet, the date-time ENTITY declaration.
>
> Here I create an XML Schema for the <launch> element and use DOCTYPE to provide access to the regexes in regex-repository.ent. I use a pattern facet to constrain the value of the <date-time> element. The value of the pattern facet is the regex found by referencing the date-time ENTITY.
>
> ---------------------------------------------------
>             	       launch.xsd
> ---------------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE xs:schema SYSTEM "regex-repository.ent">  <<---- Get access to the regexes here
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>      <xs:element name="launch">
>          <xs:complexType>
>              <xs:sequence>
>                  <xs:element name="date-time" maxOccurs="unbounded">
>                      <xs:simpleType>
>                          <xs:restriction base="xs:string">
>                              <xs:pattern value="&date-time;"/>   <<---- I use the regex here
>                          </xs:restriction>
>                      </xs:simpleType>
>                  </xs:element>
>              </xs:sequence>
>          </xs:complexType>
>      </xs:element>
>
> </xs:schema>
>
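
Worth noting that the expansion is done by the XML parser, so a schema processor only ever sees the finished regex in the `value` attribute. A small Python sketch of that, under two assumptions: the declarations sit in the internal DTD subset rather than in regex-repository.ent (the standard-library parser does not, as far as I know, fetch an external subset), and only the `hour` and `DIGIT` entities are shown to keep it short. `SCHEMA_TEXT` and `pattern_value` are names made up for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumption: entities are declared inline instead of in regex-repository.ent.
SCHEMA_TEXT = """<!DOCTYPE xs:schema [
  <!ENTITY DIGIT "0-9">
  <!ENTITY hour  "[&DIGIT;]{2,2}">
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="hour">
    <xs:restriction base="xs:string">
      <xs:pattern value="&hour;"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def pattern_value(schema_text):
    """Return the value of the first xs:pattern as the parser reports it."""
    root = ET.fromstring(schema_text)
    return root.find(f".//{XS}pattern").get("value")
```

The string returned by `pattern_value(SCHEMA_TEXT)` should contain no entity references, only the regex they stand for.
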
> Reusing a regex is simply a matter of referencing the ENTITY that holds it.
>
> I think it would be useful to create a repository of regular expressions using this XML ENTITY technique.
>
> Thoughts?
>
> /Roger
>

Of course, if you just use a regex then no datatype information is
passed to the application; the value is just a string. This is the
classic SGML/XML view of attributes, but not what a user of XSD might
expect.
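
To make that concrete: a value that passes the pattern still has to be parsed again by the application before it has a day, a month or an offset to work with. A rough Python sketch, assuming the zone is read as a sign followed by two digits of hours and two of minutes (the original post does not say), and leaving out the optional day name and leading whitespace. `FIELDS` and `parse_launch` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Same shape as the date-time regex, but with named groups for the fields.
FIELDS = re.compile(
    r"(?P<day>[0-9]{1,2})[ \t]"
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"[ \t](?P<year>[0-9]{4,})[ \t]"
    r"(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2})(:(?P<second>[0-9]{2}))?"
    r"[ \t](?P<sign>[+-])(?P<zh>[0-9]{2})(?P<zm>[0-9]{2})"
)


def parse_launch(value):
    """Turn a pattern-conforming string into a timezone-aware datetime."""
    m = FIELDS.fullmatch(value)
    if m is None:
        raise ValueError("not in the expected date-time format: %r" % value)
    # Assumption: the four zone digits are HHMM.
    offset = timedelta(hours=int(m["zh"]), minutes=int(m["zm"]))
    if m["sign"] == "-":
        offset = -offset
    # datetime() raises ValueError for impossible dates such as day 99.
    return datetime(
        int(m["year"]), MONTHS.index(m["month"]) + 1, int(m["day"]),
        int(m["hour"]), int(m["minute"]), int(m["second"] or 0),
        tzinfo=timezone(offset),
    )
```

A value such as `99 Oct 2012 07:52:00 -0004` satisfies the regex but is rejected here when the `datetime` is built, which is the kind of information a pattern facet on its own cannot carry.
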

There is the proposed ISO/IEC extensible datatypes spec (mainly for use
with RELAX NG, but it could be used with other schema languages):

http://www.itscj.ipsj.or.jp/sc34/open/1130.pdf

That allows you not only to constrain the input syntax using a regex but
also to express the underlying datatype fields.

David







