OASIS Mailing List Archives


Proposal: create a repository of reusable regular expressions using XML ENTITY declarations

Hi Folks,

Suppose that you wish to record the date and time of the launch of a new product. You could format the data using the xs:dateTime data type, but you want a slightly different format:

   <date-time>22 Oct 2012 07:52:00 -0004</date-time>

The value of <date-time> can be constrained to the desired format by using an XML Schema pattern facet. Recall that the value of a pattern facet is a regular expression (regex).

Recently I learned from Michael Sperberg-McQueen a fantastic way of creating reusable regular expressions: create XML ENTITY declarations that express the regexes. 

Recall that an XML ENTITY declaration has a name followed by its replacement text:

<!ENTITY name "replacement text">
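
As a quick reminder of how that works, here is a minimal sketch of a document that declares an entity in its internal subset and then references it. The names note and greeting are made up for this illustration; they are not part of the date-time work described in this message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
    <!-- "greeting" is a hypothetical entity name, used only for illustration -->
    <!ENTITY greeting "Hello, world">
]>
<!-- The reference &greeting; is replaced by the entity's replacement text,
     both in the attribute value and in the element content. -->
<note title="&greeting;">&greeting;!</note>
```

After parsing, the title attribute holds "Hello, world" and the element content is "Hello, world!". An attribute value is exactly where a pattern facet keeps its regex, which is what makes the technique work.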

So, we could create a date-time ENTITY with a regex as its replacement text:

<!ENTITY date-time "... regex ...">

Okay, I did just that.

I created a file containing the ENTITY declarations for the date-time format, building the regex up systematically from a series of smaller ENTITY declarations:

<!--
     Regex for date-time
     Here is an example string that
     conforms to the date-time regex:
     22 Oct 2012 07:52:00 -0004
-->
<!ENTITY date-time    "(&day-of-week;)?&date;&time;">
<!ENTITY day-of-week  "(&WSP;)?&day-name;">
<!ENTITY day-name     "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)">
<!ENTITY date         "&day;&month;&year;">
<!ENTITY day          "(&WSP;)?[&DIGIT;]{1,2}&WSP;">
<!ENTITY month        "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)">
<!ENTITY year         "&WSP;[&DIGIT;]{4,}&WSP;">
<!ENTITY time         "&time-of-day;&zone;">
<!ENTITY time-of-day  "&hour;&COLON;&minute;(&COLON;&second;)?">
<!ENTITY hour         "[&DIGIT;]{2,2}">
<!ENTITY minute       "[&DIGIT;]{2,2}">
<!ENTITY second       "[&DIGIT;]{2,2}">
<!ENTITY zone         "&WSP;[+-][&DIGIT;]{4,4}">

<!ENTITY SP           "&#32;">            <!-- Space -->
<!ENTITY HTAB         "&#9;">             <!-- Horizontal tab -->
<!ENTITY WSP          "(&HTAB;|&SP;)">    <!-- Whitespace -->

<!ENTITY COLON        "&#58;">

<!ENTITY DIGIT        "0-9">
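
To make the layering concrete, here is a hand expansion of the time entity, worked from the innermost references outward. It is only a sketch written out as a comment, not output from a parser; TAB and SP are my shorthand for the literal tab and space characters.

```xml
<!--
  Hand expansion of the time entity, innermost references first.
  TAB and SP stand for the literal tab and space characters.

  DIGIT        =>  0-9
  COLON        =>  :
  WSP          =>  (TAB|SP)
  hour         =>  [0-9]{2,2}          (minute and second are identical)
  time-of-day  =>  [0-9]{2,2}:[0-9]{2,2}(:[0-9]{2,2})?
  zone         =>  (TAB|SP)[+-][0-9]{4,4}
  time         =>  [0-9]{2,2}:[0-9]{2,2}(:[0-9]{2,2})?(TAB|SP)[+-][0-9]{4,4}
-->
```

Against the example string, time-of-day matches "07:52:00" and zone matches " -0004" (a space, a sign, and four digits).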

The regex for date-time can be reused by any XML Schema. More precisely, the regex can be reused by referencing, in a pattern facet, the date-time ENTITY declaration.

Here I create an XML Schema for the <launch> element and use DOCTYPE to provide access to the regexes in regex-repository.ent. I use a pattern facet to constrain the value of the <date-time> element. The value of the pattern facet is the regex found by referencing the date-time ENTITY.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xs:schema SYSTEM "regex-repository.ent">  <!-- Get access to the regexes here -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="launch">
                <xs:element name="date-time" maxOccurs="unbounded">
                        <xs:restriction base="xs:string">
                            <xs:pattern value="&date-time;"/>   <!-- I use the regex here -->
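
The schema above is abbreviated. Below is a sketch of one complete form it could take; the xs:complexType, xs:sequence and xs:simpleType wrappers and the closing tags are filled in as an assumption about the omitted parts, and regex-repository.ent is assumed to sit in the same directory as the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The external subset supplies the ENTITY declarations -->
<!DOCTYPE xs:schema SYSTEM "regex-repository.ent">
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="launch">
        <!-- Wrapper elements below are assumed; the message does not show them -->
        <xs:complexType>
            <xs:sequence>
                <xs:element name="date-time" maxOccurs="unbounded">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <!-- The parser expands &date-time; before the
                                 schema processor ever sees the pattern -->
                            <xs:pattern value="&date-time;"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>
```

With this schema, a <launch> element containing <date-time>22 Oct 2012 07:52:00 -0004</date-time> should validate, as should a value with the optional day name such as "Mon 5 Nov 2012 14:30 +0100". Note that the XML parser reading the schema has to load the external subset for the entity reference to resolve.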


Reusing a regex is simply a matter of referencing the ENTITY that holds it.
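
For instance, a different schema could pick up just one of the smaller building blocks. The sketch below reuses the time-of-day ENTITY on its own; the opening-time element is a hypothetical name chosen for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xs:schema SYSTEM "regex-repository.ent">
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- "opening-time" is a hypothetical element, used only for illustration -->
    <xs:element name="opening-time">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <!-- Reuses a smaller building block: hh:mm or hh:mm:ss -->
                <xs:pattern value="&time-of-day;"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

</xs:schema>
```

This accepts values such as "07:52" and "07:52:00". Like the regexes above, it checks only the shape of the value, so "99:99" would also be accepted.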

I think it would be useful to create a repository of regular expressions using this XML ENTITY technique.


