Proposal: create a repository of reusable regular expressions using XML ENTITY declarations
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Mon, 22 Oct 2012 14:39:16 +0000
Hi Folks,
Suppose that you wish to record the date and time of the launch of a new product. You could format the data using the xs:dateTime data type, but you want a slightly different format:
<launch>
<date-time>22 Oct 2012 07:52:00 -0004</date-time>
</launch>
The value of <date-time> can be constrained to the desired format by using an XML Schema pattern facet. Recall that the value of a pattern facet is a regular expression (regex).
Recently I learned from Michael Sperberg-McQueen a fantastic way of creating reusable regular expressions: create XML ENTITY declarations that express the regexes.
Recall that an XML ENTITY declaration has a name followed by its replacement text:
<!ENTITY name "replacement text">
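Here is the mechanism on a small scale. The XML parser, not the schema processor, replaces an entity reference with its replacement text, so the schema processor only ever sees the expanded regex. In this minimal sketch the entity name four-digits and the type name year are hypothetical, chosen only for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xs:schema [
  <!-- Hypothetical entity whose replacement text is a regex -->
  <!ENTITY four-digits "[0-9]{4}">
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="year">
    <xs:restriction base="xs:string">
      <!-- The parser expands &four-digits; so the schema processor sees value="[0-9]{4}" -->
      <xs:pattern value="&four-digits;"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```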
So, we could create a date-time ENTITY with a regex as its replacement text:
<!ENTITY date-time "... regex ...">
Okay, I did just that.
I created a file containing the ENTITY declarations for the date-time format, building the regex up systematically from a series of smaller ENTITY declarations:
---------------------------------------------------
regex-repository.ent
---------------------------------------------------
<!--
*********************************
Regex for date-time
Here is an example string that
conforms to the date-time regex:
22 Oct 2012 07:52:00 -0004
*********************************
-->
<!ENTITY date-time "(&day-of-week;)?&date;&time;">
<!ENTITY day-of-week "(&WSP;)?&day-name;">
<!ENTITY day-name "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)">
<!ENTITY date "&day;&month;&year;">
<!ENTITY day "(&WSP;)?[&DIGIT;]{1,2}&WSP;">
<!ENTITY month "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)">
<!ENTITY year "&WSP;[&DIGIT;]{4,}&WSP;">
<!ENTITY time "&time-of-day;&zone;">
<!ENTITY time-of-day "&hour;&COLON;&minute;(&COLON;&second;)?">
<!ENTITY hour "[&DIGIT;]{2,2}">
<!ENTITY minute "[&DIGIT;]{2,2}">
<!ENTITY second "[&DIGIT;]{2,2}">
<!ENTITY zone "&WSP;[+-][&DIGIT;]{4,4}">
<!ENTITY SP " "> <!-- Space -->
<!ENTITY HTAB "\t"> <!-- Horizontal tab, as the XSD regex escape; a literal tab would be normalized to a space inside an attribute value -->
<!ENTITY WSP "(&HTAB;|&SP;)"> <!-- Whitespace -->
<!ENTITY COLON ":">
<!ENTITY DIGIT "0-9">
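To show how much the entities save, here is the regex that &date-time; stands for, written out by hand as a single pattern facet (with the horizontal tab shown as the XSD escape \t). There are no ^ or $ anchors because XML Schema regexes are implicitly anchored: the entire value must match.

```xml
<!-- Hand expansion of &date-time;, for comparison only -->
<xs:pattern value="(((\t| ))?(Mon|Tue|Wed|Thu|Fri|Sat|Sun))?((\t| ))?[0-9]{1,2}(\t| )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\t| )[0-9]{4,}(\t| )[0-9]{2,2}:[0-9]{2,2}(:[0-9]{2,2})?(\t| )[+-][0-9]{4,4}"/>
```

Because the day name and the seconds are optional, this accepts "Mon 22 Oct 2012 07:52:00 -0004" and "22 Oct 2012 07:52 -0004" as well as the original example.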
The regex for date-time can be reused by any XML Schema: the schema simply references the date-time entity in a pattern facet.
Here I create an XML Schema for the <launch> element and use DOCTYPE to provide access to the regexes in regex-repository.ent. I use a pattern facet to constrain the value of the <date-time> element. The value of the pattern facet is the regex found by referencing the date-time ENTITY.
---------------------------------------------------
launch.xsd
---------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xs:schema SYSTEM "regex-repository.ent"> <!-- Get access to the regexes here -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="launch">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="date-time" maxOccurs="unbounded">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="&date-time;"/> <!-- I use the regex here -->
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
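A quick way to try the schema is an instance document like the following sketch. The file name launch.xml is hypothetical, and the schema location hint assumes launch.xsd and regex-repository.ent sit in the same directory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- launch.xml (hypothetical instance document) -->
<launch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="launch.xsd">
  <date-time>22 Oct 2012 07:52:00 -0004</date-time>
  <!-- Optional day name present, optional seconds omitted: still matches -->
  <date-time>Mon 22 Oct 2012 07:52 -0004</date-time>
  <!-- An xs:dateTime-style value such as 2012-10-22T07:52:00-04:00
       would be rejected by the pattern -->
</launch>
```

This assumes the XML parser underneath the schema processor reads the external DTD subset named in the DOCTYPE of launch.xsd. XML does not require non-validating parsers to do so, although many do by default; if the subset is skipped, &date-time; cannot be expanded.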
Reusing a regex is simply a matter of referencing the ENTITY that holds it.
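As a sketch of that reuse, an unrelated schema could pull a smaller piece out of the same file. The file name meeting.xsd and the element start-time are hypothetical; time-of-day is the entity declared in regex-repository.ent.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- meeting.xsd (hypothetical): a second schema reusing one piece of the repository -->
<!DOCTYPE xs:schema SYSTEM "regex-repository.ent">
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="start-time">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- Expands to [0-9]{2,2}:[0-9]{2,2}(:[0-9]{2,2})? -->
        <xs:pattern value="&time-of-day;"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

With this schema, start-time accepts 09:30 and 09:30:15 but not 9:30, since the hour entity requires exactly two digits.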
I think it would be useful to create a repository of regular expressions using this XML ENTITY technique.
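One way a schema might combine a shared repository with its own local regexes is to load the repository through a parameter entity in the internal DTD subset. In this sketch the parameter entity name regexes, the local entity hh-mm and the type name clock-time are hypothetical; hh-mm is composed from the repository's hour, COLON and minute entities.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xs:schema [
  <!-- Load the shared repository (hypothetical parameter entity name) -->
  <!ENTITY % regexes SYSTEM "regex-repository.ent">
  %regexes;
  <!-- Schema-local regex composed from repository pieces: HH:MM only -->
  <!ENTITY hh-mm "&hour;&COLON;&minute;">
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="clock-time">
    <xs:restriction base="xs:string">
      <xs:pattern value="&hh-mm;"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Since the first declaration of an entity is the binding one, a local declaration meant to override a repository entry of the same name would have to appear before the %regexes; reference.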
Thoughts?
/Roger