Re: [xml-dev] Proposal: create a repository of reusable regular expressions using XML ENTITY declarations
- From: David Carlisle <davidc@nag.co.uk>
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Mon, 22 Oct 2012 16:04:15 +0100
On 22/10/2012 15:39, Costello, Roger L. wrote:
> Hi Folks,
>
> Suppose that you wish to record the date and time of the launch of a new product. You could format the data using the xs:dateTime data type, but you desire a slightly different format:
>
> <launch>
> <date-time>22 Oct 2012 07:52:00 -0004</date-time>
> </launch>
>
> The value of <date-time> can be constrained to the desired format by using an XML Schema pattern facet. Recall that the value of a pattern facet is a regular expression (regex).
>
> Recently I learned from Michael Sperberg-McQueen a fantastic way of creating reusable regular expressions: create XML ENTITY declarations that express the regexes.
>
> Recall that an XML ENTITY declaration has a name followed by its replacement text:
>
> <!ENTITY name "replacement text">
>
> So, we could create a date-time ENTITY with a regex as its replacement text:
>
> <!ENTITY date-time "... regex ...">
>
> Okay, I did just that.
>
> I created a file containing the ENTITY declarations for the date-time format. I created the date-time format systematically through a series of ENTITY declarations:
>
> ---------------------------------------------------
> regex-repository.ent
> ---------------------------------------------------
> <!--
> *********************************
> Regex for date-time
>
> Here is an example string that
> conforms to the date-time regex:
> 22 Oct 2012 07:52:00 -0004
> *********************************
> -->
> <!ENTITY date-time "(&day-of-week;)?&date;&time;">
> <!ENTITY day-of-week "(&WSP;)?&day-name;">
> <!ENTITY day-name "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)">
> <!ENTITY date "&day;&month;&year;">
> <!ENTITY day "(&WSP;)?[&DIGIT;]{1,2}&WSP;">
> <!ENTITY month "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)">
> <!ENTITY year "&WSP;[&DIGIT;]{4,}&WSP;">
> <!ENTITY time "&time-of-day;&zone;">
> <!ENTITY time-of-day "&hour;&COLON;&minute;(&COLON;&second;)?">
> <!ENTITY hour "[&DIGIT;]{2,2}">
> <!ENTITY minute "[&DIGIT;]{2,2}">
> <!ENTITY second "[&DIGIT;]{2,2}">
> <!ENTITY zone "&WSP;[+-][&DIGIT;]{4,4}">
>
> <!ENTITY SP " "> <!-- Space -->
> <!ENTITY HTAB "	"> <!-- Horizontal tab -->
> <!ENTITY WSP "(&HTAB;|&SP;)"> <!-- Whitespace -->
>
> <!ENTITY COLON ":">
>
> <!ENTITY DIGIT "0-9">
>
> The regex for date-time can be reused by any XML Schema. More precisely, the regex can be reused by referencing, in a pattern facet, the date-time ENTITY declaration.
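To see what the date-time ENTITY expands to, the declarations above can be transcribed into a small Python sketch. This is only an illustration: the `ENTITIES` table, `expand`, and `is_date_time` are hypothetical names, the expansion is done by hand rather than by an XML parser, and Python's `re` module stands in for the XML Schema regex dialect (the two agree on the constructs used here).

```python
import re

# Entity table transcribed from regex-repository.ent in the post.
ENTITIES = {
    "date-time": "(&day-of-week;)?&date;&time;",
    "day-of-week": "(&WSP;)?&day-name;",
    "day-name": "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)",
    "date": "&day;&month;&year;",
    "day": "(&WSP;)?[&DIGIT;]{1,2}&WSP;",
    "month": "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
    "year": "&WSP;[&DIGIT;]{4,}&WSP;",
    "time": "&time-of-day;&zone;",
    "time-of-day": "&hour;&COLON;&minute;(&COLON;&second;)?",
    "hour": "[&DIGIT;]{2,2}",
    "minute": "[&DIGIT;]{2,2}",
    "second": "[&DIGIT;]{2,2}",
    "zone": "&WSP;[+-][&DIGIT;]{4,4}",
    "SP": " ",
    "HTAB": "\t",
    "WSP": "(&HTAB;|&SP;)",
    "COLON": ":",
    "DIGIT": "0-9",
}

# Matches a general entity reference such as &day-of-week;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9-]*);")


def expand(text):
    """Recursively replace each &name; reference with its replacement text."""
    return ENTITY_REF.sub(lambda m: expand(ENTITIES[m.group(1)]), text)


# The fully expanded regex that a pattern facet would receive.
DATE_TIME = expand("&date-time;")


def is_date_time(value):
    # XSD patterns are implicitly anchored, so use fullmatch rather than search.
    return re.fullmatch(DATE_TIME, value) is not None
```

With this sketch, `is_date_time("22 Oct 2012 07:52:00 -0004")` accepts the example string from the post, and the optional day name is accepted as well, as in "Mon 22 Oct 2012 07:52:00 -0004".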
>
> Here I create an XML Schema for the <launch> element and use DOCTYPE to provide access to the regexes in regex-repository.ent. I use a pattern facet to constrain the value of the <date-time> element. The value of the pattern facet is the regex found by referencing the date-time ENTITY.
>
> ---------------------------------------------------
> launch.xsd
> ---------------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE xs:schema SYSTEM "regex-repository.ent"> <!-- Get access to the regexes here -->
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
> <xs:element name="launch">
> <xs:complexType>
> <xs:sequence>
> <xs:element name="date-time" maxOccurs="unbounded">
> <xs:simpleType>
> <xs:restriction base="xs:string">
> <xs:pattern value="&date-time;"/> <!-- I use the regex here -->
> </xs:restriction>
> </xs:simpleType>
> </xs:element>
> </xs:sequence>
> </xs:complexType>
> </xs:element>
>
> </xs:schema>
>
> Reusing a regex is simply a matter of referencing the ENTITY that holds it.
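The point that the schema processor only ever sees the expanded regex can be checked with a short Python sketch. It rests on two assumptions that are not in the post: the declarations are moved into the internal DTD subset (Python's standard-library parser does not fetch an external file such as regex-repository.ent by default), and only the time-of-day entities are included to keep the example short. The `SCHEMA` string and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A cut-down schema with the time-of-day entities declared in the
# internal subset (an assumption; the post uses an external .ent file).
SCHEMA = """<!DOCTYPE xs:schema [
  <!ENTITY time-of-day "&hour;&COLON;&minute;(&COLON;&second;)?">
  <!ENTITY hour "[&DIGIT;]{2,2}">
  <!ENTITY minute "[&DIGIT;]{2,2}">
  <!ENTITY second "[&DIGIT;]{2,2}">
  <!ENTITY COLON ":">
  <!ENTITY DIGIT "0-9">
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="time-of-day">
    <xs:restriction base="xs:string">
      <xs:pattern value="&time-of-day;"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

XS = "{http://www.w3.org/2001/XMLSchema}"

# The XML parser expands the entity references while reading the attribute,
# so the value seen here contains no "&...;" references.
root = ET.fromstring(SCHEMA)
pattern = root.find(f".//{XS}pattern").get("value")
print(pattern)
```

The expansion happens in the XML parser, before any schema processing, which is why the technique needs no support from the schema language itself.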
>
> I think it would be useful to create a repository of regular expressions using this XML ENTITY technique.
>
> Thoughts?
>
> /Roger
>
Of course, if you just use a regex then you don't get any data
information passed to the application; it's just a string. This is the
classic SGML/XML view of attributes, but not what a user of XSD might
expect.
There is the proposed ISO/IEC extensible datatypes spec (mainly for use
with RelaxNG, but it could be used with other schema languages):
http://www.itscj.ipsj.or.jp/sc34/open/1130.pdf
That allows you not only to constrain the input syntax using a regex but
also to express the underlying datatype fields.
David
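The difference between checking the syntax and recovering the data can be illustrated with a small Python sketch. It is not an implementation of the extensible datatypes proposal; it uses `datetime.strptime` from the Python standard library as a stand-in, `parse_launch` is a hypothetical helper, and an English locale is assumed for the month abbreviation.

```python
from datetime import datetime


def parse_launch(value):
    """Turn the launch string into typed date, time, and zone fields."""
    # %b matches the abbreviated month name in the current locale
    # (English is assumed here); %z reads the +HHMM / -HHMM zone offset.
    return datetime.strptime(value, "%d %b %Y %H:%M:%S %z")


launch = parse_launch("22 Oct 2012 07:52:00 -0004")

# Unlike a pattern facet, the result exposes the individual fields.
print(launch.year, launch.month, launch.day)
print(launch.utcoffset())
```

A pattern facet alone would accept the same string but hand the application only the text, so a parsing step like this one would still have to be written separately.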