   Re: [xml-dev] Relax NG annoyances

Hi Jeni,

Jeni Tennison wrote:
> Robin Berjon wrote:
>>A fair number of vocabularies created before XML Schema or RelaxNG
>>have comma or semicolon separated lists. Another example could be
>>the list of commands in SVG path data. But as tempting as it is to
>>want to fix this with lists, I think that having a nice way of
>>declaring compound types (à la Regular Fragmentation, but without
>>changing the tree) would be the most general and elegant solution to
>>this.
> 
> At his RELAX NG tutorial at XML 2002, John Cowan mentioned the
> possibility of extending RELAX NG patterns into text content, so, for
> example, to get pairs of numbers in which the numbers in a pair were
> separated by commas and the pairs were separated by whitespace, you
> might use something like:
> 
> <define name="path">
>   <ref name="numberPair" />
>   <zeroOrMore>
>     <whitespace />
>     <ref name="numberPair" />
>   </zeroOrMore>
> </define>
> 
> <define name="numberPair">
>   <data type="decimal" />
>   <value>,</value>
>   <data type="decimal" />
> </define>
> 
> I have no idea whether this is an idea that's being pursued?

I've been working on something similar myself, on and off. I think that Simon 
and Eric's work on RegFrags[0] can be considered to have similar goals as well.

What I like about RegFrags is that they make implicit structure explicit by 
adding to the XML tree. That's convenient because it means that all downstream 
processors need to deal with is XML. What I dislike about them is the exact same 
thing, since sometimes I don't want my tree to be touched.

I think it's unrealistic to believe that people will create vocabularies where 
all structure is to be made explicit. XPath, SVG path data, CSS values... 
examples abound.

> The argument that there should be a separate way of defining datatype
> libraries, with RELAX NG schemas (and other technologies) just
> referencing an appropriate one, seems persuasive.

Very much so.

> A combination of the
> datatype-oriented definitions ala XML Schema and regex-based
> definitions, like the one above, seems pretty powerful. Presumably
> this is something that Part 5 (Datatypes) of DSDL is addressing?

I haven't had time to follow DSDL as much as I would like to, but combining 
typing and regexen -- if only for composability -- seems to me to be a powerful 
mix. What I have been playing with is conversions from WXS types to regexen, 
which are composed into a single large regex that is then used to type the 
subcomponents of a string. One advantage of this approach is that you can let 
the regex engine backtrack across types, which makes life easier. A disadvantage 
is that it's possible to do silly things, such as (using the RNG syntax from above):

   <define name='prefixedInt'>
     <data type='string'/>
     <data type='int'/>
   </define>

which will be unlikely to do what the author wants unless the int is constrained 
to be in the 0-9 range or the string has a proper pattern ("foo1234" will yield 
{"foo123",4}, not {"foo", 1234}).
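
To make the pitfall concrete, here is a minimal Python sketch of the general 
idea, not the actual implementation discussed above. It assumes each type maps 
to a regex fragment and that the fragments are concatenated into one regex with 
a capture group per type. The names TYPE_PATTERNS, compile_sequence, split_typed 
and the "letters" type are all hypothetical, and the "int" fragment only checks 
the lexical form, not the value range.

```python
import re

# Hypothetical mapping from type names to regex fragments (an assumption
# made for illustration; real WXS lexical spaces are more involved).
TYPE_PATTERNS = {
    "string": r".*",            # unconstrained string: greedy, matches anything
    "int": r"[+-]?[0-9]+",      # lexical form only, no range check
    "letters": r"[A-Za-z]+",    # hypothetical constrained string type
}

def compile_sequence(types):
    """Compose one regex with a capture group per type, in order."""
    body = "".join(f"({TYPE_PATTERNS[t]})" for t in types)
    return re.compile(body, re.DOTALL)

def split_typed(types, text):
    """Return the substring assigned to each type, or None if no match."""
    match = compile_sequence(types).fullmatch(text)
    return match.groups() if match else None

# The unconstrained string swallows as much as it can; the engine backtracks
# only far enough to leave a single digit for the int.
print(split_typed(["string", "int"], "foo1234"))

# Constraining the string part gives the split the author probably wanted.
print(split_typed(["letters", "int"], "foo1234"))
```

In this sketch the first call assigns "foo123" to the string and "4" to the 
int, while the second assigns "foo" and "1234". The same backtracking that 
makes composition easy is what produces the surprising first split.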


[0]http://www.simonstl.com/projects/fragment/

-- 
Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway        http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488





 
