   Re: [xml-dev] text files & xsd & regex


A few answers:

1) SGML. It lets you declare regular expressions
(content models) together with the delimiters used,
so that a parser can read in the text, parse it as SGML,
and output it as events. If you have many such documents,
run them through an SGML processor.

2) Check out XPath 2.0. It seems that it will have some kind of
syntax for this kind of thing; see
http://www.w3.org/TR/xquery-operators/#func-matches
and the reference to captured substrings. I guess the
most consistent thing would be to follow their syntax in some way.

3) Probably ISO DSDL will have something like this.
In particular, which features in addition to Regular Fragmentations
are you interested in?

4) In the meantime, you can tokenize many kinds of strings 
and check them for various constraints using Schematron,
which can be embedded in <appinfo> now and extracted
using a stylesheet. That does not give you full regular 
expressions.  (Schematron 1.6 will be out within a month,
with <let> statements that help you do consecutive substring
capturing from strings, though this is not as powerful as
full regular expressions.)  We have a free Windows tool that
supports embedded Schematron in XML Schemas.

5) If you are writing your own script, the embedded Schematron
XSLT scripts may be useful anyway: Francis Norton made some
really tricky code for extracting things from appinfo, and you 
may find it useful to hack that code to generate, for example,
Perl scripts.
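
For what it is worth, here is a rough sketch in Python of the kind of
line-by-line processor you describe: a regex whose named groups become
text-only child elements, emitted as SAX events. The pattern, the
element names (records, record) and the helper names are all made up
for illustration, and the regex is hard-coded here rather than read
from <appinfo>, so treat it as a starting point only.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical record pattern: each named group becomes a text-only
# child element. In a real tool this would come from the schema's appinfo.
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.*)"
)


def lines_to_sax(lines, handler, pattern=LINE_PATTERN,
                 root="records", record="record"):
    """Send SAX events to `handler`: one <record> per matching line."""
    no_attrs = AttributesImpl({})
    handler.startDocument()
    handler.startElement(root, no_attrs)
    for line in lines:
        match = pattern.fullmatch(line.rstrip("\n"))
        if match is None:
            continue  # assumption: skip bad lines; a real tool would report them
        handler.startElement(record, no_attrs)
        for name, value in match.groupdict().items():
            if value is None:
                continue  # optional group that did not take part in the match
            handler.startElement(name, no_attrs)
            handler.characters(value)
            handler.endElement(name)
        handler.endElement(record)
    handler.endElement(root)
    handler.endDocument()


def lines_to_xml(lines):
    """Convenience wrapper: serialize the event stream to an XML string."""
    out = io.StringIO()
    lines_to_sax(lines, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()
```

Calling lines_to_xml(open("input.txt")) would serialize the events;
passing your own ContentHandler to lines_to_sax keeps it streaming.
Validating the stream against the schema is a separate step that this
sketch does not attempt.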

Cheers
Rick Jelliffe
http://www.topologi.com/

----- Original Message ----- 
From: "KRUMPOLEC Martin" <krumpolec@asset.sk>
To: <xml-dev@lists.xml.org>
Sent: Friday, March 14, 2003 3:25 AM
Subject: [xml-dev] text files & xsd & regex


> Hi,
> 
>   I would like to ask if anyone has seen something like this:
> 
>   - W3C XML Schema (XSD) annotated (appinfo) with regular expressions
>   - regexes consist of groups named after child (text only) elements
>   - simple processor reads text file line by line and produces SAX "events"
>   - this processor is driven by content model of our schema
>   - streamed "infoset" matches the schema
> 
>   it is similar to "Regular Fragmentations" by Simon St.Laurent,
>   just a little bit more complicated ...
> 
>   PS: if there is nothing like this I'll have to do it myself :-)
>   
> Thank you
> 
> Martin
> 
> -- 
> Martin Krumpolec <krumpo@pobox.sk>
> 
