1) SGML. It lets you specify regular expressions
(content models) together with the delimiters used,
so you can read in text, parse it as SGML, and then output it as events.
If you have many documents with lots of these, run them
through an SGML processor.
2) Check out XPath 2.0. It seems that it will have some kind of
syntax for this kind of thing; see
and the reference to captured substrings. I guess the
most consistent thing would be to follow it in some way.
3) Probably ISO DSDL will have something like this.
In particular, which features in addition to Regular Fragmentations
are you interested in?
4) In the meantime, you can tokenize many kinds of strings
and check them for various constraints using Schematron,
which can be embedded in <appinfo> now and extracted
using a stylesheet. That does not give you full regular
expressions. (Schematron 1.6 will be out within a month,
with <let> statements that help you do consecutive substring
capturing from strings, though this is not as powerful as
full regular expressions.) We have a free Windows tool that
supports embedded Schematron in XML Schemas.
5) If you are writing your own script, the embedded Schematron
XSLT scripts may be useful anyway: Francis Norton made some
really tricky code for extracting things from appinfo, and you
may find it useful to hack that code to generate, for example,
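To make the difference in item 4 concrete, here is a minimal Python sketch (an illustration only, not Schematron syntax). The helpers substring_before and substring_after are hypothetical Python stand-ins for the XPath 1.0 functions of the same name. Chaining them, one captured field per step, approximates what consecutive <let> captures can do. A regular expression with named groups, closer to the captured substrings mentioned for XPath 2.0, also checks that each field is well formed.

```python
import re

# Hypothetical Python stand-ins for XPath 1.0 substring-before() and
# substring-after(); like XPath 1.0, both return "" when sep is absent.
def substring_before(s: str, sep: str) -> str:
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def substring_after(s: str, sep: str) -> str:
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def split_date_consecutive(value: str):
    # Each step mimics one variable binding: capture a field, keep the rest.
    year = substring_before(value, "-")
    rest = substring_after(value, "-")
    month = substring_before(rest, "-")
    day = substring_after(rest, "-")
    return year, month, day

# Full regular expression: splits the string AND checks each field's form.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

def split_date_regex(value: str):
    m = DATE.fullmatch(value)
    return (m.group("year"), m.group("month"), m.group("day")) if m else None
```

With "20x3-03-14" the chained version still returns ("20x3", "03", "14") while the regex version returns None, which is one sense in which substring capturing is weaker than full regular expressions.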
----- Original Message -----
From: "KRUMPOLEC Martin" <firstname.lastname@example.org>
Sent: Friday, March 14, 2003 3:25 AM
Subject: [xml-dev] text files & xsd & regex
> I would like to ask if anyone has seen something like this:
> - W3C XML Schema annotated (appinfo) with regular expressions
> - regexes consist of groups named after child (text only) elements
> - simple processor reads text file line by line and produces SAX "events"
> - this processor is driven by content model of our schema
> - streamed "infoset" matches the schema
> it is similar to "Regular Fragmentations" by Simon St.Laurent,
> just a little bit more complicated ...
> PS: if there is nothing like this I'll have to do it myself :-)
> Thank you
> Martin Krumpolec <email@example.com>
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
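For reference, the processor Martin describes could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the regexes are written by hand in a hypothetical RULES table rather than read from xs:appinfo, each line must fully match one rule, and the schema's content model is not checked. Named groups become child elements, and the events go to any SAX ContentHandler (here the standard library's XMLGenerator, which serializes them).

```python
import io
import re
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical rules standing in for regexes read from xs:appinfo:
# record element name -> pattern whose named groups are child element names.
RULES = {
    "person": re.compile(r"(?P<surname>\w+),\s*(?P<given>\w+)\s+(?P<born>\d{4})"),
}

def parse_lines(lines, rules, handler: ContentHandler, root="records"):
    """Match each text line against the rules and report SAX events."""
    no_attrs = AttributesImpl({})
    handler.startDocument()
    handler.startElement(root, no_attrs)
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        for record, pattern in rules.items():
            m = pattern.fullmatch(line)
            if m:
                break
        else:
            raise ValueError(f"line {lineno} matches no rule: {line!r}")
        handler.startElement(record, no_attrs)
        # Emit child elements in the order their groups appear in the pattern.
        for child, idx in sorted(pattern.groupindex.items(), key=lambda kv: kv[1]):
            handler.startElement(child, no_attrs)
            handler.characters(m.group(idx) or "")  # unmatched group -> empty
            handler.endElement(child)
        handler.endElement(record)
    handler.endElement(root)
    handler.endDocument()

if __name__ == "__main__":
    out = io.StringIO()
    parse_lines(["Doe, Jane 1970\n"], RULES, XMLGenerator(out, "utf-8"))
    print(out.getvalue())
```

Because output goes through the ContentHandler interface, the same function could feed a validating or transforming SAX pipeline instead of a serializer.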