   Re: Frontline report from the Desperate Fgrep Hacker

  • From: Sean McGrath <digitome@iol.ie>
  • To: xml-dev@xml.org
  • Date: Mon, 28 Feb 2000 13:50:40 +0000

At 07:45 11/02/00 -0500, you wrote:
>Sean McGrath scripsit:
>> [John Cowan]
>> >I happened to know that the element always appeared on a single
>> >line of the file: the start tag, the character data, the end tag.
>> How could you possibly know that?
>Because the documents were 100% generated by a single
>program whose behavior was entirely predictable.
I guessed as much.

The problem from an engineering
viewpoint is that we have no way
of expressing the syntactic assumptions
of the Fgrep programs for
those who will work with these systems
in the future.

Murphy's law dictates that sooner or
later a perfectly sound XML document
will come along that blows my
Fgrep out of the water.
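
To make the failure mode concrete, here is a minimal sketch in Python.
The element name "price" and both sample documents are made up for
illustration, and fgrep_lines only imitates fixed-string, line-at-a-time
matching. The second document is perfectly sound XML (whitespace is
allowed before the closing ">" of a tag), yet the line-based match
finds nothing in it.

```python
import xml.etree.ElementTree as ET

def fgrep_lines(text, needle):
    """Return the lines that contain the fixed string, roughly as fgrep would."""
    return [line for line in text.splitlines() if needle in line]

# Hypothetical documents: the same content, serialized two different ways.
generated = "<item>\n<price>42</price>\n</item>\n"
reformatted = "<item>\n<price\n>42</price\n>\n</item>\n"

# An XML parser sees the same element content in both documents.
same = (ET.fromstring(generated).find("price").text
        == ET.fromstring(reformatted).find("price").text)

# The line-oriented match only works on the first serialization.
hits_generated = fgrep_lines(generated, "<price>")
hits_reformatted = fgrep_lines(reformatted, "<price>")
```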

I think this is a very good example of
why something like XFM would be very useful.

I floated the idea of
XFM (XML Features Manifest) late last year
as a declarative way for creators of
XML content to formally specify what syntactic
constructs their content uses, e.g. external
entity references, internal document type
declaration subsets, marked sections and
so on.

I believe something like XFM would allow
the Desperate Fgrep Hacker to add a safety
net to her work by first checking the
XFM file for an XML corpus to see if
the assumptions present in the regular
expressions are justified.
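
As a rough sketch of that safety net, in Python: nothing here is
actual XFM syntax. The feature names and the idea of holding the
manifest as a plain set are assumptions made purely for illustration.
The point is only that the check runs before any grepping does.

```python
def unjustified_assumptions(declared_features, cannot_handle):
    """Return the declared features that the grep script cannot cope with."""
    return sorted(set(declared_features) & set(cannot_handle))

# Hypothetical manifest contents for a corpus (invented feature names).
corpus_features = {"internal-dtd-subset", "marked-sections"}

# Constructs that the regular expressions silently assume are absent.
script_cannot_handle = {"external-entity-references", "marked-sections"}

problems = unjustified_assumptions(corpus_features, script_cannot_handle)
if problems:
    print("Do not grep this corpus; it uses:", ", ".join(problems))
```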

In the meantime, I do not use Fgrep
the way you use it. I first generate
an XML incarnation of ESIS, which I have
dubbed PYX, using the xmln utility. This way,
I can be sure that I am grepping true
PCDATA, true element type names etc.

xmln is available on the Pyxie website,
if anyone wants to give it a whirl.
xmln is built on top of expat and
is non-validating. xmlv is built
on top of rxp and is validating.
Both xmln and xmlv generate the
same output notation called PYX.
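
For a feel of what grepping PYX looks like, here is a small sketch
in Python using the standard library's expat binding. It is not xmln
itself, only an approximation of the kind of line-oriented output
described above: "(" for a start tag, "A" for an attribute, "-" for
character data and ")" for an end tag. Escaping only newlines as \n
is a simplification on my part, and the sample document is made up.

```python
import xml.parsers.expat

def to_pyx(xml_text):
    """Return PYX-style lines for a well-formed XML string (non-validating)."""
    lines = []

    def escape(text):
        # Simplification: only newlines are escaped, so every event fits on one line.
        return text.replace("\n", "\\n")

    def start(name, attrs):
        lines.append("(" + name)
        for key, value in attrs.items():
            lines.append("A" + key + " " + escape(value))

    def end(name):
        lines.append(")" + name)

    def chars(data):
        lines.append("-" + escape(data))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data as one chunk
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return lines

# Hypothetical document whose tags are split across lines.
doc = "<item>\n<price\n>42</price\n>\n</item>"
pyx = to_pyx(doc)

# A line-oriented match is now safe: element names always follow "(".
element_names = [line[1:] for line in pyx if line.startswith("(")]
```

Each event sits on its own line regardless of how the source file
was laid out, so a fixed-string search for "(price" or "-42" behaves
the same on either serialization.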


http://www.pyxie.org - an Open Source XML Processing library for Python
