OASIS Mailing List Archives


Re: [xml-dev] Parsing pipeline, flow-based programming, grammars and parsing

Sometimes people distinguish between a parser, whose job is to say which part of the grammar each part of the input belongs to, and a recognizer, whose job is to say whether the input conforms to the grammar.

Firewall yea/nay validation just needs a recognizer; contrast that with XSD, which produces a PSVI. (Schematron is more like a lot of parallel recognizers: the @role attribute is maybe the closest it gets to producing a parse.)
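A minimal sketch of that distinction, assuming a toy balanced-parentheses grammar (the grammar and the function names `recognize` and `parse` are made up for illustration, not taken from any XML tool). The recognizer only answers yes or no; the parser also reports the structure it found.

```python
# Toy grammar (an assumption for illustration):  S -> "(" S ")" S | empty

def recognize(text):
    """Recognizer: only say whether the input conforms to the grammar."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ")" with no matching "("
                return False
        else:                      # any other character is outside the grammar
            return False
    return depth == 0              # every "(" must have been closed


def parse(text):
    """Parser: return the nesting structure as nested lists, or raise ValueError."""
    def s(i):
        # Parse zero or more "(" S ")" groups starting at position i.
        nodes = []
        while i < len(text) and text[i] == "(":
            inner, i = s(i + 1)
            if i >= len(text) or text[i] != ")":
                raise ValueError(f"expected ')' at position {i}")
            nodes.append(inner)
            i += 1
        return nodes, i

    tree, end = s(0)
    if end != len(text):
        raise ValueError(f"unexpected {text[end]!r} at position {end}")
    return tree
```

For the input `"(()())"` the recognizer returns `True`, while the parser returns the nested list `[[[], []]]`, which says which parenthesis belongs to which group. A firewall-style check needs only the first answer.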

(Using grammars for validation was one of Charles Goldfarb and company's great innovations. But Chomsky-style grammars are not the only game in town for parsing. A few years ago I wrote a blog post speculating on whether Zellig Harris's Operator Grammars could be used for XML validation. Google "Zellig Harris Schematron".)


On 03/11/2013 4:20 AM, "Costello, Roger L." <costello@mitre.org> wrote:

Hi Folks,


                Parsing is the process of structuring a linear representation in accordance with a given grammar. [Grune & Jacobs]


It just dawned on me that a "schema validator" is actually a parser.


An XML Schema is a grammar. A schema validator structures input in accordance with the XML Schema. Hey, that’s a parser!


A schema validator takes as input the output of another parser, the XML parser. An XML parser structures input in accordance with the XML grammar.


So there are two parsers that run, one following another:



A parsing pipeline!
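
A rough sketch of such a two-stage pipeline, using only the Python standard library. Stage 1 is a real XML parser (`xml.etree.ElementTree`); stage 2 is a stand-in for a schema validator. The `CONTENT_MODELS` table, the element names, and the functions `xml_parse`, `validate`, and `pipeline` are all hypothetical: the table is a toy grammar over child element names, not XML Schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models standing in for an XML Schema: each element
# name maps to a regular expression over its child element names, where
# every child name is followed by one space.
CONTENT_MODELS = {
    "book": r"title (author )+",   # one title, then one or more authors
    "title": r"",                  # no child elements allowed
    "author": r"",
}


def xml_parse(text):
    """Stage 1: structure the characters according to the XML grammar."""
    return ET.fromstring(text)     # raises ET.ParseError if not well-formed


def validate(element):
    """Stage 2: check the element tree against the content models."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:              # element not declared in the toy grammar
        return False
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(validate(child) for child in element)


def pipeline(text):
    """Run the two stages one after the other."""
    return validate(xml_parse(text))
```

With these assumptions, `pipeline("<book><title>T</title><author>A</author></book>")` returns `True`, a `<book>` with no `<title>` returns `False`, and input that is not well-formed never reaches stage 2 because stage 1 raises `ET.ParseError`. Stage 2 here only gives a yes/no answer and ignores text content and attributes; a real schema validator does considerably more.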


That’s pretty neat. And it is in line with Flow-Based Programming (FBP). (See Sean McGrath’s recent mention of FBP.)


I recently started reading the bible of parsing:


                Parsing Techniques, A Practical Guide [Grune & Jacobs]


Reading it has made me realize that grammars are cool, and so are parsers.


Here is a fantastic snippet from the book:


Parsing is the process of structuring a linear representation in accordance with a given grammar. This definition has been kept abstract on purpose to allow as wide an interpretation as possible. The “linear representation” may be a sentence, a computer program, a knitting pattern, a sequence of geological strata, a piece of music, actions of ritual behavior, in short any linear sequence in which the preceding elements in some way restrict the next element. For some of the examples the grammar is well known, for some it is an object of research, and for some our notion of a grammar is only just beginning to take shape.


For each grammar, there are generally an infinite number of linear representations (“sentences”) that can be structured with it. That is, a finite-sized grammar can supply structure to an infinite number of sentences. This is the main strength of the grammar paradigm and indeed the main source of the importance of grammars: they summarize succinctly the structure of an infinite number of objects of a certain class.


There are several reasons to perform this structuring process called parsing. One reason derives from the fact that the obtained structure helps us to process the object further. When we know that a certain segment of a sentence is the subject, that information helps in understanding or translating the sentence. Once the structure of a document has been brought to the surface, it can be converted more easily.


A second reason is related to the fact that the grammar in a sense represents our understanding of the observed sentences: the better a grammar we can give for the movement of bees, the deeper our understanding of them. [Italics mine. I found this to be a fantastically profound statement.]


A third lies in the completion of missing information that parsers, and especially error-repairing parsers, can provide. Given a reasonable grammar of the language, an error-repairing parser can suggest possible word classes for missing or unknown words on clay tablets.



This makes me want to start writing my own grammar languages and my own parsers!




[Grune & Jacobs] http://www.amazon.com/Parsing-Techniques-Practical-Monographs-Computer/dp/1441919015/ref=sr_1_1?s=books&ie=UTF8&qid=1383331225&sr=1-1&keywords=parsing+techniques+a+practical+guide


Copyright 1993-2007 XML.org. This site is hosted by OASIS