[ANNOUNCE] Feature Grammars: a tool for feature extraction and representation

Feature Grammars is a tool or technology for feature extraction and representation in XML.

It has a few novel (are they?) features:

* It allows grammar-like modeling of the hierarchical combinations of features in a document, and reports the result as a feature tree (rather than a feature vector) that says "we looked for this feature because we found that feature".

* It uses XPath for feature detection.

* It supports feature detection in multiple document types, with alternative detectors for each feature attaching to the same grammar.
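
As a rough illustration of the three points above, here is a minimal sketch in Python. It is not the actual Feature Grammars syntax or implementation: the feature names, the GRAMMAR and DETECTORS tables, and the functions are all hypothetical, and the standard library's ElementTree only supports a subset of XPath. It shows a grammar in which a sub-feature is only looked for once its parent feature has been found, XPath detectors per document type, and a feature tree as the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar (NOT the real Feature Grammars syntax): each feature
# lists the sub-features that are only looked for once it has been found.
GRAMMAR = {
    "document": ["has-table", "has-figure"],
    "has-table": ["table-has-caption"],
    "table-has-caption": [],
    "has-figure": [],
}

# Hypothetical detectors: one XPath per feature for each document type,
# all attaching to the same grammar above.
DETECTORS = {
    "html": {
        "has-table": ".//table",
        "table-has-caption": ".//table/caption",
        "has-figure": ".//img",
    },
    "docbook": {
        "has-table": ".//table",
        "table-has-caption": ".//table/title",
        "has-figure": ".//figure",
    },
}


def detect(feature, root, doc_type):
    """Return (feature, subtrees) if the feature is found, else None."""
    if root.find(DETECTORS[doc_type][feature]) is None:
        return None  # not found, so its sub-features are never tested
    found = (detect(child, root, doc_type) for child in GRAMMAR[feature])
    return (feature, [tree for tree in found if tree is not None])


def feature_tree(xml_text, doc_type):
    """Build the feature tree for one document, starting at 'document'."""
    root = ET.fromstring(xml_text)
    found = (detect(child, root, doc_type) for child in GRAMMAR["document"])
    return ("document", [tree for tree in found if tree is not None])


HTML_DOC = "<html><body><table><caption>T1</caption></table></body></html>"
DOCBOOK_DOC = "<article><figure><title>F1</title></figure></article>"

print(feature_tree(HTML_DOC, "html"))
print(feature_tree(DOCBOOK_DOC, "docbook"))
```

In this sketch the caption detector is only run because a table was found, which is the "we looked for this feature because we found that feature" idea. The same grammar is reused for both document types, and only the XPath expressions differ.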

It follows the same architecture as Schematron, and indeed could be used to preprocess documents for Schematron or to post-process the SVRL. However, it is not intended as a schema language as such. The open-source proof-of-concept prototype implementation is at:

https://bitbucket.org/CharmOnset/featuregrammars

Feedback and improvements welcome!

It grew out of some years of thinking about remaining gaps with Schematron and XML query languages, and some experience with very large corpora where dozens of different data sources (indeed, scores of sources, over time) fed very different documents to be converted to a common kitchen-sink transitional DTD. Having a common schema made it look as if the problem had been reduced to N:1, but this disguised the fact that the documents were clustered into, in effect, different discrete languages depending on their source.

Regards

Rick