XML.org
OASIS Mailing List Archives

 


SHRVL: hierarchical, summarized SVRL may be useful for implying structures in horrid documents

SHRVL: Schematron Hierarchical Report View Language

It is possible to take an SVRL document and convert it into a hierarchical XML document. This can give us another tool in our toolbelt for feature extraction and for detecting implied structures.
 
See https://schematron.com/document/3472.html for the feature-extraction approach, for what it is worth. I am wondering whether this makes life easier for some kinds of processing, because it unwraps the @location XPaths and gives the report a shape more like that of the document (a kind of reverse Examplotron?).

We collate the SVRL and transpose the @location XPaths into element names with position attributes.

An example mockup of a SHRVL-ed SVRL document made from running a Schematron report on a large set of large HTML documents follows:
<shrvl>
  <html>
    <body pos="2">
      <p pos="1" found="p-with-feasible-title" />
      <div pos="2" found="div-with-small-simple-table">
        <table pos="2" found="td-with-feasible-title"/>
      </div>
      <div pos="5" found="div-with-feasible-title" />
    </body>
  </html>
</shrvl>

where the p element is transposed from an SVRL report such as
<svrl:successful-report id="p2-r88-a88" role="p-with-potential-heading"
    location="/html/*[2][self::body]/*[1][self::p]">
   <svrl:text>A p with the right words may be the main title</svrl:text>
</svrl:successful-report>
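
For what it is worth, the collate-and-transpose step might be sketched in Python roughly as follows. This is only a sketch under assumptions that the post does not state: that every @location step has the form `*[n][self::name]` with unprefixed names (real SVRL implementations emit other forms), that the `found` value is copied from the report's @role, and that several reports on one node are joined with spaces. The function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# One location step such as "*[2][self::body]" (assumed format).
STEP = re.compile(r"\*\[(\d+)\]\[self::([\w.-]+)\]")

def parse_location(location):
    """Split a location XPath into (element name, position or None) pairs."""
    steps = []
    for part in location.strip("/").split("/"):
        m = STEP.fullmatch(part)
        if m:
            steps.append((m.group(2), m.group(1)))
        else:
            steps.append((part, None))  # a bare step such as the root "html"
    return steps

def find_child(parent, name, pos):
    """Return the existing child with this name and pos, if any."""
    for child in parent:
        if child.tag == name and child.get("pos") == pos:
            return child
    return None

def sort_by_pos(elem):
    """Order children by their pos attribute, recursively."""
    elem[:] = sorted(elem, key=lambda e: int(e.get("pos", "0")))
    for child in elem:
        sort_by_pos(child)

def svrl_to_shrvl(svrl_root):
    """Collate successful-report locations into one hierarchical tree."""
    shrvl = ET.Element("shrvl")
    for report in svrl_root.iter("{%s}successful-report" % SVRL_NS):
        node = shrvl
        for name, pos in parse_location(report.get("location", "")):
            child = find_child(node, name, pos)
            if child is None:
                child = ET.SubElement(node, name)
                if pos is not None:
                    child.set("pos", pos)
            node = child
        # Assumption: the feature name comes from @role.
        role = report.get("role", "")
        found = node.get("found")
        node.set("found", role if not found else found + " " + role)
    sort_by_pos(shrvl)
    return shrvl

SAMPLE = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:successful-report role="div-with-feasible-title"
      location="/html/*[2][self::body]/*[5][self::div]"/>
  <svrl:successful-report role="p-with-feasible-title"
      location="/html/*[2][self::body]/*[1][self::p]"/>
</svrl:schematron-output>"""

shrvl = svrl_to_shrvl(ET.fromstring(SAMPLE))
print(ET.tostring(shrvl, encoding="unicode"))
```

Reports that share a prefix of their location end up under the same ancestor elements, which is what gives the result its document-like shape.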

We might then give this document to our XSLT transformer, with the original document, so that its features can guide the transformation of the document. (For example, the XSLT could take the title from the first candidate found, except that a table in the second position trumps a starting p.) 
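
The post has an XSLT in mind for this; purely to illustrate the kind of rule meant, here is a rough Python sketch over the mockup above. The reading of the rule is an assumption: a candidate is any element whose `found` value ends in `-feasible-title`, and a table candidate inside the body child at pos 2 wins over a candidate p at pos 1. `pick_title` and `is_title_candidate` are hypothetical names.

```python
import xml.etree.ElementTree as ET

SHRVL = """<shrvl><html><body pos="2">
  <p pos="1" found="p-with-feasible-title"/>
  <div pos="2" found="div-with-small-simple-table">
    <table pos="2" found="td-with-feasible-title"/>
  </div>
  <div pos="5" found="div-with-feasible-title"/>
</body></html></shrvl>"""

def is_title_candidate(elem):
    # Assumption: title features are named "...-feasible-title".
    return elem.get("found", "").endswith("-feasible-title")

def pick_title(shrvl_root):
    """Return the SHRVL element that marks the preferred title, or None."""
    body = shrvl_root.find("html/body")
    if body is None:
        return None
    # iter() walks in document order and includes body itself, so skip it.
    candidates = [e for e in body.iter()
                  if e is not body and is_title_candidate(e)]
    if not candidates:
        return None
    first = candidates[0]
    starts_with_p = (first.tag == "p" and first.get("pos") == "1"
                     and any(c is first for c in body))
    if starts_with_p:
        # Exception: a table candidate in the second position trumps the p.
        second = next((c for c in body if c.get("pos") == "2"), None)
        if second is not None:
            for e in second.iter("table"):
                if is_title_candidate(e):
                    return e
    return first

chosen = pick_title(ET.fromstring(SHRVL))
print(chosen.tag, chosen.get("found"))
```

The decision is made on the small SHRVL tree alone; the element names and pos values of the chosen node then say where to look in the original document.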

This may reduce the spaghetti coding, and provide a clear intermediate artifact that can be viewed to understand what is going on when some new document that fails the conversion is found.

It can also provide a way to do meta-grammars (architectural forms-ish), where you have a RELAX NG (or Schematron) schema to validate the SHRVL document. This can let you know whether your flat document has the features expected for some transformation, without having to actually run that heavyweight transformation.
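
As a stand-in for a real RELAX NG or Schematron schema (this is not a schema validator, just the same idea in miniature), a pre-flight check over the SHRVL document could look something like the Python sketch below. The `REQUIRED` list and `missing_features` are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical expectations: (path within the SHRVL document, required feature).
REQUIRED = [
    ("html/body/p", "p-with-feasible-title"),
    ("html/body/div", "div-with-small-simple-table"),
]

def missing_features(shrvl_root, required=REQUIRED):
    """List the expected (path, feature) pairs that the SHRVL document lacks."""
    missing = []
    for path, feature in required:
        elems = shrvl_root.findall(path)
        # found may hold several space-separated feature names.
        if not any(feature in e.get("found", "").split() for e in elems):
            missing.append((path, feature))
    return missing

doc = ET.fromstring(
    '<shrvl><html><body pos="2">'
    '<p pos="1" found="p-with-feasible-title"/>'
    '</body></html></shrvl>'
)
print(missing_features(doc))
```

An empty list would mean the flat document shows the features the transformation expects; anything else names what is missing before the conversion is attempted.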

Regards
Rick



Copyright 1993-2007 XML.org. This site is hosted by OASIS