Apatak is a worked-out idea for a validator for RAN (Raw Access Notation), based on fuzzy, composable, pairwise checks with a trivial implementation. Like RAN, it is meant to explore what technologies might look like if they take advantage of our 21st-century parallelized hardware and software capabilities.
RAN is designed to allow various kinds of parallelization. One is to scan the document into fragments and process each one separately in parallel: each fragment could be validated as if it were a document using conventional grammars.
But RAN also allows random-access lexing: starting from any location, the lexer can find the next start-tag or end-tag and start producing output from there. But how can this be validated, since we do not necessarily know what the current ancestors or previous siblings are?
Apatak allows the validation of an arbitrary section of a document. The section must contain only complete tags, but the tags need not all be balanced within it. This is the situation, for example, when a RAN lexer runs on a section of text and produces a SAX-like stream.
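To make the idea concrete, here is a minimal Python sketch, not Apatak itself. It makes several assumptions: tags look like XML start- and end-tags (RAN's real delimiters may differ), a "pairwise check" means testing each adjacent pair of lexer events against a table of allowed pairs, and the names `lex_from`, `check_pairs` and `ALLOWED` are made up for illustration.

```python
import re

# Assumption: XML-like tags. Empty-element tags are not handled in this sketch.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*>")

def lex_from(text, offset):
    """Yield ('start'|'end', name) for each complete tag found at or after offset."""
    for m in TAG.finditer(text, offset):
        yield ("end" if m.group(1) else "start", m.group(2))

# Hypothetical rule table: the adjacent event pairs that are allowed.
ALLOWED = {
    (("start", "list"), ("start", "item")),  # item may be the first child of list
    (("start", "item"), ("end", "item")),    # item may hold text only
    (("end", "item"), ("start", "item")),    # item may follow item
    (("end", "item"), ("end", "list")),      # item may be the last child of list
}

def check_pairs(events, allowed):
    """Return every adjacent pair of events that is not in the allowed table."""
    events = list(events)
    return [(a, b) for a, b in zip(events, events[1:]) if (a, b) not in allowed]

good = "<list><item>a</item><item>b</item></list>"
bad = "<list><item>a</item><para>b</para></list>"

# Start in the middle of the first <item> tag: no ancestors are known,
# yet every adjacent pair seen from there on can still be checked.
print(check_pairs(lex_from(good, 8), ALLOWED))
print(check_pairs(lex_from(bad, 0), ALLOWED))
```

Because each check looks only at two neighbouring events, it needs no stack and no knowledge of what came before the starting offset, so separate sections can be checked independently and the results combined.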
Some document types and some kinds of structural error will be well suited to Apatak. It is designed to err on the side of false positives. Grammars can be converted to Apatak with some loss of contextual power for elements or attributes that can appear in multiple contexts with different rules. XML DTDs with much identical mixed content will not be much affected; DTDs, XSD schemas or RELAX NG schemas with equivocal usages will lose more.
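The loss of context can be illustrated with a small Python sketch. This is an assumed, simplified conversion and not Apatak's actual one: the grammar format, the element names and the functions `pairs_for` and `to_pair_rules` are all hypothetical. Content models are keyed by (parent, element), and the conversion keeps only adjacent start/end event pairs, so the parent is forgotten.

```python
# Hypothetical context-sensitive grammar: allowed child sequences keyed by
# (parent, element). "title" has different rules under "book" and "person".
GRAMMAR = {
    ("book", "title"): [[]],                   # no child elements
    ("person", "title"): [["honorific"], []],  # optional honorific child
}

def pairs_for(element, children):
    """Adjacent event pairs at the boundaries of one element's child sequence."""
    if not children:
        return {(("start", element), ("end", element))}
    pairs = {
        (("start", element), ("start", children[0])),
        (("end", children[-1]), ("end", element)),
    }
    pairs |= {(("end", a), ("start", b)) for a, b in zip(children, children[1:])}
    return pairs

def to_pair_rules(grammar):
    """Merge all contexts of an element into one set of allowed pairs."""
    allowed = set()
    for (_parent, element), sequences in grammar.items():
        for seq in sequences:
            allowed |= pairs_for(element, seq)  # the parent context is dropped here
    return allowed

rules = to_pair_rules(GRAMMAR)
# An honorific inside a title is accepted everywhere, including under "book",
# where the original grammar did not allow it.
print((("start", "title"), ("start", "honorific")) in rules)
```

In this sketch an element whose rules are the same in every context loses nothing, while an element with different rules per context ends up with the union of them.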
Regards
Rick