Hi Folks,
I find myself in need of an XSLT parser for DTDs for a side project I'm playing with. My background isn't really in computer science, so I lack a formal education in parsers; I hope someone here can tell me whether my thoughts make sense, or point me at some resources to fill in the gaps!
In the short term, I want to write a DTD parser; longer term, I think it would be interesting to make a parser generator.
There is a parser generator out there already that can create XSLT parsers from EBNF grammars (http://www.bottlecaps.de/rex/ by Gunther Rademacher), which seems very good: Jirka Kosek and Steven Pemberton have talked about using it at Balisage and XML London, e.g. to check the validity of non-XML fragments using Schematron. I think there are two areas for improvement, however:
- The XSLT produced is a good example of functional programming; it uses functions that pass a state variable between them, etc. However, I wonder if there isn't a lot of opportunity to make use of the underlying declarative paradigm of XSLT, particularly with the new data structures and abilities of XSLT 3.
- Because of the above, it's harder to extend the parser, e.g. by inclusion in another XSLT. My use case for this is for a DTD parser which resolves entities.
As I understand it, most parsers include or depend upon a lexer, which tokenises the text; the parser then builds the tokens into a hierarchy. It does this by building a transition table that states, for a given context, which tokens can validly occur within it.
It seems to me that a separate lexer and a transition table approach is trying to replicate functionality that's already in place in XSLT; what problems can people foresee with using modes and template matches rather than a transition table?
Thanks,
Tom
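
P.S. To make the conventional approach concrete, here is a minimal sketch of what I mean by a lexer plus a transition table. It is in Python rather than XSLT purely for brevity. The token names, the state names, and the tiny <!ENTITY name "value"> subset are all made up for illustration; none of it is taken from REx or from a real DTD parser.

```python
import re

# Hypothetical token set covering only: <!ENTITY name "value">
TOKEN_SPEC = [
    ("DECL_OPEN", r"<!ENTITY"),
    ("NAME", r"[A-Za-z_][\w.\-]*"),
    ("LITERAL", r'"[^"]*"'),
    ("DECL_CLOSE", r">"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenise(text):
    """Lexer: turn the raw text into (kind, value) pairs, dropping whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Transition table: (current state, token kind) -> next state.
# Any pair that is missing from the table is a syntax error.
TRANSITIONS = {
    ("start", "DECL_OPEN"): "want_name",
    ("want_name", "NAME"): "want_value",
    ("want_value", "LITERAL"): "want_close",
    ("want_close", "DECL_CLOSE"): "start",
}


def parse_entities(text):
    """Parser: walk the tokens through the table, collecting name -> value."""
    state = "start"
    entities = {}
    name = None
    for kind, value in tokenise(text):
        next_state = TRANSITIONS.get((state, kind))
        if next_state is None:
            raise SyntaxError(f"{kind} not allowed in state {state}")
        if kind == "NAME":
            name = value
        elif kind == "LITERAL":
            entities[name] = value[1:-1]  # strip the surrounding quotes
        state = next_state
    if state != "start":
        raise SyntaxError("unterminated declaration")
    return entities
```

Calling parse_entities on a string of such declarations gives a dictionary of entity names to replacement text. My question is really whether the (state, token) lookup here maps naturally onto (mode, match pattern) in XSLT.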