Amelia A Lewis <amyzing@talsever.com> wrote:
| Building a lexical API on top of a syntactic one is ... backwards.
Yep. SAX, e.g., is based on ESIS, which is a syntactic API spec.
| It is perfectly easy to imagine, for instance, LAX: the lexical API for
| XML. This would have different sorts of events, though. Perhaps it
| would have "leftPointyBracket()" and "nameCharacters(char [])" and
| "tagWhitespace(char [])" and "attributeValue(char, char [])".
Well, both SGML and XML have lexical specifications (e.g. the ISO 8879
productions http://www.oreilly.com/people/staff/crism/sgmldefs.html and
the productions in the XML spec document). SGML actually defines things
in terms of an _abstract syntax_. For instance, a start-tag begins with a
STAGO and ends (usually) with a TAGC, in between picking up stuff
like names, VI (value indicator), LIT, LITA and the like. (The delimiters
are bound to a _concrete syntax_ in the SGML declaration; that's how "<"
is STAGO, "=" is VI, ">" is TAGC, etc. XML disallows variant concrete syntaxes,
instead fixing the syntax to the bindings of the _Reference Concrete
Syntax_.) So, it's possible to associate categories with token "events"
and define an API at that level: tokenization only.
| I don't know if an in-memory API corresponding to such a ... lax parse
| (oh, re ... lax. You knew that was coming, right?) is possible,
| though.
A push API shouldn't be too difficult. By in-memory do you mean some
analogue of DOM, where all the tokens are held in a structure of some sort
(like a parse tree)?
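A rough sketch of the two shapes, in case it helps pin down the question. All names here (`TokenHandler`, `TokenList`, `push`) are hypothetical, and the token categories are hand-written sample data rather than the output of any real parser. The push side is one callback per token; the in-memory side is a handler that simply keeps what it is given.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (category, text)


class TokenHandler:
    """Hypothetical push-style interface: one callback per token."""

    def token(self, category: str, text: str) -> None:
        raise NotImplementedError


class TokenList(TokenHandler):
    """In-memory variant: holds every token in a flat list."""

    def __init__(self) -> None:
        self.tokens: List[Token] = []

    def token(self, category: str, text: str) -> None:
        self.tokens.append((category, text))

    def serialize(self) -> str:
        # Nothing was discarded, so the original text can be rebuilt.
        return "".join(text for _, text in self.tokens)


def push(tokens: Iterable[Token], handler: TokenHandler) -> None:
    """Drive a handler from any token source."""
    for category, text in tokens:
        handler.token(category, text)


store = TokenList()
push([("STAGO", "<"), ("NAME", "p"), ("TAGC", ">"), ("DATA", "hi")], store)
```

A flat list is the simplest structure that qualifies as "in-memory"; whether anything more tree-like is wanted at the lexical level is exactly the open question.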
|