- From: Paul Tchistopolskii <paul@qub.com>
- To: David Brownell <david-b@pacbell.net>, Eric van der Vlist <vdv@dyomedea.com>, xml-dev@lists.xml.org
- Date: Wed, 06 Dec 2000 09:56:43 -0800
> "Parser" is a word that's fuzzy around the edges, but it always
> involves processing according to some grammar. Textbooks
> about things like compilers will talk about such issues.
What particular textbook are you talking about? BTW -
do you know how YACC works? I mean, have you used
it?
If you have, I think you know that usually
/* comment */ is easier to return as one token,
not as *3* tokens. A lexer which returns
one 'comment' token instead of 3 tokens is still
a lexer. Do you think returning one 'comment' token
instead of 3 makes such a lexer a parser?
What textbook says that? I'll be glad to read that
textbook.
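To make this concrete, here is a minimal flex sketch (my own
illustration, not taken from any textbook; the COMMENT token number
is invented) in which the whole /* ... */ comes back as one token:

%{
#include <stdio.h>
enum { COMMENT = 258 };   /* token number, as yacc would define it */
%}
%option noyywrap
%%
"/*"([^*]|\*+[^*/])*\*+"/"   { return COMMENT; /* one token per comment */ }
.|\n                         { /* skip everything else in this sketch */ }
%%
int main(void)
{
    while (yylex() == COMMENT)
        printf("COMMENT token: %s\n", yytext);
    return 0;
}

Run it through flex and cc, feed it any C-ish text on stdin: every
comment is reported exactly once, and the thing is still just a lexer.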
> > In the existence of yacc and lex - I think SAX API is a lexer.
> > It returns lexems. Tokens.
>
> A lexer returns tokens in order -- all of them. You'd see "%foo;" be
> reported, and never interpreted.
Yes, as I've said, SAX is a crazy lexer. For some reason it has a built-in
macroprocessor.
> In no way is SAX a lexical API;
> it provides syntactic interpretation ("start element" etc).
"start element" is of course token. And the name of the
element is yylval.
<any_yacc_document>
Lexical Analysis
The user must supply a lexical analyzer to read the input
stream and communicate tokens (with values, if desired) to the
parser. The lexical analyzer is an integer-valued function
called yylex. The function returns an integer, the token number,
representing the kind of token read. If there is a value
associated with that token, it should be assigned to the
external variable yylval.
</any_yacc_document>
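In the same spirit, here is a hypothetical hand-written yylex (the
token number, the buffer standing in for yylval, and the hard-wired
input are all invented for this sketch) that reports a start tag as
one token and passes the element name exactly as the manual above
describes:

#include <stdio.h>
#include <string.h>

enum { START_ELEMENT = 258 };        /* token number yacc would generate */
static char yylval_name[64];         /* stands in for yylval here */
static const char *cursor = "<title>some character data";

int yylex(void)
{
    if (*cursor == '<') {                       /* start tag: ONE token */
        const char *end = strchr(cursor, '>');
        size_t len = (size_t)(end - cursor) - 1;
        memcpy(yylval_name, cursor + 1, len);
        yylval_name[len] = '\0';
        cursor = end + 1;
        return START_ELEMENT;
    }
    return 0;                                   /* pretend EOF otherwise */
}

int main(void)
{
    if (yylex() == START_ELEMENT)
        printf("START_ELEMENT, yylval = \"%s\"\n", yylval_name);
    return 0;
}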
> Parsing builds some higher level model out of token streams.
You call that a 'higher level model'? Then something that
works with the *real* grammar of an XML application (which is
a schema or DTD) must be at "sky rocket level".
> Something like YACC is irrelevant for XML, since the model
> inherited from SGML isn't well-enough factored; you can't
> use such tools, there are too many funky special cases.
Everything is relevant. I wrote a validating parser which
took a DTD on input and produced the appropriate yacc
*grammar*, and then that grammar was used for XML
file validation.
It took me 2 days to write that validating parser in perl.
I was not covering all the cases, but this model could be
used in real life. I think it is funny that you tell me I
can not use YACC with XML. I already did it, more than a
year ago. Of course *not* every XML document can be processed
with such a tool, just some subset. A significant subset,
though.
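For the curious, the idea looks roughly like this. It is a toy I am
making up on the spot, not the real Perl generator or its output,
and title/chapter are flattened to single tokens to keep it short:
the declaration <!ELEMENT book (title, chapter+)> maps onto a yacc
grammar that accepts exactly that content model.

%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "invalid: %s\n", s); exit(1); }
%}
%token START_BOOK END_BOOK TITLE CHAPTER
%%
book     : START_BOOK TITLE chapters END_BOOK { puts("valid per DTD"); };
chapters : CHAPTER                 /* chapter+ from the content model */
         | chapters CHAPTER
         ;
%%
/* Toy token stream standing in for
   <book><title/><chapter/><chapter/></book> */
static int toks[] = { START_BOOK, TITLE, CHAPTER, CHAPTER, END_BOOK, 0 };
static int pos = 0;
int yylex(void) { return toks[pos++]; }
int main(void)  { return yyparse(); }

Feed the tokens from a SAX-like event stream instead of the toy
array and you have the validator.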
> For one XML example, turning "%foo;" into a stream of tokens
> (that fudges some nastiness, note!) in DTDs (a context defined
> by the parser) or passing it through unaltered (outside DTD)
> is done inside the parser.
Yes, a lexer with a built-in macroprocessor is crazy. I'm not sure
I understand your point.
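Maybe this restates it. The following is an invented flex sketch,
not how any real XML parser is written, and it does no expansion
at all; it only shows the context switch the quote is talking
about, wherever one decides to put it:

%{
#include <stdio.h>
%}
%option noyywrap
%x DTD
%%
"<!DOCTYPE"                            { BEGIN(DTD); }
<DTD>"]>"                              { BEGIN(INITIAL); }
<DTD>"%"[A-Za-z_][A-Za-z0-9._-]*";"    { printf("PE reference %s\n", yytext); }
<DTD>.|\n                              { /* ignore the rest of the subset */ }
"%"[A-Za-z_][A-Za-z0-9._-]*";"         { printf("character data %s\n", yytext); }
.|\n                                   { /* ignore other content */ }
%%
int main(void) { return yylex(); }

The same "%foo;" string is classified one way inside the DTD and
another way outside it; whether the layer that does this is a lexer
or a parser is exactly the question.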
> It's long been known that a SAX2 extension exposing lexical
> events could be defined ... but nobody's been motivated to
> work on one, so far as I know, since so few applications need
> to see that kind of data.
I should say that there are not too many such applications. Also,
I've already said that you can push the balancing towards
"more of a parser" as well as towards "more of a lexer".
Many thanks for your answer, I think I've got all the information
I wanted.
Rgds.Paul.