
   Re: (more) extensible SAX

  • From: Paul Tchistopolskii <paul@qub.com>
  • To: David Brownell <david-b@pacbell.net>,Eric van der Vlist <vdv@dyomedea.com>, xml-dev@lists.xml.org
  • Date: Wed, 06 Dec 2000 09:56:43 -0800

 
> "Parser" is a word that's fuzzy around the edges, but it always
> involves processing according to some grammar.  Textbooks
> about things like compilers will talk about such issues.

Which particular textbook are you talking about? BTW, 
do you know how YACC works? I mean, have you used 
it?

If you have, I think you should know that it is usually 
easier to return /* comment */ as one token, 
not as *3* tokens. A lexer which returns 
one 'comment' token instead of 3 tokens is still 
a lexer. Do you think returning one 'comment' token 
instead of 3 tokens makes such a lexer a parser?
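
To make the one-token point concrete, here is a minimal sketch of a regex-based lexer in Python. It is an illustration only, not any particular lex specification: the token names (COMMENT, NUMBER, IDENT, OP) and the function name tokenize are hypothetical. The whole /* ... */ span, delimiters included, comes back as a single COMMENT token.

```python
import re

# Hypothetical token kinds; a real lex specification would define its own.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),    # the whole comment as ONE token
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=;]"),
    ("SKIP", r"\s+"),             # whitespace is consumed but not reported
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,                    # let a comment span several lines
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the given text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("x = 1; /* set x */ y") reports the comment as one (kind, lexeme) pair, and nothing about the function stops being lexical analysis because of it.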

Which textbook says that? I'll be glad to read that 
textbook.

> > Given the existence of yacc and lex, I think the SAX API is a lexer. 
> > It returns lexemes. Tokens. 
> 
> A lexer returns tokens in order -- all of them.  You'd see "%foo;" be
> reported, and never interpreted.  

Yes, as I've said, SAX is a crazy lexer. For some reason it has a built-in 
macroprocessor. 

> In no way is SAX a lexical API;
> it provides syntactic interpretation ("start element" etc).

"start element" is of course a token. And the name of the 
element is yylval. 
 
<any_yacc_document>

: Lexical Analysis

     The user must supply a lexical analyzer to read the input
stream and communicate tokens (with values, if desired) to the
parser.  The lexical analyzer is an integer-valued function
called yylex.  The function returns an integer, the token number,
representing the kind of token read.  If there is a value
associated with that token, it should be assigned to the external
variable yylval.

</any_yacc_document>
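
As a rough illustration of reading SAX callbacks as (token number, yylval) pairs, here is a minimal sketch using Python's standard xml.sax module. The token numbers and the names TokenCollector and sax_tokens are hypothetical, chosen only to mirror the yylex convention quoted above; nothing here is part of SAX itself.

```python
import xml.sax

# Hypothetical token numbers, in the style of yacc-generated constants.
START_ELEMENT, END_ELEMENT, CHARACTERS = 257, 258, 259


class TokenCollector(xml.sax.ContentHandler):
    """Record each SAX callback as a (token number, value) pair."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def startElement(self, name, attrs):
        # The element name plays the role of yylval.
        self.tokens.append((START_ELEMENT, name))

    def endElement(self, name):
        self.tokens.append((END_ELEMENT, name))

    def characters(self, content):
        # Whitespace-only chunks are dropped to keep the stream short.
        if content.strip():
            self.tokens.append((CHARACTERS, content))


def sax_tokens(xml_text):
    """Parse xml_text and return the recorded token stream."""
    handler = TokenCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens
```

Note that a SAX parser may deliver one run of text in several characters() calls, so a consumer that needs whole text nodes would have to join adjacent CHARACTERS tokens.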

> Parsing builds some higher level model out of token streams.

You call it a 'higher level model'? That means something that 
works with the *real* grammar of an XML application (which is a 
schema or DTD) would be at "sky rocket level". 

> Something like YACC is irrelevant for XML, since the model
> inherited from SGML isn't well-enough factored; you can't
> use such tools, there are too many funky special cases.

Everything is relevant. I wrote a validating parser which 
took a DTD as input and produced the appropriate yacc 
*grammar*, and then that grammar was used for XML 
file validation. 

It took me 2 days to write that validating parser in Perl. 
I did not cover all the cases, but this model could be 
used in real life. I think it is funny that you tell me I
cannot use YACC with XML. I already did it more than a 
year ago. Of course *not* every XML document can be processed 
with such a tool, just some subset. A significant subset, 
though.
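
For readers who want a feel for the DTD-to-yacc idea, here is a minimal sketch in Python. It is not the Perl tool described above; it is a hypothetical illustration that assumes a small DTD subset: element declarations whose content model is a flat comma-separated sequence, with optional ?, * or + on each item, or (#PCDATA). The names dtd_to_yacc, START_x, END_x and TEXT are invented for the example, and declarations outside that subset are silently skipped.

```python
import re

# Matches <!ELEMENT name (flat, comma, separated, model)> only.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^()]*)\)\s*>")

# Occurrence indicator -> (helper symbol, helper rule), both hypothetical.
HELPER_RULES = {
    "*": ("{n}_star", "{n}_star : /* empty */ | {n}_star {n} ;"),
    "+": ("{n}_plus", "{n}_plus : {n} | {n}_plus {n} ;"),
    "?": ("{n}_opt", "{n}_opt : /* empty */ | {n} ;"),
}


def dtd_to_yacc(dtd_text):
    """Return yacc-style rules (as strings) for the supported declarations."""
    rules, seen = [], set()
    for name, model in ELEMENT_DECL.findall(dtd_text):
        body, helpers = [], []
        for item in (part.strip() for part in model.split(",")):
            if item == "#PCDATA":
                body.append("TEXT")
            elif item[-1:] in HELPER_RULES:
                symbol, rule = (
                    s.format(n=item[:-1]) for s in HELPER_RULES[item[-1]]
                )
                body.append(symbol)
                if symbol not in seen:      # emit each helper rule once
                    seen.add(symbol)
                    helpers.append(rule)
            else:
                body.append(item)
        rules.append(f"{name} : START_{name} {' '.join(body)} END_{name} ;")
        rules.extend(helpers)
    return rules
```

The START_x, END_x and TEXT terminals would have to come from a lexer over the XML document, for example one driven by SAX events. Choice groups, mixed content and attributes are outside this sketch.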
 
> For one XML example, turning "%foo;" into a stream of tokens
> (that fudges some nastiness, note!) in DTDs (a context defined
> by the parser) or passing it through unaltered (outside DTD)
> is done inside the parser. 

Yes, a lexer with a built-in macroprocessor is crazy. I'm not sure 
I understand your point.

> It's long been known that a SAX2 extension exposing lexical
> events could be defined ... but nobody's been motivated to
> work on one, so far as I know, since so few applications need
> to see that kind of data.

I should say that there are not too many such applications. Also,
I've already said that you can push the balance toward 
"more of a parser" as well as toward "more of a lexer".

Many thanks for your answer. I think I've got all the information
I wanted. 

Rgds.Paul.






 
