xml-dev - Re: XML parser using lex & yacc

From: Richard Tobin <richard@cogsci.ed.ac.uk>
To: "Alastair Sumner" <als2000@postmaster.co.uk>, xml-dev@ic.ac.uk
Date: Wed, 1 Sep 1999 16:47:51 +0100

> I want to develop an XML parser in C or maybe C++ for an
> undergraduate university project. My approach will be to prototype
> the parser using flex and bison. As I understand it, flex won't be
> able to handle all of the character encodings required in the
> 1.0 spec.

Writing your own lexer may be the best approach, but all the "syntax
characters" of XML are plain ASCII, so it might well be possible to
use [f]lex to tokenise it.  For UTF-8 it is straightforward: the lexer
doesn't even have to know that a multibyte character is not just
several separate characters - the next level up can translate them.
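
To illustrate the UTF-8 point, here is a minimal sketch in C, not a real
XML lexer.  The token names and both functions (next_tok, utf8_decode)
are made up for this example.  It relies on the fact that every byte of
a UTF-8 multibyte sequence is 0x80 or above, so a byte-oriented scanner
can never mistake one for '<' or '>'; a separate decoder then plays the
part of "the next level up".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
enum tok { TOK_LT, TOK_GT, TOK_TEXT, TOK_END };

/*
 * Byte-oriented scanner: returns the kind of the next token in
 * s[*pos..len) and advances *pos past it.  Only ASCII '<' and '>' are
 * treated as syntax.  Bytes >= 0x80 (all bytes of UTF-8 multibyte
 * sequences) simply accumulate into TOK_TEXT.
 */
static enum tok next_tok(const unsigned char *s, size_t len, size_t *pos)
{
    assert(s != NULL && pos != NULL);
    if (*pos >= len)
        return TOK_END;
    if (s[*pos] == '<') { (*pos)++; return TOK_LT; }
    if (s[*pos] == '>') { (*pos)++; return TOK_GT; }
    while (*pos < len && s[*pos] != '<' && s[*pos] != '>')
        (*pos)++;
    return TOK_TEXT;
}

/*
 * "Next level up": decode one UTF-8 sequence starting at s[0] into a
 * code point.  Returns the number of bytes consumed, or 0 on malformed
 * input.  Simplified: overlong forms and surrogates are not rejected.
 */
static size_t utf8_decode(const unsigned char *s, size_t len,
                          unsigned long *cp)
{
    size_t n, i;
    assert(s != NULL && cp != NULL);
    if (len == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;                  /* stray continuation or bad lead */
    if (n > len) return 0;          /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}
```

Scanning the four bytes '<', 0xC3, 0xA9, '>' should give TOK_LT, one
TOK_TEXT covering both middle bytes, then TOK_GT; only the decoder needs
to know that those two bytes are the single character U+00E9.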

Or you might be able to replace the lexer's input functions and change
its character type to integer (if it isn't already); this would work
for UTF-16 (the other required encoding) too.
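
As a rough sketch of what such an input function might look like, the
C below reads UTF-16 and hands back each character as an integer.  The
name next_char_utf16be is hypothetical, and the code assumes big-endian
input with no byte order mark; a real parser would have to detect the
byte order first.  How (or whether) this can be wired into a particular
lexer generator is not shown; with flex the usual hook for replacing
input is the YY_INPUT macro, but its buffer holds bytes, so treat this
as independent of any one tool.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical input function: read the next character from a UTF-16
 * big-endian byte buffer and return it as an integer code point,
 * combining surrogate pairs.  Returns -1 at end of input or on
 * malformed data (lone surrogate, odd trailing byte).
 */
static long next_char_utf16be(const unsigned char *buf, size_t len,
                              size_t *pos)
{
    long hi, lo;
    assert(buf != NULL && pos != NULL);

    if (*pos + 2 > len)
        return -1;                      /* end of input */
    hi = ((long)buf[*pos] << 8) | buf[*pos + 1];
    *pos += 2;

    if (hi < 0xD800 || hi > 0xDFFF)
        return hi;                      /* ordinary BMP character */
    if (hi >= 0xDC00)
        return -1;                      /* low surrogate with no high */

    if (*pos + 2 > len)
        return -1;                      /* high surrogate at end */
    lo = ((long)buf[*pos] << 8) | buf[*pos + 1];
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;                      /* high surrogate not followed by low */
    *pos += 2;

    return 0x10000L + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}
```

The lexer proper then compares these integers against '<', '>' and so
on exactly as it would compare bytes, since the syntax characters have
the same values in either encoding.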

The most obvious problem with using yacc/lex-type tools for XML is
that keywords aren't always keywords.  For example, in some places
in the DTD "SYSTEM" is a keyword and in others it would just be
a name.  You can have the parser switch the lexer between states
but it's not pretty.
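
A minimal sketch of that state switching in C, with invented names
(dtd_tok, lex_mode, classify): the parser sets a mode before asking for
the next token, and the lexer consults it when deciding whether the
word "SYSTEM" is a keyword.  In flex itself this is normally done with
start conditions; the plain function here is only meant to show the
idea, not how any particular parser does it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token kinds and lexer modes, for illustration only. */
enum dtd_tok  { DTD_TOK_NAME, DTD_TOK_SYSTEM };
enum lex_mode { MODE_NAME_ONLY, MODE_EXTERNAL_ID };

/*
 * Classify a word the lexer has already scanned.  "SYSTEM" is a
 * keyword only when the parser has said an external ID may start
 * here; everywhere else it is an ordinary name.
 */
static enum dtd_tok classify(const char *word, enum lex_mode mode)
{
    assert(word != NULL);
    if (mode == MODE_EXTERNAL_ID && strcmp(word, "SYSTEM") == 0)
        return DTD_TOK_SYSTEM;
    return DTD_TOK_NAME;
}
```

For instance, in <!DOCTYPE SYSTEM SYSTEM "x.dtd"> the first SYSTEM is
the document type name and the second is the keyword, so the parser
would have to change the mode between the two calls.  The awkward part
is that with one token of lookahead the parser may already have asked
for the next token before it gets a chance to switch.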

-- Richard

