- From: email@example.com (Christopher R. Maden)
- To: firstname.lastname@example.org
- Date: Thu, 25 Nov 1999 00:41:49 -0800
>I really don't want to parse SGML fully, as it's not needed for an
>SGML->XML converter. I just want to assume concrete syntax, leave the
>entities alone, and the more esoteric SGML documents I might encounter
>I'll fob off to SX or something more industrial strength.
I think you'll have a sort of halting problem, unless you're dealing with a
known set of documents.
For a start, it's really hard to add end-tags without parsing the DTD, so
you're most of the way to a full parser right there. But you won't know if
the concrete syntax is different unless you parse the SGML declaration; you
won't know what shortrefs may be in effect unless you parse the DTD; you
won't know what could be inside an entity (since SGML doesn't have XML's
guarantee that every entity will contain an integral number of elements); you
won't know the values of any marked-section keyword parameter entities unless
you've parsed the DTD; etc. In other words, I think you'll need a full
SGML parser to know if you need a full SGML parser.
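To illustrate the end-tag point: the same tag stream can be closed in different ways depending on the content models. The sketch below is a toy, not SGML's real omitted-tag rules (which also cover start-tag omission, inclusions and exclusions, and more). It assumes a hypothetical "DTD" reduced to a dict mapping each element to the set of children it may contain. `DTD_FLAT`, `DTD_NESTED` and `add_end_tags` are made-up names for illustration only.

```python
import re

# Hypothetical toy content models: element -> set of allowed child elements.
# Real SGML content models are richer; this only captures "may X contain Y".
DTD_FLAT = {"doc": {"p"}, "p": set()}     # p cannot contain p
DTD_NESTED = {"doc": {"p"}, "p": {"p"}}   # p may contain p

TAG = re.compile(r"<(/?)(\w+)>")

def add_end_tags(text, dtd):
    """Insert omitted end-tags: before opening an element that the current
    element may not contain, close open elements until one can contain it."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])  # character data before this tag
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2)
        if closing:
            # An explicit end-tag also ends any elements still open inside it.
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        else:
            while stack and name not in dtd.get(stack[-1], set()):
                out.append(f"</{stack.pop()}>")
            stack.append(name)
            out.append(f"<{name}>")
    out.append(text[pos:])
    # Close whatever is still open at end of input.
    out.extend(f"</{n}>" for n in reversed(stack))
    return "".join(out)

src = "<doc><p>one<p>two</doc>"
print(add_end_tags(src, DTD_FLAT))    # <doc><p>one</p><p>two</p></doc>
print(add_end_tags(src, DTD_NESTED))  # <doc><p>one<p>two</p></p></doc>
```

Same input, two different XML outputs: without the content models there is no way to choose between them.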
On the other hand, if you're dealing with a set of documents whose complete
genealogy you know, then you can work with something smaller. But SX works
and it's reasonably fast, so unless you're in an embedded processor
environment or something, I don't really see a need to re-invent the wheel.
Christopher R. Maden, Solutions Architect
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111
xml-dev: A list for W3C XML Developers. To post, mailto:email@example.com
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:firstname.lastname@example.org the following message;
To subscribe to the digests, mailto:email@example.com the following message;
List coordinator, Henry Rzepa (mailto:firstname.lastname@example.org)