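The HTML 4.01 DTD is not an XML DTD; it's an SGML DTD. SGML permits "--" comments inside a declaration and XML does not, so expat reports a syntax error at the first ENTITY. The XHTML 1.0 DTDs are the XML counterparts.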
Bob Foster
rickc@rdrop.com wrote:
> Hello -
>
> Somewhere in the expat docs this list is given as a place to
> ask questions. I've looked on the web for an answer
> and have come up empty, so I hope it's OK to ask here.
>
> What I want to do is use expat to read HTML 4.01. I have
> no problem with the requirement of the input being
> well formed. But, to do this I have to read the standard
> "loose" or "strict" DTD.
>
> What I've tried:
>
> XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
> XML_SetExternalEntityRefHandler(parser,ext_ref_handler);
>
> and then in ext_ref_handler, create a sub_parser:
>
> XML_ExternalEntityParserCreate(parser,context,encoding)
>
> and feed the DTD file to it.
>
> The problem:
>
> It chokes on the first non-comment line (an ENTITY) in loose.dtd
> from w3.org with a syntax error. It doesn't seem to like the "--"
> as a comment delimiter.
>
> Should I not use loose.dtd? I kinda assumed it is a valid DTD...
>
> Are there any examples/samples of parsing a DTD? I found lots of
> descriptions of how to do it (which seems to be what I did) but
> no code.
>
> This is with version expat-1.95.8
>
> Thanks,
>
> Rick
>
> rickc
> rdrop.com