Tim Bray wrote:
> I'm using XML::XPath in the obvious way, along the lines of
>
> my $xp = XML::XPath->new(filename => $ARGV[0]);
>
> and the silly file has <!DOCTYPE foo SYSTEM "./foo.dtd">
>
> I don't want it to validate and in fact I don't want it to read
> foo.dtd. I can achieve this by doing "cp /dev/null foo.dtd" but is
> there a way to tell whatever machinery underlies XML::XPath to just
> ignore the <!DOCTYPE>? -Tim
I don't recall the details as I haven't used XML::Parser in ages, but
XML::Parser can be told to not read DTDs. You can then pass your
properly configured XML::Parser instance to XML::XPath's constructor
using the 'parser' option. There is also a SAX handler in XML::XPath so
that you could use the options of any SAX parser.
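
A minimal sketch of that first approach, not checked against any particular release. It assumes XML::Parser's ParseParamEnt option is what controls whether the external DTD subset gets read, and that XML::XPath accepts 'parser' alongside 'xml' in its constructor. The sample document and the XPath expression are made up for illustration:

```perl
use strict;
use warnings;
use XML::Parser;
use XML::XPath;

# Assumption: with ParseParamEnt false, expat does not go and fetch the
# external DTD subset named in the DOCTYPE declaration.
my $parser = XML::Parser->new(ParseParamEnt => 0, ErrorContext => 2);

# Hypothetical document; ./foo.dtd is deliberately not present on disk.
my $xml = <<'EOT';
<?xml version='1.0'?>
<!DOCTYPE foo SYSTEM "./foo.dtd">
<foo><bar id='baz'/></foo>
EOT

# Hand the preconfigured parser to XML::XPath via the 'parser' option.
my $xp    = XML::XPath->new(parser => $parser, xml => $xml);
my $count = $xp->find('count(/foo/bar)');

print "bar elements: ", $count->value, "\n";
```

To read from a file as in the original question, swapping `xml => $xml` for `filename => $ARGV[0]` should work the same way.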
The same question has been asked of XML::LibXML, which tends to
stubbornly insist on reading the DTD no matter how strongly you tell it
not to. The following makes it skip the resolution of every external
entity whose URI ends in ".dtd" (which of course can be improved upon):
use strict;
use XML::LibXML;
my $parser = XML::LibXML->new;
$parser->validation(0);
$parser->load_ext_dtd(0);
$parser->callbacks(\&matchIRI, \&openIRI, \&readIRI, \&closeIRI);
my $doc = $parser->parse_string(<<'EOT');
<?xml version='1.0' encoding='iso-8859-1' standalone='yes'?>
<!DOCTYPE library SYSTEM 'acme.dtd'>
<acme>
<foo id='baz'/>
<foo id='baz'/>
</acme>
EOT
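# Claim only URIs ending in ".dtd"; anything else is resolved as usual.
sub matchIRI { return shift =~ /\.dtd$/; }
# Dummy handle: readIRI never actually reads from it.
sub openIRI { return \*GLOB; }
# An empty string signals end of input, so the DTD looks empty.
sub readIRI { return ""; }
sub closeIRI { }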
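
One possible improvement, sketched with the XML::LibXML::InputCallback interface that later XML::LibXML releases provide. The extension list (.dtd, .ent, .mod) is only an illustrative assumption about which external resources one might want to stub out, and the anonymous subs stand in for the four named callbacks above:

```perl
use strict;
use warnings;
use XML::LibXML;

# Register the four callbacks as one group: match, open, read, close.
my $icb = XML::LibXML::InputCallback->new;
$icb->register_callbacks([
    # match: claim URIs that look like DTDs or DTD modules (assumed list)
    sub { my ($uri) = @_; return $uri =~ /\.(?:dtd|ent|mod)$/i ? 1 : 0 },
    # open: return a dummy handle; nothing is ever read from it
    sub { my ($uri) = @_; return { uri => $uri } },
    # read: an empty string means end of input, i.e. an empty resource
    sub { return '' },
    # close: nothing to clean up
    sub { return 1 },
]);

my $parser = XML::LibXML->new;
$parser->validation(0);
$parser->load_ext_dtd(0);
$parser->input_callbacks($icb);

my $doc = $parser->parse_string(<<'EOT');
<?xml version='1.0' encoding='iso-8859-1' standalone='yes'?>
<!DOCTYPE library SYSTEM 'acme.dtd'>
<acme>
<foo id='baz'/>
<foo id='baz'/>
</acme>
EOT

print $doc->documentElement->nodeName, "\n";
```

Keeping the callbacks in their own object means the same group can be attached to several parser instances.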
--
Robin Berjon
Research Scientist
Expway, http://expway.com/