OASIS Mailing List Archives

   Re: [xml-dev] Make XML::XPath bypass DTD?

Tim Bray wrote:
> I'm using XML::XPath in the obvious way, along the lines of
> my $xp = XML::XPath->new(filename => $ARGV[0]);
> and the silly file has <!DOCTYPE foo SYSTEM "./foo.dtd">
> I don't want it to validate and in fact I don't want it to read 
> foo.dtd.  I can achieve this by doing "cp /dev/null foo.dtd" but is 
> there a way to tell whatever machinery underlies XML::XPath to just 
> ignore the <!DOCTYPE>? -Tim

I don't recall the details, as I haven't used XML::Parser in ages, but
XML::Parser can be told not to read the external DTD. You can then pass your
properly configured XML::Parser instance to XML::XPath's constructor
using the 'parser' option. XML::XPath also has a SAX handler, so you
could instead use whatever options your SAX parser of choice offers.
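A minimal sketch of that approach follows. It assumes two things: that XML::Parser's ParseParamEnt option, when false, keeps expat from loading the external DTD subset, and that XML::XPath forwards its 'parser' option to the parse it performs internally, as described above. The temporary file and the no-such-file.dtd name are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Parser;
use XML::XPath;

# Hypothetical demo input: the DOCTYPE names a DTD that does not exist,
# so parsing only succeeds if nothing tries to open it.
my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print $fh <<'EOT';
<?xml version="1.0"?>
<!DOCTYPE foo SYSTEM "./no-such-file.dtd">
<foo id="baz"/>
EOT
close $fh;

# With ParseParamEnt false, expat does not read the external DTD subset.
my $xml_parser = XML::Parser->new(ParseParamEnt => 0);

# Assumption: XML::XPath uses this instance instead of building its own
# XML::Parser (which may be configured to read the DTD).
my $xp = XML::XPath->new(filename => $filename, parser => $xml_parser);

print $xp->findvalue('/foo/@id'), "\n";
```

In the original script, $filename would simply be $ARGV[0].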

The same question has been asked about XML::LibXML, which tends to
stubbornly insist on reading the DTD no matter how strongly you tell it
not to. The following makes it skip resolving every external entity
whose IRI ends in ".dtd" (a test that can of course be improved upon):

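use strict;
use XML::LibXML;
my $parser = XML::LibXML->new;

# Route every IRI accepted by matchIRI through our own handlers
# instead of letting libxml2 fetch it.
$parser->match_callback(\&matchIRI);
$parser->open_callback(\&openIRI);
$parser->read_callback(\&readIRI);
$parser->close_callback(\&closeIRI);

my $doc = $parser->parse_string(<<'EOT');
<?xml version='1.0' encoding='iso-8859-1' standalone='yes'?>
<!DOCTYPE library SYSTEM 'acme.dtd'>
<library>
   <foo id='baz'/>
   <foo id='baz'/>
</library>
EOT

print $doc->toString;

# Claim any IRI ending in ".dtd".
sub matchIRI { return shift =~ /\.dtd$/; }
# Return a dummy handle; nothing is ever read from it.
sub openIRI { return \*GLOB; }
# An empty string signals end of input, so the DTD is read as empty.
sub readIRI { return ""; }
sub closeIRI {}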

Robin Berjon
   Research Scientist
   Expway, http://expway.com/

Copyright 2001 XML.org. This site is hosted by OASIS