Re: [xml-dev] Perl XML::Parser and nested external entity references

On Fri, 19 Oct 2001 03:41:21 -0700 "Ernest G. Allen"
<ernestgallen@earthlink.net> wrote:
> It seems to me that the choice of leaving $base undefined is
> just some sort of ease-of-use "feature" for cases when all
> of the files are in a single directory.  As you noted, the
> docs state "Base is the base to be used for resolving a
> relative URI."

I've learned a bit more through experimentation with version 2.28 of
Perl's XML::Parser, which I believe is based on the same version of
Expat (1.1?) as the XML::Parser 2.27 I was originally using.

The $base parameter given to one's ExternEnt handler (the user-supplied
sub called by XML::Parser::parse() upon entry into an external entity
reference) has to be updated by calling XML::Parser::Expat::base().  In
other words, for $base to be usable in a call to the ExternEnt handler,
the previous call to the same handler had to call $xp->base() with the
dirname [of the $sysid] of the entity being expanded.  (Also, you have
to 'prime' $base with the dirname of the top level XML input file the
first time the ExternEnt handler is called.)  It seems that, once given
the base values via $xp->base() calls, XML::Parser::Expat is internally
keeping track of the associations between external entity names and
the locations in the filesystem (bases) where they were declared.
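For what it's worth, here is a minimal sketch of an ExternEnt handler along the lines just described.  It is an illustration under assumptions, not tested against 2.27 or 2.28: $TOP_DIR and resolve_sysid() are names made up for the example, sysids are treated as plain file paths rather than general URIs, and the entity text is returned as a string.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Assumption: directory of the top-level XML input file, used to prime $base.
our $TOP_DIR = '.';

# Resolve a system identifier (treated as a plain file path) against a base directory.
sub resolve_sysid {
    my ($base, $sysid) = @_;
    return $sysid if File::Spec->file_name_is_absolute($sysid);
    return File::Spec->catfile($base, $sysid);
}

# ExternEnt handler; XML::Parser passes ($xp, $base, $sysid, $pubid).
sub extern_ent {
    my ($xp, $base, $sysid, $pubid) = @_;
    $base = $TOP_DIR unless defined $base;   # first call: prime the base
    my $path = resolve_sysid($base, $sysid);
    open(my $fh, '<', $path) or return;      # returning undef signals failure
    my $text = do { local $/; <$fh> };       # slurp the entity text
    close $fh;
    $xp->base(dirname($path));               # record where this entity lives
    return $text;
}

# Wiring (needs XML::Parser installed):
#   my $parser = XML::Parser->new(Handlers => { ExternEnt => \&extern_ent });
#   $parser->parsefile('top.xml');
```

The handler only calls $xp->base() after the entity has been read successfully, so a failed lookup leaves the previous base in place.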

> Since you're handling the resolution of external entities in
> your own code, I think that you'll have to keep track of the
> path information in your own code.  You'll need to get the
> full base at the start and use it for each external entity.
> You wrote
>     there's no hook for the end of an external entity
>     reference event, only the beginning.  Without this hook
>     I don't know when to pop a relative base from the
>     pushdown...
> but how about pushing the old base immediately upon entry to
> ExternEnt and popping it just before exit?  That would do
> the popping at the same point that the ExternEntFin hook in
> the newer version would be used.

This wouldn't quite work.  These callbacks are called serially in time
but are not recursive (on a stack); the ExternEntFin hook gets called
after the ExternEnt hook has returned.

Actually I had it wrong, too.  To keep track of this you'd have to push
the base location each time you enter an external entity reference,
apply this base to each sysid found in external entity *declarations*
found in this reference (and then stash the result for each declaration
for subsequent use), and then pop the base upon exit from the reference.
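The bookkeeping for that scheme might look roughly like the sketch below.  The helper names (enter_entity, leave_entity, declare_entity, path_for) are invented for illustration, and sysids are again assumed to be plain file paths.  The idea would be to call them from the ExternEnt, ExternEntFin and Entity handlers respectively; that wiring is left out here.

```perl
use strict;
use warnings;
use File::Spec;

our @base_stack;    # bases of the external entities currently being expanded
our %entity_path;   # entity name => path resolved at declaration time

# Push the base on entry to an external entity reference.
sub enter_entity { my ($dir) = @_; push @base_stack, $dir; return }

# Pop it again on exit from the reference.
sub leave_entity { pop @base_stack; return }

# Apply the current base to a declared sysid and stash the result.
sub declare_entity {
    my ($name, $sysid) = @_;
    my $base = @base_stack ? $base_stack[-1] : '.';
    my $path = File::Spec->file_name_is_absolute($sysid)
             ? $sysid
             : File::Spec->catfile($base, $sysid);
    $entity_path{$name} = $path;
    return $path;
}

# Look up the stashed path when the entity is later referenced.
sub path_for { my ($name) = @_; return $entity_path{$name} }
```

Keying the stash on the entity name follows the description above; since the ExternEnt handler is handed a sysid rather than a name, a real implementation would need some way to map between the two.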

The mechanisms for this roll-your-own scheme aren't all quite there in
2.27 (no ExternEntFin hook, so you don't know when to pop the base) or,
apparently, in 2.28 either (which does have the ExternEntFin hook, and
which reports general entity declarations but, unlike 2.27, not
parameter entity declarations to the Entity handler -- maybe there's a
grungier way to catch the parameter entity declarations that I couldn't
find).

But then the slightly mysterious $base/$xp->base("...") 'internal'
mechanism which does not work in 2.27 does work in 2.28, except that it
does cause Perl to segfault on some examples.  Sigh.  At least the
segfaults had been cleared up by XML::Parser 2.29 (2.30 is the most
current version).

Hope this helps someone.  Because I might have to live with 2.27 in my
present situation, a Perl API to Xerces-J might be the best thing to try
next.  =8^(

               James Miller in Austin, Texas

       Internet:   jamesm@bga.com       (
      alternate:   jamesm@wixer.bga.com (