Re: [xml-dev] Perl XML::Parser and nested external entity references


It seems to me that the choice of leaving $base undefined is
just some sort of ease-of-use "feature" for cases when all
of the files are in a single directory.  As you noted, the
docs state "Base is the base to be used for resolving a
relative URI."

Since you're handling the resolution of external entities in
your own code, I think that you'll have to keep track of the
path information in your own code.  You'll need to get the
full base at the start and use it for each external entity.
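As a rough sketch of that bookkeeping (not anything XML::Parser provides), the helper below resolves a relative system identifier against the directory of the file that declared the entity. The name resolve_sysid is hypothetical, and the code assumes plain '/'-separated file system paths like those in the example tree. Note that XML 1.0 resolves a relative system identifier against the resource containing the entity declaration, so eleanor.xml is resolved against adolph.ent rather than hub.xml.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper (not part of XML::Parser): resolve a relative system
# identifier against the directory of the file that declared the entity.
# Assumes plain '/'-separated file system paths, as in the example tree.
sub resolve_sysid {
    my ($declaring_file, $sysid) = @_;
    return $sysid if $sysid =~ m{^/};            # already absolute

    my $joined = dirname($declaring_file) . '/' . $sysid;
    my $root   = $joined =~ m{^/} ? '/' : '';    # keep a leading slash
    my @out;
    for my $seg (split m{/}, $joined) {
        next if $seg eq '' || $seg eq '.';       # skip empty and '.' segments
        if ($seg eq '..' && @out && $out[-1] ne '..') {
            pop @out;                            # '..' cancels the previous segment
            next;
        }
        push @out, $seg;
    }
    return $root . join('/', @out);
}

# Walk the chain of declarations from the example tree.
my $dtd     = resolve_sysid('top/hub.xml', 'dtd-dir/thangle.dtd');
my $adolph  = resolve_sysid($dtd, 'more-ents-dir/adolph.ent');
my $eleanor = resolve_sysid($adolph, '../../some-shared-content/eleanor.xml');
print "$_\n" for $dtd, $adolph, $eleanor;
```

An ExternEnt handler could call something like this with the path it has recorded for the declaring file, then open the resulting path itself.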

You wrote

    there's no hook for the end of an external entity
    reference event, only the beginning.  Without this hook
    I don't know when to pop a relative base from the

but how about pushing the old base immediately upon entry to
ExternEnt and popping it just before exit?  That would do
the popping at the same point that the ExternEntFin hook in
the newer version would be used.
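A minimal sketch of the push-on-entry, pop-on-exit idea follows. The names @base_stack and with_base are hypothetical, and the starting base 'top' is an assumption taken from the example tree. One caveat I have not verified: this only helps nested references if the nested entity is actually processed while the handler is still running. If the parser processes the returned content after the handler exits, the pop would come too early.

```perl
use strict;
use warnings;

# Hypothetical stack of base directories; 'top' is an assumed starting base.
our @base_stack = ('top');

# Hypothetical wrapper: run $work with $dir pushed as the current base.
# 'local' restores the previous stack when this sub exits, even if $work dies,
# which gives the "pop just before exit" behavior without an explicit pop.
sub with_base {
    my ($dir, $work) = @_;
    local @base_stack = (@base_stack, $dir);
    return $work->();
}

# Record which base is current at each point of a nested walk.
my @seen;
with_base('top/dtd-dir', sub {
    push @seen, $base_stack[-1];
    with_base('top/dtd-dir/more-ents-dir', sub {
        push @seen, $base_stack[-1];
    });
    push @seen, $base_stack[-1];    # inner base has been popped again
});
push @seen, $base_stack[-1];        # back to the starting base
print "$_\n" for @seen;
```

Inside an ExternEnt handler, the body that opens and reads the entity would go where the anonymous subs are here.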

/s/ Ernest G. Allen
    Sunnyvale, CA, USA

- - - - 

At 10:24 AM -0700 2001-10-18, D.J. Miller II wrote:
>I'm wondering if anyone has experience using Perl's XML::Parser module
>to parse an XML document represented as a tree of .xml files connected
>by external entity references.  (Apologies if this question is
>inappropriate for xml-dev.)  I'm having a problem with the relative URIs
>(here, simply relative file system pathnames) used in such a situation.
>The problem is illustrated by the following example.
>Here's a small tree of files:
>   top/
>    |
>    +- dtd-dir/
>    |    |
>    |    +- more-ents-dir/
>    |    |    |
>    |    |    +- adolph.ent
>    |    |
>    |    +- thangle.dtd
>    |
>    +- hub.xml
>    |
>    +- some-shared-content/
>         |
>         +- eleanor.xml
>The top level document is hub.xml:
>  <?xml version="1.0" encoding="us-ascii"?>
>  <!DOCTYPE thang PUBLIC "-//blub//DTD thangle//EN" "dtd-dir/thangle.dtd"[
>  ]>
>  <thang>
>      &eleanor;
>  </thang>
>The dtd it references is thangle.dtd:
>  <!-- thangle.dtd -->
>  <!ELEMENT thang (bang) >
>  <!ELEMENT bang (thud+)>
>  <!ELEMENT thud (#PCDATA)>
>  <!ENTITY % adolph SYSTEM "more-ents-dir/adolph.ent">
>  %adolph;
>The external entity referenced from thangle.dtd is adolph.ent:
>  <!ENTITY eleanor SYSTEM "../../some-shared-content/eleanor.xml">
>The external entity declared in adolph.ent and referenced back in the
>top level document is eleanor.xml:
>  <bang>
>    <thud>
>      This element and its parent are from top/some-shared-content/eleanor.xml.
>    </thud>
>  </bang>
>I'm using the XML::Parser ExternEnt hook to handle the external entity
>reference events.  This handler is given the following parameters:
>  $xp    - reference to the XML::Parser::Expat instance that's running
>           the parse
>  $base  - base to be used for resolving a relative URI (may be
>           undefined)
>  $sysid - the URI of external entity
>  $pubid - the PUBID of the external entity (may be undefined)
>When XML::Parser gets to the %adolph; reference inside of thangle.dtd
>(which was itself opened because of the reference to it in the DOCTYPE
>declaration of hub.xml), the $base parameter comes into the handler
>empty; one might think at this point that it would have a value
>something like "dtd-dir/", which was the path for the previous, 'parent'
>external entity reference.  At this point the handler is lost and can't
>open the entity, so the parse fails.
>(As an interesting aside, the Saxon 6.4.3 XSLT processor (don't know
>what version of AElfred it's using) gets similarly lost in a tree of xml
>fragment files connected with relative URIs, while the Xalan-J 2.0.1
>XSLT processor (which uses Xerces 1.23) does not have this problem.
>In the example above, Saxon *does* find the adolph.ent entity but not
>the eleanor.xml one.)
>I could get around this empty value for $base by keeping a pushdown list
>of the relative URI 'bases' (dirname parts of URIs of previously seen
>external entity references) except for one thing:  in the version of
>XML::Parser I'm constrained to use, 2.27, there's no hook for the end of
>an external entity reference event, only the beginning.  Without this
>hook I don't know when to pop a relative base from the pushdown, and so
>again get lost in the tree.  As of version 2.28 of XML::Parser there is
>such a hook:  "ExternEntFin".
>Is there any way to make XML::Parser 2.27 give meaningful values for the
>$base parameter to one's ExternEnt handler, or am I doomed to come up
>with a different Perl-accessible XML parser?
>               James Miller in Austin, Texas
>       Internet:   jamesm@bga.com
>      alternate:   jamesm@wixer.bga.com