Re: [xml-dev] @xml-base in subtrees included (a) via entity expansion, and (b) via xi:include

The problem is that the Xerces team is still looking for volunteers to publish a release containing bug fixes that are already quite old, so it probably won't be able to add the required extension property.

Have a look at this post: https://mail-archives.apache.org/mod_mbox/xerces-j-dev/201711.mbox/browser


On 16/11/2017 at 19:36, C. M. Sperberg-McQueen wrote:
On Nov 16, 2017, at 1:42 AM, Michael Kay <mike@saxonica.com> wrote:

(3) Your upstream processor is, as required by the spec, leaving xml:base attributes in the top-level included element items of an inclusion.  And for reasons not at all clear to me, it is sometimes using a relative reference in the xml:base attribute, and not an absolute reference.

Are those inferences correct?

Yes. (On (3), I think that there are some good usability reasons for putting a relative reference in the xml:base attribute - it makes the document relocatable as part of a complex of interlinked documents)

I think relocatability is a good reason for most or all human-assigned URIs,
including instances of xml:base, to be relative.  I don’t see how it is useful
for the xml:base attributes injected by an XInclude processor, which I expect
to have transient utility anyway (they are part of one particular processing
of the input). Possibly I’m just dense, or the scenario in which it’s useful is not
one I have encountered.

Oh, OK.  (D’oh.)  If I’m moving a set of interlinked documents using XInclude, I
normally want to move them to an isomorphic set of interlinked documents using
XInclude at a different location.  And in that case, the result of XInclude is transient.
If, however, I want to maintain some material in a set of 150 interlinked
documents at location 1, and use XInclude to construct three documents which
include material from that set of 150 and which do not themselves use XInclude,
and then place them at location 2, then yes, I think I see why one might want
the xml:base attributes to have relative URIs.

(Actually, the most compelling thing I see is another good reason not to try
to play clever games with base URIs.  But that doesn’t really help you.  If I did
find myself backed into a corner and forced to use xml:base in subtle ways,
I would really want the processor to produce correct results.)

Let me try giving an example. Consider first a single-entity document (A) with base URI http://example.com/doc.xml:

  <in xml:base="dir/in.xml"/>

then the base URI of <in> is http://example.com/dir/in.xml.

With you so far.

If we now take this document (B) at the same location:

<!DOCTYPE out [
<!ENTITY e SYSTEM "dir/in.xml">
]>
<out>&e;</out>

where the external entity is

<in xml:base="dir/in.xml"/>

then the document after entity expansion is

  <in xml:base="dir/in.xml"/>

but the base URI of <in> is now http://example.com/dir/dir/in.xml

If we now take this document (C) at the same location:

  <xi:include href="dir/in.xml"/>

where the included document is

  <in/>

then the expanded document (delivered by Apache Xerces) is

  <in xml:base="dir/in.xml"/>

and the base URI of <in> is http://example.com/dir/in.xml
And yes.  And … ouch.
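
The arithmetic behind the three cases can be checked with plain URI resolution. The following is a minimal sketch, not anything taken from Saxon or Xerces: the class and method names are made up for illustration, and it assumes only that the xml:base value is resolved against the base URI of the entity (or including document) in which the attribute is considered to appear.

```java
import java.net.URI;

class BaseUriCases {
    /** Resolves an xml:base value against the base URI of whatever contains the attribute. */
    static URI resolve(String containerBase, String xmlBase) {
        return URI.create(containerBase).resolve(xmlBase);
    }

    public static void main(String[] args) {
        String doc = "http://example.com/doc.xml";

        // (A) The attribute is written directly in the document entity.
        System.out.println(resolve(doc, "dir/in.xml"));
        // prints http://example.com/dir/in.xml

        // (B) The attribute is written inside the external entity, whose own
        // base URI is the entity's system identifier resolved against doc.
        String entity = resolve(doc, "dir/in.xml").toString();
        System.out.println(resolve(entity, "dir/in.xml"));
        // prints http://example.com/dir/dir/in.xml

        // (C) The attribute was injected by the XInclude processor and is
        // relative to the including document, so it resolves as in (A).
        System.out.println(resolve(doc, "dir/in.xml"));
        // prints http://example.com/dir/in.xml
    }
}
```

The serialized attribute is identical in all three cases; only the choice of container base differs, which is exactly the information that is lost downstream.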

It is here that I rub my eyes in surprise at the decision of the parser
implementors to insist on using a relative URI here, and not offer
the option of using an absolute one.  After all, the parser + XInclude
processor has all the relevant information and can make the correct
distinctions; its output in this case seems to be erasing the
information needed to allow the downstream user to calculate the base
URIs correctly.

If an XML parser did that to me (erased substantive differences in
the input), I would be looking for a new XML parser.  I realize that
you may not regard that as a viable option.

In fairness to the implementors, it is pretty clear that XInclude tried
hard to make itself invisible in the infoset.  I don’t know whether to be
upset that the responsible WG missed this particular interaction among
general entities, XInclude, xml:base, and the general rules for base URIs,
or impressed that it has taken this long to surface.

My challenge is to distinguish these three cases, where the surface structure (the values of elements and attributes) is in all cases the same, but the base URIs are different. In particular the XPath expression base-uri(//in) must deliver the correct answer in all three cases.

I currently distinguish (A) and (B) by detecting that the location information supplied by the SAX parser for the <in> element has a different systemID from the location of the <out> element. But this heuristic is giving me the wrong answer for case (C).

It does occur to me that there is one way I could detect the difference between (B) and (C): presumably the SAX parser will call LexicalHandler.startEntity() and LexicalHandler.endEntity() for case (B), but not for case (C). Using that information will be messy (it needs an extra bit somewhere in the XDM model representation, and bits are in short supply) but it may be do-able.
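
A rough sketch of that idea using the standard SAX extension classes follows. The handler name, the depth counter, and the in-memory entity resolver are illustrative assumptions, not Saxon code. The resolver is hard-wired to return the external entity from the example so that the sketch runs without network access. Note that startEntity() is also reported for internal general entities, which a real implementation would have to tell apart, for instance by also comparing Locator system identifiers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class EntityAwareHandler extends DefaultHandler2 {
    private int entityDepth = 0;                    // > 0 while inside a general entity
    final List<String> seen = new ArrayList<>();    // element name plus where it was seen

    @Override
    public void startEntity(String name) {
        // "[dtd]" is the external DTD subset; names starting with '%' are parameter entities.
        if (!name.equals("[dtd]") && !name.startsWith("%")) entityDepth++;
    }

    @Override
    public void endEntity(String name) {
        if (!name.equals("[dtd]") && !name.startsWith("%")) entityDepth--;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        seen.add(qName + (entityDepth > 0 ? ":in-entity" : ":top"));
    }

    @Override
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
        // Assumption for the demo: every external entity is the one from case (B).
        InputSource src = new InputSource(new StringReader("<in xml:base=\"dir/in.xml\"/>"));
        src.setSystemId("http://example.com/dir/in.xml");
        return src;
    }

    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        EntityAwareHandler handler = new EntityAwareHandler();
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        InputSource in = new InputSource(new StringReader(xml));
        in.setSystemId("http://example.com/doc.xml");
        reader.parse(in);
        return handler.seen;
    }
}
```

Parsing document (B) with this handler should record the `in` element as seen inside an entity, while document (A) records it at top level. Whether an XInclude-aware parser stays silent on startEntity() for case (C) is the presumption stated above and is not verified by this sketch.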
Given that you are looking aside at the location information in order to
identify the correct base URI in case (B), I suspect that you would be
able to calculate the correct base URI in case C if you were to set feature
http://apache.org/xml/features/xinclude/fixup-base-uris to false.  The
drawback is that the ‘in’ element of case (C) would then have no
xml:base attribute.  This would at least allow you to offer the user a
choice:  correct base URI and a missing xml:base where XInclude
says one should be injected, or a correct set of attributes and
incorrect base URI.
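
For reference, switching that feature off through JAXP might look like the sketch below. The class and method names are hypothetical; the feature URI is the one named above. The sketch assumes the JAXP implementation in use is Xerces-based and recognizes the feature (otherwise setFeature() throws). The helper simply reports which xml:base value, if any, arrives on the `in` element.

```java
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class FixupBaseUrisDemo {
    static final String FIXUP_BASE_URIS =
            "http://apache.org/xml/features/xinclude/fixup-base-uris";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    /** Parses the document with XInclude enabled; returns xml:base on <in>, or null if absent. */
    static String xmlBaseOfIn(Path document, boolean fixupBaseUris) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // XInclude processing needs namespaces
        factory.setXIncludeAware(true);
        factory.setFeature(FIXUP_BASE_URIS, fixupBaseUris);

        String[] result = new String[1];
        factory.newSAXParser().parse(document.toFile(), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (localName.equals("in")) {
                    result[0] = atts.getValue(XML_NS, "base");
                }
            }
        });
        return result[0];
    }
}
```

With the feature set to false, the included `in` element is expected to arrive without an injected xml:base attribute, so the base URI has to be recovered from the location information instead.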

Can you tell your upstream XInclude processor to use absolute URIs in all injected xml:base attributes?

Can you ask it to provide the extension property named “include history”?

I would be tempted to lobby the Xerces team for some extension
property to signal that a given node is a top-level node in an
external entity, or a top-level node in an XIncluded node set.
Or a base-URI property …  But perhaps that ship has sailed.

Does this problem present itself only in the SAX interface, or also
in the DOM interface?

C. M. Sperberg-McQueen
Black Mesa Technologies LLC

