In a private email message, a friend proposed that the second paragraph of 4.7.6 in the XInclude spec gave a clear answer. In the course of replying to that message, I have resolved the question to my satisfaction.

The second paragraph of 4.7.6 says:

    An XInclude processor should augment the source infoset and the acquired infoset by adding the language property to each element information item. The value of this property is the normalized value of the xml:lang attribute appearing on that element if one exists, with xml:lang="" resulting in no value, otherwise it is the value of the language property of the element's parent element if one exists, otherwise the property has no value.

For context, here’s my example, simplified to remove the obvious cases:

    <doc xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en">
      <xi:include href="xx.xml" fragid="element(/1/1)"/>
    </doc>

And xx.xml is:

    <chap><p>Something</p></chap>

Let’s decode the second paragraph of 4.7.6 for this concrete example. At the point where we’re processing the “p” fragment:

1. The source infoset is the one that contains “doc”.
2. The acquired infoset is the one that contains “chap”.
3. The top-level included item is “p”.
4. The include parent is the “doc” element.

Looking at the second paragraph in detail, we find:

    An XInclude processor should augment the source infoset and the acquired infoset by adding the language property to each element information item.

XInclude is speaking in terms of augmenting the infoset with a language property. That’s not exactly the same as an xml:lang attribute. I think it’s saying that every element in infosets 1 and 2 should have an infoset property named “language” that identifies its language. Since the Infoset is an abstraction and not a realized data model, that property is not the same thing as the attribute; the attribute comes later.
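To make the quoted paragraph concrete, here is a minimal sketch in Python of how a processor might compute the language property for every element of the source and acquired infosets. It is only an illustration under my own assumptions: it uses the standard library's ElementTree rather than a real infoset, the function name language_properties is hypothetical (not taken from any actual XInclude processor), None stands for "no value", and the document node is assumed to have no language property (ignoring the HTTP question in the footnote).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_properties(root):
    """Map every element under `root` to its language property.

    Hypothetical helper. None stands for "no value". The root's parent is
    the document, which (in this sketch) has no language property.
    """
    langs = {}

    def walk(elem, parent_lang):
        if XML_LANG in elem.attrib:
            # Use the element's own xml:lang; xml:lang="" means "no value".
            lang = elem.attrib[XML_LANG] or None
        else:
            # Otherwise inherit the parent's language (possibly None).
            lang = parent_lang
        langs[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return langs


# The example from this message: the source and acquired infosets are
# computed separately, before any inclusion takes place.
source = ET.fromstring(
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en">'
    '<xi:include href="xx.xml" fragid="element(/1/1)"/>'
    '</doc>'
)
acquired = ET.fromstring("<chap><p>Something</p></chap>")

source_langs = language_properties(source)
acquired_langs = language_properties(acquired)

print(source_langs[source])                # en
print(acquired_langs[acquired.find("p")])  # None
```

Because each infoset is walked on its own, “p” never sees “doc” as an ancestor, which is exactly the point the spec's wording forces.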
    The value of this property is the normalized value of the xml:lang attribute appearing on that element if one exists,

It follows, I hope plainly, that the language property of the element “doc” is “en”.

    with xml:lang="" resulting in no value,

There are no xml:lang attributes with the explicit value "", so this clause does not apply.

    otherwise it is the value of the language property of the element's parent element if one exists, otherwise the property has no value.

I think the implication here is that no element in the acquired infoset has a value for the language property: “p”’s parent is “chap”, “chap”’s parent is the document, and none of them has a language property.¹ Observe, critically, that “p” has not been added to the augmented infoset at this point, so it has no “doc” ancestor from which to inherit a language.

That’s the end of the second paragraph of 4.7.6. Let’s look at the next paragraph:

    Each element information item in the top-level included items which has a different value of language than its include parent (taking case-insensitivity into account per [IETF RFC 3066]),

By the reasoning above, the language property of the top-level included item “p” (no value) differs from the language property of its include parent, “doc” (“en”).

    or that has a value if its include parent is a document information item,

This case doesn’t apply.

    has an attribute information item added to its attributes property. This attribute has the following properties:

Okay, so we *are* going to add an attribute to the “p” element. The list that follows describes the infoset properties of the attribute. The significant point is:

    4. A normalized value equal to the language property of the element. If the language property has no value, the normalized value is the empty string.

The language property has no value, so the normalized value of the attribute is the empty string.
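The decision in that third paragraph can be sketched as a small Python function. This is a minimal illustration under my own assumptions, not code from any real XInclude processor: lang_fixup and same_language are hypothetical names, None stands for "no value", and the case-insensitive comparison is approximated with ASCII lowercasing, which suffices because language tags are ASCII.

```python
from typing import Optional


def same_language(a: Optional[str], b: Optional[str]) -> bool:
    """Compare two language property values; None means "no value".

    Language tags compare case-insensitively (RFC 3066); lowercasing is
    enough here because the tags are ASCII.
    """
    if a is None or b is None:
        return a is None and b is None
    return a.lower() == b.lower()


def lang_fixup(item_lang: Optional[str],
               parent_lang: Optional[str],
               parent_is_document: bool = False) -> Optional[str]:
    """Return the xml:lang value to add to a top-level included item,
    or None if no attribute should be added (hypothetical helper)."""
    if parent_is_document:
        # Include parent is the document: add xml:lang only if the item
        # has a language value (parent_lang is ignored in this case).
        needs_attribute = item_lang is not None
    else:
        # Include parent is an element: add xml:lang if the languages differ.
        needs_attribute = not same_language(item_lang, parent_lang)
    if not needs_attribute:
        return None
    # "If the language property has no value, the normalized value is
    # the empty string."
    return item_lang if item_lang is not None else ""


# The example: "p" has no language; its include parent "doc" is "en".
print(repr(lang_fixup(None, "en")))   # ''
print(repr(lang_fixup("EN", "en")))   # None
```

The first call is the case worked through above: the languages differ, the item's language has no value, so the added attribute is xml:lang="". The second shows that a difference in case alone does not trigger the fixup.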
I have now convinced myself that the correct result is:

    <doc xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en">
      <p xml:lang="">Something</p>
    </doc>

Time to fix my XInclude processor, I think.

Be seeing you,
norm

¹ There’s a *really* interesting and completely tangential question here about whether or not the document information item could have a language property if it was served over HTTP with a Content-Language: header. I think the most useful answer is probably “yes”, but that’s *not* the question here.

-- 
Norman Tovey-Walsh <ndw@nwalsh.com>
https://nwalsh.com/

> Doing more things faster is no substitute for doing the right
> things.--S. R. Covey