   Re: [xml-dev] When to check entity WFness according to 4.3.2



> Continuing along the same lines I have been going for the past week I 
> have another question about entity details. Section 4.3.2 of the XML rec 
> says:
> 
> "An internal general parsed entity is well-formed if its replacement 
> text matches the production labeled content."
> 
> My question is: when should such a WFness check be performed?
> 
> (a) Immediately after the literal is parsed
> (b) Immediately after the DTD has been parsed
> (c) Only when it is referenced

If memory serves me correctly, then Expat is doing (c).
 
> I don't think (c) is the answer-- especially because of some of the 
> points that were made in this thread [1]. So that leaves us with (a) and 
> (b). The problem with (a) is that element declarations important to the 
> WF check of the literal value might not have occurred. Consider:
> 
> <!DOCTYPE doc [
> <!ELEMENT doc (foo)>
> <!ENTITY e "<foo id='This is not an id!'/>">
> <!ELEMENT foo EMPTY>
> <!ATTLIST foo id ID #IMPLIED>
> ]>
> <doc>&e;</doc>
> 
> versus:
> 
> <!DOCTYPE doc [
> <!ELEMENT doc (foo)>
> <!ELEMENT foo EMPTY>
> <!ATTLIST foo id ID #IMPLIED>
> <!ENTITY e "<foo id='This is not an id!'/>">
> ]>
> <doc>&e;</doc>
> 
> This leaves us with option (b)-- perform the WFness check of the literal 
> value once the DTD has been parsed. The only hitch with this is test 
> case valid-sa-86 from the XML test suite.
> 
> <!DOCTYPE doc [
> <!ELEMENT doc (#PCDATA)>
> <!ENTITY e "">
> <!ENTITY e "<foo>">
> ]>
> <doc>&e;</doc>
> 
> This test is supposed to be well-formed, which would only be the case if 
> we accepted option (c), or if overridden entity literals are discarded 
> and we go with option (b). This makes sense when considering the 
> distinction between the literal entity text and the replacement text. 
> Section 4.3.2 refers only to checking the WFness of the replacement text, 
> not the actual literal.
> 
> Is this the correct interpretation?
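
The valid-sa-86 case quoted above can be tried against Expat through Python's standard xml.parsers.expat binding. This is only a rough sketch: text_content, VALID_SA_86 and REVERSED are names made up for the illustration, and the behaviour depends on the Expat version bundled with the interpreter. The idea is that the first declaration of e is the binding one, so the later "<foo>" literal should never be looked at.

```python
from xml.parsers import expat


def text_content(document: str) -> str:
    """Parse the document with Expat and return its character data.

    Raises expat.ExpatError if Expat reports a well-formedness error.
    (Hypothetical helper, not part of any library.)
    """
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


# The valid-sa-86 document: the empty declaration of e comes first.
VALID_SA_86 = (
    "<!DOCTYPE doc [\n"
    "<!ELEMENT doc (#PCDATA)>\n"
    '<!ENTITY e "">\n'
    '<!ENTITY e "<foo>">\n'
    "]>\n"
    "<doc>&e;</doc>"
)

# Same document with the two declarations swapped (an assumption made
# for contrast; this variant is not part of the test suite).
REVERSED = (
    "<!DOCTYPE doc [\n"
    "<!ELEMENT doc (#PCDATA)>\n"
    '<!ENTITY e "<foo>">\n'
    '<!ENTITY e "">\n'
    "]>\n"
    "<doc>&e;</doc>"
)

if __name__ == "__main__":
    print(repr(text_content(VALID_SA_86)))
    try:
        text_content(REVERSED)
    except expat.ExpatError as err:
        print("rejected:", err)
```

If the first declaration is the one that counts, the first document should parse with empty content, and the swapped one should be rejected once &e; is expanded.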

IMO, (b) would be correct, but it is much easier to implement (c).
The only case when (c) does not catch the WF violation is when
the entity is never referenced. That is, Expat will not catch this:

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
<!ENTITY e "<foo>abc</not_foo>">
]>
<doc></doc>
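
A quick way to check this, as a sketch only, is to run both variants through Python's standard xml.parsers.expat binding. The helper is_well_formed and the constant names are invented for the illustration, and the result depends on the Expat version your Python was built with.

```python
from xml.parsers import expat


def is_well_formed(document: str) -> bool:
    """Return True if Expat parses the document without raising an error.

    (Hypothetical helper, not part of any library.)
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError:
        return False
    return True


# Internal subset from the example above: the replacement text of e
# does not match the content production (mismatched end tag).
DTD = (
    "<!DOCTYPE doc [\n"
    "<!ELEMENT doc (#PCDATA)>\n"
    '<!ENTITY e "<foo>abc</not_foo>">\n'
    "]>\n"
)

UNREFERENCED = DTD + "<doc></doc>"    # e is declared but never used
REFERENCED = DTD + "<doc>&e;</doc>"   # e is expanded in content

if __name__ == "__main__":
    print("unreferenced:", is_well_formed(UNREFERENCED))
    print("referenced:  ", is_well_formed(REFERENCED))
```

With behaviour (c) the unreferenced document should be accepted and the referenced one rejected; a parser doing (b) would reject both.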

Karl

