>(Richard Tobin, can you give an example of the trick in question?)
Ok:
<!DOCTYPE foo [
<!ENTITY a '<a&#xD;b="c"/>'>
]>
<foo>&a;</foo>
This is well-formed XML 1.0 because the S production includes CR
(#xD). The replacement text of the entity a contains a CR character,
and it gets parsed as whitespace. This is of course the exception:
literal CR characters in the document are translated to LF on input and
are never matched against the productions.
The natural analogy would be to do the same for NEL (#x85), but this
would complicate life for (at least some) parsers greatly - they would
need a check in many places in order to tokenize differently depending
on the document version. Since there are no existing well-formed
documents that rely on NEL being in S, and no good use cases for
examples like the one above, we decided it was not worth adding NEL to S.
So this will not be well-formed:
<?xml version="1.1"?>
<!DOCTYPE foo [
<!ENTITY a '<a&#x85;b="c"/>'>
]>
<foo>&a;</foo>
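A rough Python sketch of the two stages involved may make the distinction clearer. It is not taken from any real parser; the names normalize_line_ends and is_s are made up for illustration, and the translation rules are written from the end-of-line handling sections of the 1.0 and 1.1 specs. Literal line ends are translated to LF on input, and only afterwards are characters matched against S, which is the same four characters in both versions.

```python
import re

# S production, identical in XML 1.0 and 1.1: space, tab, CR, LF.
S_CHARS = {"\x20", "\x09", "\x0D", "\x0A"}

# Literal line-end sequences that are translated to LF on input.
# Longer sequences come first so CR LF and CR NEL are consumed as a pair.
_EOL_10 = re.compile("\r\n|\r")
_EOL_11 = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_ends(text, version="1.0"):
    """Translate literal line ends to LF, as happens before parsing."""
    pattern = _EOL_11 if version == "1.1" else _EOL_10
    return pattern.sub("\n", text)


def is_s(ch):
    """True if the single character ch matches the S production."""
    return ch in S_CHARS


# A literal NEL in a 1.1 document becomes LF, which is in S.
literal = normalize_line_ends("<a\x85b='c'/>", version="1.1")
print(is_s(literal[2]))

# A NEL produced by a character reference skips that translation
# and reaches the tokenizer unchanged, where it is not in S.
print(is_s("\x85"))
```

The point of the sketch is that is_s never needs to look at the document version; only the input translation does.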
Similarly, we didn't add NEL to the characters normalized to space
in attributes. And again, this only makes a difference if you use
a character reference. In XML 1.0, this is valid:
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ATTLIST foo att NMTOKENS #IMPLIED>
<!ENTITY b 'foo&#xD;bar'>
]>
<foo att="&b;"/>
but the corresponding document with &#x85; instead of &#xD; will
be invalid in 1.1.
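The attribute case can be sketched the same way. The following is a minimal illustration only, with hypothetical helper names and a deliberately simplified NameChar check (ASCII subset, which is an assumption of the sketch and not the full production). It applies the normalization to the entity's replacement text, then checks the result as NMTOKENS.

```python
import re

# Characters that attribute-value normalization turns into a space.
# NEL (#x85) is not among them, in 1.0 or in 1.1.
_ATT_WS = {"\x20", "\x09", "\x0D", "\x0A"}

# Simplified Nmtoken: ASCII NameChars only (assumption for this sketch).
_NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


def normalize_replacement_text(text):
    """Normalize entity replacement text included in an attribute value."""
    return "".join(" " if ch in _ATT_WS else ch for ch in text)


def is_valid_nmtokens(value):
    """Check a normalized value against the simplified Nmtokens rule."""
    # Non-CDATA attributes: trim, then treat runs of spaces as one separator.
    tokens = [t for t in value.strip(" ").split(" ") if t]
    return bool(tokens) and all(_NMTOKEN.fullmatch(t) for t in tokens)


# CR from &#xD; becomes a space, giving the two tokens "foo" and "bar".
with_cr = normalize_replacement_text("foo\rbar")
print(is_valid_nmtokens(with_cr))

# NEL from &#x85; is left alone, giving one token with an illegal character.
with_nel = normalize_replacement_text("foo\x85bar")
print(is_valid_nmtokens(with_nel))
```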
We did consider the suggestion of removing CR from S in 1.1, but apart
from pointlessly breaking backward compatibility it would again mean
that parsers would have to test the version number when tokenizing.
-- Richard