Uche Ogbuji wrote:
>>>No. It's illegal to have ]]> anywhere in character content except at
>>Dang, you're right. oops.
>>>True, although I think it's simpler to just pick one quotation style
>>>and escape a single character rather than trying to be clever about
>>>using the "correct" quotation mark in each case.
>>Prob'ly so. That puts us up to < & > " as special. Still reasonable,
>>I think, even if it doubled my original claim.
> Perhaps, but I think this little exchange also demonstrates my point that it's
> never as simple as one thinks it is.
> If this sort of thing tripped you up, Rich, imagine the potential for failure
> by the average programmer.
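A minimal sketch of the escaping rule the exchange above settles on. It assumes attribute values are always wrapped in double quotes, and the helper names `escape_text` and `escape_attr` are hypothetical. Escaping every `>` in content, rather than trying to detect `]]>`, is always legal and sidesteps that trap entirely.

```python
def escape_text(s: str) -> str:
    """Escape a string for use as XML character content (hypothetical helper)."""
    # '&' goes first so the '&' in the references added below is not escaped again.
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    # Escaping every '>' is always legal and guarantees no literal ']]>' survives.
    return s.replace(">", "&gt;")


def escape_attr(s: str) -> str:
    """Escape an attribute value that will be wrapped in double quotes (hypothetical)."""
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    # Pick one quotation style and escape only that quote character.
    return s.replace('"', "&quot;")


print(escape_text("x ]]> y & z"))
print('<e a="' + escape_attr('say "hi" & <bye>') + '"/>')
```

This covers only the four characters discussed so far; it does nothing about whitespace.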
It's worse than this. If your infoset contains a carriage return, you
have to output it as a numeric character reference, otherwise line-end
normalization will turn it into a line-feed. Similarly, if attribute
values in the infoset contain line-feeds or tabs, they need to be output
as numeric character references, otherwise attribute value normalization
will turn them into spaces.
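The two normalizations can be observed directly by reparsing. The sketch below uses Python's `xml.etree.ElementTree`, which is backed by expat, and the helper names `reparse_text` and `reparse_attr` are hypothetical. Each case is parsed once with the character written literally and once as a numeric character reference. Only the reference form comes back unchanged.

```python
import xml.etree.ElementTree as ET


def reparse_text(xml: str) -> str:
    """Parse a document and return the root element's text (hypothetical helper)."""
    return ET.fromstring(xml).text


def reparse_attr(xml: str) -> str:
    """Parse a document and return the root's 'x' attribute (hypothetical helper)."""
    return ET.fromstring(xml).get("x")


# Carriage return in character content: literal vs. numeric character reference.
raw_cr = reparse_text("<doc>a\rb</doc>")      # line-end handling rewrites the CR
ref_cr = reparse_text("<doc>a&#13;b</doc>")   # the reference preserves it

# Tab and line-feed in an attribute value: literal vs. character references.
raw_ws = reparse_attr('<doc x="a\tb\nc"/>')        # normalized to spaces
ref_ws = reparse_attr('<doc x="a&#9;b&#10;c"/>')   # preserved

print(repr(raw_cr), repr(ref_cr))
print(repr(raw_ws), repr(ref_ws))
```

A serializer that wants a faithful round trip therefore has to emit `&#13;` in content, and `&#9;`, `&#10;` and `&#13;` in attribute values.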
If you still think it's easy, try serializing the infoset you get from this:
<!DOCTYPE doc [
<!ENTITY e "<?x y ?>">