> Another fact that I think has been overlooked is the following.
> The following fragment of XML (encoded in UTF-8+names but
> displayed as if it were encoded in UTF-8) contains exactly 18
> Unicode characters:
> <a>one&nbsp;two&lt;</a>
> because &nbsp; counts as one character and &lt;
> counts as 4 characters.
> The UTF-8+names encoding of this fragment of XML occupies 23
> bytes. The UTF-8 encoding occupies 19 bytes.
... and, by the way, the following fragment of XML is different from the one above (although it *looks* the same in this email) and contains 23 Unicode characters instead of 18:
<a>one&nbsp;two&lt;</a>
The UTF-8 encoding of this fragment of XML occupies 23 bytes. The UTF-8+names encoding is longer than that because the first ampersand must be encoded as the three ASCII bytes & & ; so that the XML entity reference is not mistaken for the pseudo-entity &nbsp;.
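
The counts in this thread can be checked with a small Python sketch. It is only an illustration, not the UTF-8+names specification: the name table NAMES holds a single hypothetical entry, the function encode_utf8_names is invented here, and the escaping rule (an ampersand is written as the three bytes &&; only when it would otherwise be read as a known pseudo-entity) is an assumption inferred from the two messages above.

```python
import re

# Hypothetical, minimal name table; a real UTF-8+names table would be much larger.
NAMES = {"nbsp": "\u00a0"}
_REVERSE = {ch: name for name, ch in NAMES.items()}

# Matches an ampersand that starts something that looks like a known pseudo-entity.
_AMBIGUOUS = re.compile(r"&(?=(?:%s);)" % "|".join(map(re.escape, NAMES)))


def encode_utf8_names(text):
    """Toy encoder (assumed rules): escape ambiguous '&' as '&&;',
    then write each named character as '&name;', then encode as UTF-8."""
    escaped = _AMBIGUOUS.sub("&&;", text)
    for ch, name in _REVERSE.items():
        escaped = escaped.replace(ch, "&%s;" % name)
    return escaped.encode("utf-8")


# First fragment: a real U+00A0 character, followed by the XML reference &lt;
fragment_1 = "<a>one\u00a0two&lt;</a>"
# Second fragment: the six characters & n b s p ; as an XML entity reference
fragment_2 = "<a>one&nbsp;two&lt;</a>"

print(len(fragment_1))                     # 18 characters
print(len(fragment_1.encode("utf-8")))     # 19 bytes (U+00A0 takes 2 bytes)
print(len(encode_utf8_names(fragment_1)))  # 23 bytes

print(len(fragment_2))                     # 23 characters
print(len(fragment_2.encode("utf-8")))     # 23 bytes
print(len(encode_utf8_names(fragment_2)))  # 25 bytes under the assumed escaping rule
```

Under these assumptions the second fragment grows by two bytes, because its first ampersand becomes three bytes, which is consistent with "longer than that" above.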