Re: [xml-dev] [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?
- From: Philippe Poulard <philippe.poulard@sophia.inria.fr>
- To: Eric Bréchemier <eric.brechemier@gmail.com>
- Date: Fri, 21 Sep 2007 11:45:03 +0200
Eric Bréchemier wrote:
> Hello Philippe,
>
> On 9/21/07, Philippe Poulard wrote:
>> Consider this declaration:
>> <?kzy rapbqvat="EBG-13"?>
>>
>> The decoded form of this declaration is:
>> <?xml encoding="ROT-13"?>
>> I can get it only if I test "ROT-13" on it; although it is not, strictly
>> speaking, an encoding, a parser that supports "ROT-13" would be able
>> to decode it only if it tests for it, or if it recognizes the magic ASCII
>> string "<?kzy", or whatever the "guess" heuristic is.
>>
>
> I think you found an interesting example of ambiguous encoding,
> falling in the category of "Character encodings such as UTF-7 that
> make overloaded usage of ASCII-valued bytes" which "may fail to be
> reliably detected." as mentioned in
> http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
>
> In this example, without external information, there is no way IMHO to
> know for sure with limited input whether this is a ROT-13 encoding, or
> a UTF-8 document starting with the "kzy" processing instruction.
Yes, it is a special case among the special cases, since this silly
encoding is fully compatible with ASCII-based encodings. The consequence
is that a parser that doesn't know this encoding will still parse the
document correctly. In the same way, you can specify
<?xml encoding="UTF-8"?> when your document has actually been encoded in
ISO-8859-1: if the document doesn't use àéèê or other characters outside
the 7-bit ASCII range, it will be decoded correctly.
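
To make the ISO-8859-1 / UTF-8 point concrete, here is a minimal Python sketch. The sample documents are made up for illustration; the only claim is that pure-ASCII content produces identical bytes in both encodings, whereas accented characters do not.

```python
# ASCII-only content: ISO-8859-1 and UTF-8 produce the same bytes, so a
# wrong encoding="UTF-8" label does no harm. (Sample document is hypothetical.)
doc = '<?xml version="1.0" encoding="UTF-8"?><greeting>hello</greeting>'
raw = doc.encode("iso-8859-1")          # what was really written to disk
ascii_roundtrip_ok = raw.decode("utf-8") == doc

# As soon as characters outside 7-bit ASCII appear, the mislabelling shows:
# the ISO-8859-1 bytes for "àéèê" are not a valid UTF-8 sequence.
accented = "<note>àéèê</note>".encode("iso-8859-1")
try:
    accented.decode("utf-8")
    mislabelled_ok = True
except UnicodeDecodeError:
    mislabelled_ok = False
```

Here `ascii_roundtrip_ok` ends up true and `mislabelled_ok` false: the wrong label only goes unnoticed while the content stays within ASCII.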
Back to the ROT-13 example: a parser that supports this encoding could
try it before falling back to UTF-8.
The tip for parsers is that when a strategy (the BOM and the others
specified in the spec, or some other hazardous heuristic) leads to
<?xml encoding="XXX"?>, then XXX can be applied safely.
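
A rough sketch of that tip in Python, under stated assumptions: `detect_encoding` and `declared_encoding` are hypothetical helper names, the regular expression is a simplification of the real XML declaration grammar, and BOM handling is left out. A guess is accepted only when reading the first bytes under that guess yields a declaration that names the guessed encoding.

```python
import codecs
import re

# Simplified pattern for the encoding pseudo-attribute (an assumption,
# not the full grammar from the XML specification).
DECL = re.compile(r'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def declared_encoding(text):
    """Return the declared encoding name in upper case, or None."""
    match = DECL.match(text)
    return match.group(1).upper() if match else None

def detect_encoding(raw):
    """Hypothetical detection: plain ASCII reading, then ROT-13, then UTF-8."""
    head = raw[:100].decode("ascii", errors="replace")
    # Strategy 1: the declaration is readable as ASCII-compatible text.
    declared = declared_encoding(head)
    if declared:
        return declared
    # Strategy 2 (hazardous heuristic): read the head as ROT-13 and accept
    # the guess only if the decoded declaration confirms it.
    if declared_encoding(codecs.encode(head, "rot_13")) == "ROT-13":
        return "ROT-13"
    # Nothing confirmed a guess: fall back to the default.
    return "UTF-8"
```

With this sketch, `detect_encoding(b'<?kzy rapbqvat="EBG-13"?>')` returns `"ROT-13"`, while a document that merely starts with an unrelated `kzy` processing instruction falls back to `"UTF-8"`.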
--
Regards,
///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !