Re: [xml-dev] [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?

Eric Bréchemier wrote:
>   Hello Philippe,
> 
> On 9/21/07, Philippe Poulard wrote:
>> Consider this declaration:
>> <?kzy rapbqvat="EBG-13"?>
>>
>> The decoded form of this declaration is:
>> <?xml encoding="ROT-13"?>
>> I can recover it only if I try "ROT-13" on it; although it is not,
>> strictly speaking, an encoding, a parser that supported "ROT-13" would
>> be able to decode it only by trying it, or by recognizing the magic
>> ASCII string "<?kzy" or whatever the "guess" heuristic is.
>>
> 
> I think you found an interesting example of ambiguous encoding,
> falling in the category of "Character encodings such as UTF-7 that
> make overloaded usage of ASCII-valued bytes" which "may fail to be
> reliably detected." as mentioned in
> http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
> 
> In this example, without external information, there is no way IMHO to
> know for sure with limited input whether this is a ROT-13 encoding, or
> a UTF-8 document starting with the "kzy" processing instruction.
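
To make the ambiguity concrete, here is a minimal Python sketch. The `rot13` helper is an illustrative assumption built on `str.translate`; ROT-13 is not an encoding any XML parser is assumed to support. It shows that the scrambled declaration is pure ASCII, so it is also perfectly valid UTF-8, and only a reader that tries ROT-13 gets the declaration back.

```python
import string

# ROT-13 as a str -> str transform: shift each ASCII letter by 13 places,
# leave every other character ('<', '?', '"', '-', digits) unchanged.
_ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)


def rot13(text: str) -> str:
    """Illustrative helper: apply ROT-13 to ASCII letters only."""
    return text.translate(_ROT13)


plain = '<?xml encoding="ROT-13"?>'
scrambled = rot13(plain)
print(scrambled)  # <?kzy rapbqvat="EBG-13"?>

# The scrambled bytes are pure ASCII, hence also valid UTF-8: a parser that
# knows nothing about ROT-13 just sees a processing instruction named "kzy".
raw = scrambled.encode("ascii")
assert raw.decode("utf-8") == scrambled

# Only by trying ROT-13 (or recognizing "<?kzy") is the declaration recovered.
assert rot13(raw.decode("ascii")) == plain
```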

Yes, it is a special case among the special cases, since this silly
encoding is fully compatible with ASCII-based encodings; as a consequence,
a parser that doesn't know this encoding will still parse this document
(as a UTF-8 document starting with a "kzy" processing instruction). In the
same way, you can declare <?xml encoding="UTF-8"?> even though your
document was actually encoded in ISO-8859-1; as long as your document
doesn't use àéèê or other characters outside 7-bit ASCII, it will be
decoded correctly.
Back to the ROT-13 example: a parser that supports this encoding could
try it before falling back to UTF-8.
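
A small Python check of the ISO-8859-1 case, assuming only Python's standard "iso-8859-1" and "utf-8" codecs; `decodes_as_utf8` is an illustrative helper, not a parser API. ASCII-only text yields identical bytes in both encodings, so the wrong label goes unnoticed until an accented character appears.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Illustrative helper: True if `data` is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII-only text: ISO-8859-1 and UTF-8 produce the very same bytes,
# so a false encoding="UTF-8" label is harmless.
ascii_doc = '<?xml version="1.0" encoding="UTF-8"?><p>hello</p>'
assert ascii_doc.encode("iso-8859-1") == ascii_doc.encode("utf-8")
assert decodes_as_utf8(ascii_doc.encode("iso-8859-1"))

# One accented character exposes the wrong label: 'é' is the single byte
# 0xE9 in ISO-8859-1, and here it is followed by the ASCII byte 't',
# which is not a valid UTF-8 continuation byte.
accented_doc = '<?xml version="1.0" encoding="UTF-8"?><p>été</p>'
assert not decodes_as_utf8(accented_doc.encode("iso-8859-1"))
```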

The tip for parsers is this: when a strategy (the BOM and the other
methods specified in the spec, or some other, riskier heuristic) leads to
<?xml encoding="XXX"?>, then XXX can safely be applied.
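
The tip can be sketched in Python as: guess, decode the start of the document with the guess, then confirm. This is a minimal sketch under stated assumptions: `guess_family` covers only a small subset of the first-bytes table in Appendix F of the spec, the "rot-13" branch is hypothetical (following the ROT-13 example above), and `detect_encoding` is an illustrative name, not an existing library function.

```python
import codecs
import re

# encoding="..." inside an XML declaration; the name pattern follows the
# spec's EncName production: [A-Za-z] ([A-Za-z0-9._] | '-')*
_ENCODING_ATTR = re.compile(r"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def guess_family(head: bytes) -> str:
    """First guess from the leading bytes (a small subset of Appendix F)."""
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"              # UTF-8 BOM
    if head.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"                 # UTF-16 BOM, either byte order
    if head.startswith(b"<?kzy"):
        return "rot-13"                 # hypothetical: the ROT-13 trick
    return "utf-8"                      # ASCII-compatible default


def detect_encoding(data: bytes) -> str:
    """Illustrative: apply the guess, then trust <?xml encoding="XXX"?> if found."""
    family = guess_family(data[:5])
    head = data[:256]
    if family == "rot-13":
        text = codecs.decode(head.decode("ascii", errors="replace"), "rot_13")
    else:
        # Python's "utf-16" and "utf-8-sig" codecs consume the BOM.
        text = head.decode(family, errors="replace")
    # The guess led to an XML declaration carrying an encoding: apply it.
    if text.startswith("<?xml"):
        match = _ENCODING_ATTR.search(text.split("?>", 1)[0])
        if match:
            return match.group(1)
    # Otherwise the guess was not confirmed: ROT-13 falls back to UTF-8.
    return "utf-8" if family == "rot-13" else family
```

With this design, a document that really starts with a "kzy" processing instruction, such as <?kzy foo?>, decodes to <?xml sbb?>, which carries no encoding pseudo-attribute, so the sketch falls back to UTF-8.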

-- 
Regards,

               ///
              (. .)
  --------ooO--(_)--Ooo--------
|      Philippe Poulard       |
  -----------------------------
  http://reflex.gforge.inria.fr/
        Have the RefleX!


Copyright 1993-2007 XML.org. This site is hosted by OASIS