Re: [xml-dev] [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?
- From: Philippe Poulard <philippe.poulard@sophia.inria.fr>
- To: Rick Jelliffe <rjelliffe@allette.com.au>
- Date: Fri, 21 Sep 2007 09:31:08 +0200
Rick Jelliffe wrote:
> Philippe Poulard said:
>> I guess some parsers have additional heuristics for reading successfully
>> the sequence <?xml encoding="blah-blah"?> ; maybe some try-catch to
>> apply with the set of charset they know ?
>
> I hope they don't, unless they are specific tools for repairing broken
> documents.
>
> Guessing encoding is the *opposite* of the XML approach and should be
> strongly resisted. The XML approach is based on explicit labeling as the
> only approach that is reliable (which is not the same as not-stuff-up-able
> of course).
That is not what I meant.
XML documents are either in UTF-8 (or in UTF-16 with a byte order mark),
or in the encoding specified by <?xml encoding="blah-blah"?>.
What I meant is that parsers must first guess enough to read what is
specified, and then switch to what is specified. This is exactly what
they do with ASCII (possibly encoded in 1, 2 or 4 bytes), which
fortunately is compatible with many widely used encodings (UTF-8, UCS-2,
the ISO-8859 family, etc.): they rely on ASCII (1, 2 or 4 bytes wide,
according to the BOM, if any), or on EBCDIC, to work out what the
encoding is.
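A minimal Python sketch of this two-step process, loosely following the non-normative autodetection table in Appendix F of the XML 1.0 specification: sniff the first four bytes (a BOM, or the ASCII/EBCDIC byte pattern of "<?xm") to pick an encoding family, decode just enough of the document to read the declaration, then switch to the declared name. The function names and the 256-byte prefix are illustrative assumptions, and only a few rows of the table are covered.

```python
import re

# Pattern for the encoding pseudo-attribute of the XML declaration.
DECL = re.compile(r"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_family(data: bytes) -> str:
    """Guess an encoding family from the first four bytes (after XML 1.0, Appendix F)."""
    head = data[:4]
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"        # UTF-8 byte order mark
    if head == b"\x00\x00\xfe\xff":
        return "utf-32-be"    # 4-byte BOM, checked before the 2-byte ones
    if head == b"\xff\xfe\x00\x00":
        return "utf-32-le"
    if head.startswith(b"\xfe\xff") or head == b"\x00\x3c\x00\x3f":
        return "utf-16-be"    # BOM, or "<?" as 2-byte big-endian ASCII
    if head.startswith(b"\xff\xfe") or head == b"\x3c\x00\x3f\x00":
        return "utf-16-le"    # BOM, or "<?" as 2-byte little-endian ASCII
    if head == b"\x4c\x6f\xa7\x94":
        return "cp500"        # "<?xm" in EBCDIC
    return "utf-8"            # ASCII-compatible, or no declaration at all


def detect_encoding(data: bytes) -> str:
    """Return the declared encoding, read using the sniffed family; else the family."""
    family = sniff_family(data)
    # Decode only a short prefix: just enough to read the declaration.
    prefix = data[:256].decode(family, errors="replace").lstrip("\ufeff")
    match = DECL.match(prefix)
    return match.group(1) if match else family
```

The sniffed family only has to be good enough to read the ASCII characters of the declaration; the real decoding of the document then uses whatever name the declaration gives.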
Consider this declaration:
<?kzy rapbqvat="EBG-13"?>
The decoded form of this declaration is:
<?xml encoding="ROT-13"?>
I can read it only if I try "ROT-13" on it. Although ROT-13 is not,
strictly speaking, an encoding, a parser that supported "ROT-13" would
be able to decode the declaration only if it tried it, or if it
recognized the magic ASCII string "<?kzy", or whatever its "guessing"
heuristic is.
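The ROT-13 case can be illustrated with the `rot_13` text transform from Python's standard `codecs` module. The sketch below is a hypothetical "try each candidate" heuristic: it applies each candidate transform in turn and keeps the first result that starts with the magic string "<?xml". The candidate list and the function names are assumptions for illustration only; no XML parser is expected to support ROT-13.

```python
import codecs

# Hypothetical candidate list: "identity" leaves the text alone, "rot_13"
# is a str-to-str transform in the codecs module, not a real XML encoding.
CANDIDATES = ["identity", "rot_13"]


def transform(text: str, name: str) -> str:
    """Apply one candidate transform to the text."""
    return text if name == "identity" else codecs.decode(text, name)


def guess_declaration(text: str):
    """Try each candidate until the result looks like an XML declaration."""
    for name in CANDIDATES:
        decoded = transform(text, name)
        if decoded.startswith("<?xml"):
            return name, decoded
    return None  # no candidate produced a recognizable declaration


print(guess_declaration('<?kzy rapbqvat="EBG-13"?>'))
# ('rot_13', '<?xml encoding="ROT-13"?>')
```

Without "rot_13" in the candidate list, the same input returns `None`: the declaration stays unreadable, which is the point being made.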
--
Regards,
///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !