From: "Koti" <firstname.lastname@example.org>
> I tried to load an XML which has #x81 character as part of one XML node value.
> (I have attached the sample XML file as an attachment.) But MSXML parser and
> IE6.0 failed to load that document with xml version 1.0 and gave an error message saying
> "An invalid char found in xml file." that pointed to #x81 char, but according to BNF
> notation #x81 is valid char. Does anybody have any clue why MSXML parser failed to
> load this document?
Your XML file has no encoding declaration, so we can expect that MS-XML will
attempt to interpret the file as UTF-8.
In Unicode, code point 0x81 is a control character, not a printing character
such as something that looks like "a". In the UTF-8 encoding, there is also
no code sequence consisting only of a solitary byte > 0x7F: such bytes must always
occur in multi-byte sequences of two or more bytes.
Therefore your document does not seem to be encoded in UTF-8, and MSXML
is correct in rejecting it.
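
The byte-level point can be checked with a minimal sketch. It uses Python's built-in UTF-8 decoder as a stand-in for the parser's decoding step (an assumption; MSXML's internals are not shown here), and the helper name is_valid_utf8 is only illustrative.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A solitary 0x81 is a continuation byte with no lead byte before it.
lone = b"<a>\x81</a>"

# The character U+0081, properly encoded in UTF-8, is the two bytes C2 81.
paired = "<a>\u0081</a>".encode("utf-8")

print(is_valid_utf8(lone))    # the lone byte is rejected
print(is_valid_utf8(paired))  # the two-byte sequence is accepted
print(paired)
```

So the character #x81 may be allowed by the XML grammar, but a raw byte 0x81 on its own is not a UTF-8 encoding of it.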
It is difficult to figure out what encoding you in fact have: many Macintosh character
sets use 0x81 for variants on A, and some Windows code pages use it for u umlaut.
The other common encodings that use code points in this range are the recent Windows
encodings, although windows-1252 puts the Euro sign at 0x80 and leaves 0x81 unassigned.
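
To see how much the meaning of byte 0x81 depends on the assumed encoding, here is a rough sketch using the codec tables that ship with Python. The list of candidate encodings is my own choice for illustration, and other vendors' mapping tables may differ from Python's in places.

```python
from typing import Optional

def decode_byte(raw: bytes, encoding: str) -> Optional[str]:
    """Decode raw with the named codec, or return None if it is unassigned."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Hypothetical shortlist of encodings a sender might have used.
CANDIDATES = ("mac_roman", "cp437", "cp850", "cp1252", "latin-1")

for name in CANDIDATES:
    ch = decode_byte(b"\x81", name)
    if ch is None:
        print(f"{name:10} 0x81 is unassigned")
    else:
        print(f"{name:10} 0x81 -> U+{ord(ch):04X}")
```

The same byte comes out as a different character, or as nothing at all, depending on the table, which is why the file needs to say which one applies.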
Because that is the only character that doesn't use a numeric character reference,
and because the characters you use there are a pretty strange collection, I think
you may be trying to add some binary data, using the character range above
0x7F. If you are doing that, then you cannot use UTF-8 encoding, and you should
try ISO-8859-1 (and even then, some parsers may legitimately reject it for
reasons we needn't go into). If you are not doing this, and you are trying to
send some natural language, then you certainly need to specify the encoding
that you are using: which language is it?
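
As a sketch of what the encoding declaration changes, the following parses the same bytes with and without one. It assumes Python's xml.etree.ElementTree (which is built on expat) rather than MSXML, so the exact behaviour of other parsers may differ; the element name and the helper try_parse are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Same payload twice: once with no declaration, once declared as ISO-8859-1.
undeclared = b"<a>\x81</a>"
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\x81</a>'

def try_parse(doc: bytes) -> Optional[str]:
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

print(repr(try_parse(undeclared)))  # treated as UTF-8 by default
print(repr(try_parse(declared)))    # byte 0x81 read as ISO-8859-1
```

With the declaration, the byte is read as the ISO-8859-1 character U+0081; without it, the parser falls back to UTF-8 and rejects the document.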