- From: David Megginson <firstname.lastname@example.org>
- To: xml-dev Mailing List <email@example.com>
- Date: Sun, 21 Dec 1997 21:00:53 -0500
MURATA Makoto writes:
> >1) Are you certain that ignoring the encoding declaration is
> > conforming behaviour?
> Yes, I am certain that ignoring the encoding declaration for text/xml
> is conforming behaviour. This is to allow transcoding.
Thank you again for your posting and for your work on the MIME types.
I do not have access to any clarifications that may have been posted
in the SIG, so I necessarily rely only on the text of the PR. The
following appears in the (normative) section 4.3.3, "Character
Encoding in Entities":
It is an error for an entity including an encoding declaration to
be presented to the XML processor in an encoding other than that
named in the declaration, or for an encoding declaration to occur
other than at the beginning of an external entity.
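Read literally, that normative rule implies a simple check: find the encoding declaration at the very start of the entity and compare it with the encoding the entity is actually presented in. A minimal Java sketch follows. The class name EncodingDeclCheck is hypothetical, the "presented in" encoding is simply passed in by the caller (how a processor learns it is exactly the point under dispute), and encoding names are compared case-insensitively without reconciling aliases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingDeclCheck {
    // Matches encoding="..." or encoding='...' in an XML declaration at the
    // very start of the entity; group 2 captures the encoding name.
    private static final Pattern ENCODING_DECL = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the declared encoding name, or null if there is none. */
    public static String declaredEncoding(String entityStart) {
        Matcher m = ENCODING_DECL.matcher(entityStart);
        return m.find() ? m.group(2) : null;
    }

    /**
     * True unless the entity declares an encoding that differs from the one
     * it is actually presented in. Names are compared case-insensitively;
     * aliases such as "UTF8" vs "UTF-8" are NOT reconciled.
     */
    public static boolean consistent(String entityStart, String presentedIn) {
        String declared = declaredEncoding(entityStart);
        return declared == null || declared.equalsIgnoreCase(presentedIn);
    }

    public static void main(String[] args) {
        String start = "<?xml version=\"1.0\" encoding=\"ISO-10646-UCS-2\"?>";
        System.out.println(declaredEncoding(start));          // ISO-10646-UCS-2
        System.out.println(consistent(start, "US-ASCII"));    // false
        System.out.println(consistent("<doc/>", "US-ASCII")); // true
    }
}
```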
On the other hand, the following appears in appendix F, "Autodetection
of Character Encodings (Non-Normative)":
The second possible case occurs when the XML entity is accompanied
by encoding information, as in some file systems and some network
protocols. When multiple sources of information are available,
their relative priority and the preferred method of handling
conflict should be specified as part of the higher-level protocol
used to deliver XML. Rules for the relative priority of the
internal label and the MIME-type label in an external header, for
example, should be part of the RFC document defining the text/xml
and application/xml MIME types.
If "internal label" means the encoding declaration, then this note
supports your statement; unfortunately, the note is non-normative,
while the excerpt that I quoted first is normative, so the first must
take precedence (unless I've missed something elsewhere in the PR).
If the paragraph in the non-normative appendix expresses the WG's true
intention, then the PR will need to be revised to support it.
I think, however, that it would be unfortunate if the charset
parameter were used to override the encoding declaration. Consider,
for example, the following document, encoded in ASCII (despite the
incorrect claim in the encoding declaration):
<?xml version="1.0" encoding="ISO-10646-UCS-2"?>
<doc>This is a sample XML document.</doc>
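Appendix F also describes how a processor can guess the encoding family from the first few bytes of an entity. A minimal Java sketch of a subset of those cases shows why a processor that trusts the declaration should balk at this sample: in ASCII it begins with the bytes 3C 3F 78 6D, not the 00 3C 00 3F (or 3C 00 3F 00) that UCS-2 would produce. The class name EncodingSniffer and its return labels are hypothetical, and since Java's standard library has no charset named ISO-10646-UCS-2, UTF_16BE stands in for UCS-2 here.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSniffer {
    /**
     * Guesses the encoding family from the first bytes of an entity, in the
     * spirit of Appendix F. Only a subset of its cases is covered.
     */
    public static String family(byte[] b) {
        // Byte order mark FE FF or FF FE signals a 16-bit encoding.
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return "UCS-2";
            }
        }
        if (b.length < 4) {
            return "unknown";
        }
        // Pack the first four bytes into one int so they can be compared with
        // "<?xm" (or its first characters) as written in each candidate encoding.
        int first4 = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                   | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        switch (first4) {
            case 0x0000003C: case 0x3C000000: return "UCS-4";  // '<', big- or little-endian
            case 0x003C003F: case 0x3C003F00: return "UCS-2";  // "<?" with no byte order mark
            case 0x3C3F786D: return "ASCII-compatible";         // "<?xm" in ASCII, ISO-8859-x, UTF-8
            case 0x4C6FA794: return "EBCDIC";                   // "<?xm" in EBCDIC
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-10646-UCS-2\"?>\n"
                   + "<doc>This is a sample XML document.</doc>\n";
        System.out.println(family(doc.getBytes(StandardCharsets.US_ASCII))); // ASCII-compatible
        System.out.println(family(doc.getBytes(StandardCharsets.UTF_16BE))); // UCS-2
    }
}
```

A processor that finds "ASCII-compatible" bytes under a declaration naming UCS-2 has detected exactly the error described in section 4.3.3.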
Let's say, now, that I place this document in a directory that is
accessible through both HTTP and anonymous FTP, and also put a copy on
my local machine. Here's what will happen:
1) java EventDemo http://www.myhost.org/texts/sample.xml
==> receives charset="ISO-8859-1" as the default, ignores the
encoding declaration, produces correct output (accidentally),
and reports no error.
2) java EventDemo ftp://ftp.myhost.org/pub/texts/sample.xml
==> reads the encoding declaration, realises that the document is
_not_ in UCS-2, and reports an error (or worse, puts out
garbage without reporting an error).
3) java EventDemo sample.xml
==> same as (2).
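The three cases reduce to two different precedence rules applied to the same bytes. A minimal Java sketch, assuming (as in case 1) that a text/xml response without an explicit charset falls back to the ISO-8859-1 default for text types and that this overrides the declaration, while FTP and local files supply no external label at all. The class and method names are hypothetical, and the UTF-8 fallback for a document with no declaration assumes no byte order mark.

```java
public class EffectiveEncoding {
    // Default charset for text/* types when the HTTP header names none.
    static final String TEXT_DEFAULT_CHARSET = "ISO-8859-1";

    /**
     * text/xml over HTTP under the "transcoding" reading: the external label
     * wins and the encoding declaration ("declared") is deliberately ignored.
     */
    public static String viaHttpTextXml(String charset, String declared) {
        return charset != null ? charset : TEXT_DEFAULT_CHARSET;
    }

    /** FTP and local files carry no external label, so only the declaration counts. */
    public static String viaFtpOrFile(String declared) {
        return declared != null ? declared : "UTF-8"; // assumes no byte order mark
    }

    public static void main(String[] args) {
        String declared = "ISO-10646-UCS-2"; // what the sample document claims
        System.out.println("http: " + viaHttpTextXml(null, declared)); // http: ISO-8859-1
        System.out.println("ftp:  " + viaFtpOrFile(declared));         // ftp:  ISO-10646-UCS-2
        System.out.println("file: " + viaFtpOrFile(declared));         // file: ISO-10646-UCS-2
    }
}
```

Since the sample's bytes are plain ASCII, only the first result decodes them correctly, which is why the HTTP case succeeds by accident while the other two fail.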
It is counter-intuitive that well-formedness depends on the
protocol used to retrieve the document.
> Rick Jelliffe proposed that only application/xml should be used in the
> XML SIG. I will follow the consensus in the XML SIG or WG.
Please feel free to repost this message to the SIG, if you think that
it will be helpful there.
I strongly support Rick's suggestion for application/xml, partly
because it will avoid the need to make several last-minute
changes to the PR, and partly because it will save XML from being
trapped by some of the same constraints as HTML. If typical (private)
users cannot post XML documents in their web space in languages other
than English, then the whole effort will be at least a partial failure.
All the best,
David Megginson firstname.lastname@example.org
Microstar Software Ltd. email@example.com
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:email@example.com the following message;
To subscribe to the digests, mailto:firstname.lastname@example.org the following message;
List coordinator, Henry Rzepa (mailto:email@example.com)