- From: David Megginson <firstname.lastname@example.org>
- To: email@example.com
- Date: Mon, 22 Dec 1997 10:42:48 -0500
Gavin Nicol writes:
> > It is an error for an entity including an encoding declaration to
> > be presented to the XML processor in an encoding other than that
> > named in the declaration, or for an encoding declaration to occur
> > other than at the beginning of an external entity.
> Note that this is "an error" not a fatal, or even necessarily
> reportable error.
Absolutely correct -- Tim Bray made the same point on this list a
couple of weeks ago. The parser is not _required_ to report an error,
but it is allowed to; in either case the document is still not
> >1) java EventDemo http://www.myhost.org/texts/sample.xml
> > ==> receives charset="ISO-8859-1" as the default, ignores the
> > encoding declaration, produces correct output (accidentally),
> > and reports no error.
> It could report a mismatch.
In this case yes, because it was possible to parse the encoding
declaration. If the document had been encoded in UCS-2, it is
unlikely that the parser would even have recognised an encoding
declaration if it were trying to parse with the default
charset="ISO-8859-1" (the parser would have to have some very
sophisticated error-recovery techniques).
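
To make that concrete, here is a minimal sketch in Java. The class and method names are hypothetical and do not come from any parser discussed in this thread, and UTF-16BE stands in for UCS-2 (the two produce the same bytes for these ASCII characters). It decodes an entity's bytes as ISO-8859-1 and checks whether the text still starts with a readable XML declaration.

```java
import java.nio.charset.StandardCharsets;

public class DeclarationProbe {
    // Hypothetical helper: decode the raw bytes as ISO-8859-1 and report
    // whether a naive parser would still see an encoding declaration.
    public static boolean declarationVisibleAsLatin1(byte[] entity) {
        String text = new String(entity, StandardCharsets.ISO_8859_1);
        return text.startsWith("<?xml ") && text.contains("encoding=");
    }

    public static void main(String[] args) {
        // One-byte encoding: the declaration survives the ISO-8859-1 decode.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Two-byte encoding: every character is preceded by a zero byte,
        // so the decoded text no longer starts with "<?xml ".
        byte[] twoByte = "<?xml version=\"1.0\" encoding=\"UCS-2\"?><doc/>"
                .getBytes(StandardCharsets.UTF_16BE);

        System.out.println(declarationVisibleAsLatin1(latin1));
        System.out.println(declarationVisibleAsLatin1(twoByte));
    }
}
```

In the two-byte case the zero bytes interleaved with the markup hide the declaration from a parser that trusts the ISO-8859-1 default, so it has nothing to compare against.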
> >2) java EventDemo ftp://ftp.myhost.org/pub/texts/sample.xml
> > ==> reads the encoding declaration, realises that the document is
> > _not_ in UCS-2, and reports an error (or worse, puts out
> > garbage without reporting an error).
> >3) java EventDemo sample.xml
> > ==> same as (2).
> >It is counter-intuitive that well-formedness depends on the
> >transmission protocol.
> I would argue that all 3 could, and perhaps should produce similar
In that case, however, it will be necessary to amend the PR, so that
parsers will not have the option of reporting an error, and so that
the documents will qualify as well-formed.
> This has nothing to do with MIME types. The main reason for problems
> is that people (often unknowingly) violate the standards. HTTP is
> pretty clear that for anything other than ISO 8859-1, the content must
> be labelled correctly (i.e. it must have the correct charset).
Unfortunately the only people who have control over that labelling are
the system administrators -- if Sprynet decides to return the MIME
type text/xml for all *.xml files, then I probably will not have the
option of posting XML documents on my personal web site in anything
Furthermore, the other problem remains: if text/xml uses ISO-8859-1 as
the default, then the PR _must_ be amended to require XML processors to
support ISO-8859-1 encoding -- after all, XML is a profile of SGML
designed specifically for the Internet, and we will have a lot of
explaining to do if it cannot play nicely.
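
As a rough illustration of the labelling problem, the following Java sketch takes a Content-Type header value, falls back to ISO-8859-1 when no charset parameter is present (the default assumed throughout this thread), and compares the result with the encoding named in the document's own declaration. The class and method names are hypothetical, and the header parsing is deliberately simplistic; it is not a full MIME parameter parser.

```java
import java.util.Locale;

public class CharsetLabel {
    // Hypothetical helper: return the charset parameter of a Content-Type
    // value, or ISO-8859-1 when the server sent no charset parameter.
    public static String effectiveCharset(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return "ISO-8859-1";
    }

    // True when the transport label disagrees with the encoding declaration.
    public static boolean mismatch(String contentType, String declaredEncoding) {
        return !effectiveCharset(contentType).equalsIgnoreCase(declaredEncoding);
    }

    public static void main(String[] args) {
        // Bare text/xml with a UCS-2 declaration: label and declaration disagree.
        System.out.println(mismatch("text/xml", "UCS-2"));
        // An explicit charset parameter that matches the declaration.
        System.out.println(mismatch("text/xml; charset=UCS-2", "UCS-2"));
    }
}
```

The author of a document can only control the second argument; the first is whatever the server administrator has configured.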
> The only time application/xml really makes sense is when UCS-2 or
> UTF-16 data is being sent via email.
In theory, yes; in practice, no. Private users built HTML into
something big enough to attract the interest of the corporate and
government sectors -- using text/xml will mean that for the next
several years, at least, many private users will be unable to post
anything but ISO-8859-1-encoded documents in their personal web space
easily (and no XML parsers are required to support that encoding).
This type of consideration does not matter so much for SGML, which is
an International Standard defined independent of its media; XML,
however, is a consortium standard created for a specific medium, so it
cannot afford to ignore the more pragmatic concerns.
All the best,
David Megginson firstname.lastname@example.org
Microstar Software Ltd. email@example.com
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:email@example.com)