xml-dev - Re: FYI: Announcement of a new I-D for XML media types

From: Rick JELLIFFE <ricko@geotempo.com>
To: xml-dev@XML.ORG
Date: Wed, 10 May 2000 03:03:59 +0800

Murata Makoto wrote:
 
> We are looking forward to your feedbacks.

Very disappointing to see:

1) Charset "should" be given for application/xml.  HTTP has a character
set handling concept that comes from fantasyland. I would recommend a
very different policy: never use xml/data, always use application/xml;
never use charset, always use xml encoding declarations.
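
A rough Python sketch of the second half of that policy, for illustration only. It assumes the standard-library xml.etree.ElementTree parser and a made-up ISO-8859-1 sample document; the helper names are hypothetical. The encoding declaration travels inside the bytes, so the parser can decode them without any charset parameter from the transport, while a wrong external label cannot decode them at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a document that labels itself as ISO-8859-1.
DOC = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'


def text_from_declared_encoding(raw: bytes) -> str:
    """Parse raw bytes, letting the XML encoding declaration pick the decoder."""
    return ET.fromstring(raw).text


def decodes_as(raw: bytes, charset: str) -> bool:
    """Report whether an externally supplied charset label can decode the bytes."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # The in-document declaration is enough to recover the text.
    print(text_from_declared_encoding(DOC))
    # A mismatched external label (here utf-8) fails on the same bytes.
    print(decodes_as(DOC, "utf-8"))
```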

2) "When non-validating processors handle XML documents, they do not
      always read external parsed entities. Thus, interoperability is
      not guaranteed."
This is just FUD: why isn't this handled by the "standalone" declaration?
If it is a comment about bugs in software, that is out-of-place here.


3) Support for xml:base. Xml:base is currently being railroaded through
W3C without a requirements document--on behalf of my organization which is a
W3C member I have repeatedly asked what its justification is, and there
has never been any answer.  Xml:base is dangerous because it creates an
unlabelled dialect of XML--a general XML editor cannot treat URIs as
text when cutting and pasting, it also may have to do something with
xml:base.  It would be OK if managed as part of some more general
package, but not by itself. In any case, it is not clear whether
xml:base applies to all data marked by a schema as a URI or just to data
marked as an xlink:href. It does not apply to URIs in SYSTEM identifiers
in entities, AFAIK.

Without knowing which URIs are "known" by xml:base, and what its
interaction is with xml schemas, broad statements that embedded URIs
should be interpreted relative to xml:base are surely incorrect, or at
least too early. There is no need for F with  xml:base, but U and D are
certainly warranted.

4) The rocket scientists at IETF have managed a new thing with the spec
for utf16be (if you use utf16be you cannot have a BOM, apparently): it
means that not only can you do too little as far as labelling your data,
you can now do *too much*! If you want to use big-endian utf16 and your
software sticks in a BOM just to be safe you are ruined. I thought I had
seen everything. This makes the well-intentioned user pay: it looks like
an enabling provision, but its effect is surely to prevent the use of
big-endian UTF16.  Users should not be penalized for providing "too
much" labelling.
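
To make the failure concrete, here is a small Python sketch using the standard codecs (the sample text and the helper name are made up for illustration). The same bytes, big-endian UTF-16 with a BOM in front, decode cleanly under the generic "utf-16" label, but under the "utf-16-be" label the BOM is not treated as a signature and turns up as a stray U+FEFF character ahead of the markup.

```python
import codecs

# Hypothetical sample document text.
TEXT = "<doc/>"


def with_bom_big_endian(text: str) -> bytes:
    """Serialize as big-endian UTF-16 with an explicit BOM prepended."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


if __name__ == "__main__":
    data = with_bom_big_endian(TEXT)
    # Labelled "utf-16": the decoder consumes the BOM as a byte-order signature.
    print(repr(data.decode("utf-16")))
    # Labelled "utf-16-be": no BOM is expected, so U+FEFF is kept as a
    # character in front of the markup.
    print(repr(data.decode("utf-16-be")))
```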

5) Along similar lines, but far worse and of major importance for
internationalization, the fragment identifier of a URI has to be in
US-ASCII with %HH escaping.  Here I am in Taipei and I want to include
an XPointer to refer to an ID or element name or attribute name or
value, and I have to first find the numeric values of my Big5, then
transcode it into Unicode, then find out what the Unicode values are in
HEX, then put them in. Is that the way it is supposed to work?   This is
exactly the sort of thing that should be provided by the XML
infrastructure, not by the poor user: to tell the user "you can say
'yes' in your native language in that attribute value, but you cannot
type it directly when you want to reference that element" is not
acceptable.  And what about XSLT: does it mean that when I use an XPath
which includes a document() reference, that I have to suddenly stop when
I get to the fragment identifier of the URI and switch to %HH?
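
The manual steps described in point 5 can be sketched in Python roughly as follows. This is only an illustration: it assumes the %HH escapes are taken over the UTF-8 bytes of the Unicode text (the draft's exact rule is not quoted here), the ID value is a made-up one-character example, and the function names are hypothetical. The point is that this is a mechanical conversion a library can do, not something a user should do by hand.

```python
from urllib.parse import quote, unquote


def escape_fragment(big5_bytes: bytes) -> str:
    """Transcode Big5 bytes to Unicode, then %HH-escape the UTF-8 form."""
    text = big5_bytes.decode("big5")   # Big5 -> Unicode
    return quote(text, safe="")        # Unicode -> UTF-8 bytes -> %HH


def unescape_fragment(fragment: str) -> str:
    """Reverse the escaping to get back the characters the user typed."""
    return unquote(fragment)


if __name__ == "__main__":
    # Hypothetical ID typed in a Big5 editor: the single character U+662F.
    typed = "\u662f".encode("big5")
    escaped = escape_fragment(typed)
    print(escaped)
    print(unescape_fragment(escaped) == "\u662f")
```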

This draft has lost the plot. XML is first and foremost a markup
language: that is its name, that is its purpose, that is what we want.
Someone should be able to open their local text editor and create a
legitimate document using all the characters available in that editor,
without ever having to perform any character-to-number conversions or
looking up any character tables. This is a basic operational simplicity
which gives XML 99% of its value.

If HTTP requires data in a different format, the XML infrastructure
should provide that transparently. If IETF or any RFC purports to make
any requirement on how I can mark up legitimate characters, then the
response we should give is terse and monosyllabic: it is not the
business of an RFC to mandate any particular encodings within an XML
document. It is ultra vires. (Note, I am *not* saying that an RFC cannot
proscribe certain characters. I am saying that an RFC oversteps itself
if it tries to tell me that I must use %HH rather than &#HHHH; or a
direct character inside my XML document.)
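
A short Python sketch of the difference between the three spellings, assuming the standard-library xml.etree.ElementTree parser; the element name, attribute name, and sample character are invented for illustration. The XML processor already treats a direct character and a numeric character reference as the same attribute value, whereas a %HH sequence is just passed through as literal text.

```python
import xml.etree.ElementTree as ET

# Three hypothetical spellings of the same reference inside a document.
DIRECT = '<link ref="#\u662f"/>'       # the character typed directly
NCR = '<link ref="#&#x662F;"/>'        # numeric character reference
ESCAPED = '<link ref="#%E6%98%AF"/>'   # %HH escaping of the UTF-8 bytes


def ref_value(xml_text: str) -> str:
    """Return the ref attribute as the XML processor reports it."""
    return ET.fromstring(xml_text).get("ref")


if __name__ == "__main__":
    # The parser resolves the NCR, so these two values are identical.
    print(ref_value(DIRECT) == ref_value(NCR))
    # The %HH form is not XML syntax; the parser hands it over untouched.
    print(ref_value(ESCAPED))
```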

If a future technology like xml:base is included in discussion, why is
there no discussion of international domain names?  XML should not
constrain domain names to any particular characters (the RFCs currently keep
the door open): whatever mechanism is eventually used to allow
internationalised domain names, that should be handled transparently by
the XML processor and the user should be able to see and type the direct
characters (or have NCRs).
So mandating %HH has the problem that we will have to revisit this RFC
as soon as international DNS comes online (which will probably be sooner
rather than later: there is a staggering demand here in Asia for it). 
It would be best if XML kept out of the issue entirely: in particular,
if it is decided that CNRP should be used to convert IDNS names into
ASCII domain names, then that is definitely something that would make
the approach of %HH in the domain name part unneeded: why should we have
one rule in the domain name and another rule for other places?


Good to see:

1) |xml suffix is a great idea

2) MIME types for DTDs and external parsed entities


I regret to say, I think these flaws are so great that the draft should
be withdrawn and rethought at once. Especially point 5 is a disaster. In
particular, whenever there is some conversion between IETF syntax
requirements and simple plain text editing, this should be hidden from
the user and taken care of by the XML processor.

The current draft is a step backwards for internationalization of the
WWW in practice.  Or, at least, it makes life simpler for ASCII users
but much more difficult for us non-ASCII users.  And I think it makes
life much more complicated for implementers: it means that user
interfaces will have to have data conversion routines built in, rather
than just leaving it to the URI referencing library routines.


Rick Jelliffe

