- To: "Tim Bray" <email@example.com>,<firstname.lastname@example.org>
- Subject: RE: [xml-dev] Some comments on the 1.1 draft
- From: "Michael Rys" <email@example.com>
- Date: Tue, 18 Dec 2001 16:54:39 -0800
- Thread-index: AcGFBHkCnCNeI/NXR4KD7wEnl4RjmgDAkxlg
- Thread-topic: [xml-dev] Some comments on the 1.1 draft
Tim, with all due respect, allowing #x0-#x1F inside element and
attribute content would tremendously help users of XML who take data
from non-XML string sources and map it into XML without losing
fidelity and without having to base64-encode otherwise normal strings.
Most of these applications do not care about the semantics of ETX or
EOM; they care only that the characters are preserved across the XML
serialization. Example applications include SOAP and XML database
serializations.
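
To make the round-trip problem concrete, here is a minimal Python sketch
(not part of the original thread). It assumes Python's standard
xml.etree.ElementTree module, whose expat-based parser enforces the
XML 1.0 character rules. The element name "value", the
encoding="base64" attribute, and the helper names are hypothetical and
chosen only for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def roundtrips(value: str) -> bool:
    """Return True if `value` survives serialize-then-parse as element content."""
    elem = ET.Element("value")
    elem.text = value
    # ElementTree does not validate characters when serializing,
    # so a control character is written out as-is.
    data = ET.tostring(elem, encoding="unicode")
    try:
        return ET.fromstring(data).text == value
    except ET.ParseError:
        # An XML 1.0 parser rejects control characters such as #x3.
        return False


def encode_b64(value: str) -> ET.Element:
    """Workaround: carry the string as base64 text (hypothetical convention)."""
    elem = ET.Element("value", {"encoding": "base64"})
    elem.text = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return elem


def decode_b64(elem: ET.Element) -> str:
    """Reverse of encode_b64."""
    return base64.b64decode(elem.text).decode("utf-8")


if __name__ == "__main__":
    print(roundtrips("plain text"))   # ordinary string
    print(roundtrips("a\x03b"))       # string containing ETX (#x3)
    wire = ET.tostring(encode_b64("a\x03b"))
    print(decode_b64(ET.fromstring(wire)) == "a\x03b")
```

The base64 detour preserves the data, but the string is no longer
readable in the serialized document, which is the cost being objected
to above.
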
> -----Original Message-----
> From: Tim Bray [mailto:firstname.lastname@example.org]
> Sent: Friday, December 14, 2001 17:03 PM
> To: email@example.com
> Subject: [xml-dev] Some comments on the 1.1 draft
> I sent this to the public blueberry-coments address, but
> thought some of them might usefully be discussed here. If
> someone wants to start an argument about one or more of
> these, please pull it out and give it a separate subject.
> 1. The principle of decoupling the XML spec from successive
> revisions of Unicode is the only sensible way forward.
> 2. If no consensus can be built around the details of this
> set of changes, it would be acceptable to declare defeat and
> go on with XML 1.0 2nd ed as-is. This would be a regrettable
> outcome but not fatal at a deep level.
> 3. Issue 18: The costs of allowing #x1-#x1F appear to me to
> exceed the benefits. Among other things, many of these
> ASCII control chars, despite being several decades old, have
> little consensus concerning their semantics, e.g. EOT and EOM
> (#x3 and #x4). I think from the XML point of view these things
> are actively pernicious; specifically the notion that semantics
> are embedded in characters rather than being expressed by markup.
> The case of "textual content that may contain such characters
> (but typically does not)" is pretty non-convincing. In *many*
> cases the occurrence of these characters is evidence of an error.
> 4. Issue 21: The cost of allowing null bytes in XML content is
> very high and the benefits hard to understand.
> 5. I strongly feel that #x85 (NEXT LINE) should not be added to
> the S production. The reason is a simple cost-benefit analysis;
> the proportion of computing installations where this is an issue
> is not large and is shrinking as a proportion of the
> infrastructure. Supporting this change imposes significant
> conversion costs on the rest of the world; the total global
> net cost would be significantly less if the mainframe software
> infrastructure took the necessary corrective measures to deal
> with XML 1.0 as specified.
> 6. I strongly feel, even more so than in the case of #x85,
> that #x2028 is inappropriate for inclusion in S. Here are
> some reasons:
> - If LINE SEPARATOR is to be included, why not the many
> other Unicode characters with spacing semantics? A
> coherent explanation needs to be provided on this
> point and I am unconvinced that one exists.
> - This would be the only core XML syntax character that
> can't fit in a byte. This would complicate several
> automaton-driven parser construction strategies. One
> of the key design goals of XML is to make programmers'
> lives simpler, so this objection should have weight.
> - "For completeness" is a really flimsy argument.
> 7. In , #x37a is included, which is a combining
> character and shouldn't be in NameStart
> 8. In , #xf7 is included (division sign), but the
> rest of the mathematical operators (starting at
> #x2200) are excluded.
> 9. The inclusion of a block #x202A-#x218F is kind
> of puzzling... it starts in the middle of one of the
> punctuation blocks, and the first few chars seem
> really unsuitable. What's the intent... wanting to
> include the currency symbols? This definitely
> needs some explanation.
> 10. There are some problems in the #x2800-#xD7FF block.
> Do we really want CJK radicals (#x2e80...), compatibility
> Jamo, ideographic description chars, and so on?
> 11. Should that block end at #xD7AF or #xD7FF?
> 12. [#xFDE0-#xFFEF] includes the private use area and lots
> of compatibility characters which XML 1.0 actually
> deprecates for use at all, let alone as names. This
> is astounding and needs some defense. If this is OK,
> why not throw in all the punctuation?
> 13. What's wrong with ASCII digits as name start chars, given
> that all sorts of other digits are going in?
> 14. There really needs to be some deep discussion in this
> document of why this alternative was chosen. When I
> look at some of the wildly unlikely things that are
> allowed to appear in names, the obvious question is:
> Why not rely on the Unicode properties database? In
> particular, this allows lots of Name characters that
> are not in fact Unicode characters at all and probably
> never will be.
> 15. Issue 11:
> I can see both sides of this question. My intuition is
> that the computational cost of doing this is unacceptably
> high for high-throughput applications of XML, but we need
> some research to establish if this is the case. If it can
> be done cheaply and compactly, it's probably a good idea.
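
As a rough illustration of points 5 through 8 and 14 (not part of the
original message), the following Python sketch looks up some of the
characters under discussion in the Unicode character database via the
standard unicodedata module. The helper name describe is hypothetical.
Note that the results reflect the Unicode version bundled with the
running Python, not the version current when the 1.1 draft was written,
and that ISO-8859-1 is used here only as a stand-in for "fits in a
byte".

```python
import unicodedata


def describe(code_point: int) -> dict:
    """Summarize a character's Unicode properties and encoded width."""
    ch = chr(code_point)
    try:
        ch.encode("latin-1")
        fits_in_latin1_byte = True
    except UnicodeEncodeError:
        fits_in_latin1_byte = False
    return {
        "code_point": f"#x{code_point:X}",
        # Control characters such as #x85 have no formal name.
        "name": unicodedata.name(ch, "(unnamed)"),
        "category": unicodedata.category(ch),
        "utf8_bytes": len(ch.encode("utf-8")),
        "fits_in_latin1_byte": fits_in_latin1_byte,
    }


# NEXT LINE, LINE SEPARATOR, #x37A, DIVISION SIGN, first math operator
for cp in (0x85, 0x2028, 0x37A, 0xF7, 0x2200):
    print(describe(cp))
```

A check of this kind is one way a Name production could be derived
from general categories rather than from hand-picked code point ranges,
which is the alternative point 14 asks about.
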