> -----Original Message-----
> From: Elliotte Harold [mailto:firstname.lastname@example.org]
> Sent: Wednesday, October 27, 2004 4:12 AM
> To: email@example.com
> Subject: Re: [xml-dev] Partyin' like it's 1999
> Bullard, Claude L (Len) wrote:
> > Most commenters I've read believe an overhaul
> > is needed. You should read Derek Denny-Brown's
> > blog. Dare referenced it and sys-con picked it
> > up for an article.
> > http://nothing-more.blogspot.com/2004/10/where-xml-goes-astray.html
> This article is absolute crap, and a typical example of Microsoft ...
> It blames XML for the very problems Microsoft created and which don't
> exist in other tools and on other platforms.
Every issue I brought up is independent of the MS platform. Microsoft
may have helped push XML into places it wasn't ready to go, but I think
it would have ended up there anyway.
> It's really scary to see that someone who's apparently been in charge of
> Microsoft's XML efforts for seven years still doesn't get the difference
> between children and child elements, and hasn't noticed there's a ...
> that DOM has getChildNodes() and getElementsByTagName().
I may not have been clear, but the point of my writeup was to articulate
the problems that my customers experience. I like to believe I know how
to write a decent XML app, but I often see programmers new to XML using
getChildNodes() when I wish they were using getElementsByTagName() or
selectSingleNode(). My arguments are not about what the XML experts
do, but rather about what problems the non-experts encounter.
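To make the getChildNodes() versus getElementsByTagName() pitfall concrete, here is a minimal sketch using Python's xml.dom.minidom, chosen only for illustration (it is not MSXML). The sample document and the helper element_children are hypothetical. Because the whitespace between tags is kept as text nodes, a loop over childNodes sees five nodes where a newcomer expects two items.

```python
from xml.dom import minidom

# Hypothetical sample document, indented the way people usually write XML by hand.
XML = """<order>
  <item>apple</item>
  <item>pear</item>
</order>"""

def element_children(node):
    """Return only the element children of node, skipping text, comment and PI nodes."""
    return [n for n in node.childNodes if n.nodeType == n.ELEMENT_NODE]

doc = minidom.parseString(XML)
root = doc.documentElement

# childNodes includes the whitespace-only text nodes between the <item> tags.
all_children = root.childNodes
# getElementsByTagName searches all descendants, not just direct children.
items = root.getElementsByTagName("item")

print(len(all_children))            # 5: three whitespace text nodes + two elements
print(len(element_children(root)))  # 2
print(len(items))                   # 2
```

Note that getElementsByTagName() matches descendants at any depth, so it is not an exact substitute for filtering childNodes by node type when the same element name can nest.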
> If Microsoft-platform developers are confused about the significance of
> white space, it's because Microsoft deliberately deviated from the
> specifications when writing their implementations, not because there's
> anything fundamentally confusing about white space.
Or possibly because we found that lots of our customers were confused?
MSXML's DOM has the option to preserve whitespace, but we found that
most of our users preferred that option off by default.
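A load with whitespace preservation turned off can be roughly approximated after parsing by dropping whitespace-only text nodes. The sketch below again assumes Python's minidom; strip_whitespace_nodes is a hypothetical helper, not MSXML's actual algorithm. It treats only the four XML whitespace characters as whitespace and honors xml:space="preserve" on the elements it visits.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the only whitespace characters in the XML 1.0 S production

def strip_whitespace_nodes(node, preserve=False):
    """Hypothetical helper: remove whitespace-only text nodes under node,
    except inside elements marked xml:space="preserve"."""
    if node.nodeType == node.ELEMENT_NODE:
        space = node.getAttribute("xml:space")  # "" when the attribute is absent
        if space == "preserve":
            preserve = True
        elif space == "default":
            preserve = False
    for child in list(node.childNodes):  # copy: we mutate childNodes while looping
        if child.nodeType == child.TEXT_NODE:
            if not preserve and child.data.strip(XML_WS) == "":
                node.removeChild(child)
        else:
            strip_whitespace_nodes(child, preserve)
    return node

doc = minidom.parseString(
    '<a>\n  <b>x</b>\n  <pre xml:space="preserve"> <c/> </pre>\n</a>')
root = strip_whitespace_nodes(doc.documentElement)
print(root.toxml())
```

Stripping after the fact is wasteful compared with skipping the nodes during the parse, but it keeps the sketch short.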
> As to the issue of allowed characters, all I can say is Microsoft
> developers must not be as smart as I thought. Xerces checks this in
> O(log N). XOM manages it in O(1).
MSXML and System.Xml are both O(1), but the issue is the constant
factor. We could probably parse 5% faster if the rules were simpler. (I
haven't done a perf investigation into this in years, but that number is
lower than what I found when I last looked at it.) Keeping code/data in the
L1/L2 cache is becoming increasingly critical for performance, and
aspects of XML such as this that make that difficult are part of why I
regularly have to struggle with perf-conscious developers who want to
write their own parsers, which skip these checks.
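For reference, the XML 1.0 Char production is only a handful of ranges, so the per-character test is O(1). The sketch below uses hypothetical names and shows both a direct range comparison and a table-driven variant with one bit per BMP code point (8 KB of table that has to compete for cache). It illustrates the shape of the check, not how MSXML or System.Xml implement it, and a Python loop says nothing about the constant factor in a native parser.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Table-driven variant: one bit per BMP code point (65536 bits = 8 KB).
_BMP_TABLE = bytearray(0x10000 // 8)
for _cp in range(0x10000):
    if is_xml10_char(_cp):
        _BMP_TABLE[_cp >> 3] |= 1 << (_cp & 7)

def is_xml10_char_table(cp: int) -> bool:
    if cp < 0x10000:
        return bool(_BMP_TABLE[cp >> 3] & (1 << (cp & 7)))
    # Every supplementary code point up to U+10FFFF is allowed.
    return cp <= 0x10FFFF

def first_illegal_char(text: str) -> int:
    """Return the index of the first character XML 1.0 forbids, or -1."""
    for i, ch in enumerate(text):
        if not is_xml10_char_table(ord(ch)):
            return i
    return -1

print(first_illegal_char("ab\x01c"))  # 2: U+0001 is not an XML 1.0 Char
```

The range version costs a few comparisons per character; the table version costs one load, which is only a win if the table stays cache-resident.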
> The one thing that's mildly interesting in his article is the claim that
> the Unicode 2.0 character repertoire has caused real problems for some
> Asian Microsoft customers. I've heard this claim before, but I've never
> been able to verify it. If he's got real data, I'd like to see it. What
> missing characters were needed? And were they really needed in markup
> and not text? (This is a very common source of confusion.)
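The markup-versus-text distinction can be illustrated with the XML 1.1 NameStartChar and NameChar productions, which define names by broad code point ranges instead of XML 1.0's Unicode 2.0-based tables. This is a sketch with hypothetical function names; the range lookup is a binary search, O(log N) in the number of ranges. For example, U+00D7 (×) is a legal character in text content, yet it cannot appear anywhere in an XML 1.1 name.

```python
import bisect

# XML 1.1 NameStartChar ranges (inclusive), sorted by start code point.
_NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
_STARTS = [lo for lo, _ in _NAME_START_RANGES]

def is_name_start_char_11(cp: int) -> bool:
    # Find the last range whose start is <= cp, then check its end.
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= _NAME_START_RANGES[i][1]

def is_name_char_11(cp: int) -> bool:
    # NameChar = NameStartChar | "-" | "." | [0-9] | #xB7
    #          | [#x0300-#x036F] | [#x203F-#x2040]
    return (
        is_name_start_char_11(cp)
        or cp in (0x2D, 0x2E, 0xB7)
        or 0x30 <= cp <= 0x39
        or 0x300 <= cp <= 0x36F
        or 0x203F <= cp <= 0x2040
    )

def is_name_11(s: str) -> bool:
    """True if s matches the XML 1.1 Name production."""
    return (bool(s) and is_name_start_char_11(ord(s[0]))
            and all(is_name_char_11(ord(c)) for c in s[1:]))

print(is_name_11("中文"))       # True: CJK ideographs fall in [#x3001-#xD7FF]
print(is_name_11("a\u00d7b"))  # False: × is fine in text but not in a name
```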
I can say that I have had customers encounter this, and that it was
enough of an issue that it made it into XML 1.1. Unfortunately, a few
quick Google searches are not turning up any good examples of published