On Jun 3, 2013, at 9:54 PM, "Uche Ogbuji" <uche@ogbuji.net> wrote:
On Mon, Jun 3, 2013 at 6:22 PM, David Lee <dlee@calldei.com> wrote:
>>
answer the above you are pretty much guaranteed to pick an inappropriate tool.
>
Well, most of the above doesn't seem relevant to answer in the general case, so that seems a very strange conclusion to me.
-------
Considering you came to the same conclusion as I did with entirely different assumptions (i.e. that uneducated users tend to pick the most complicated, or as
I say, inappropriate, tool) ... I find it strange that you find it strange.
Interestingly, I find it amusing at the least that I have never heard of, or am very ignorant about the top tools you suggest !!!
I find it amusing that you feel so sure you should have heard of them.
And then, in the same sentence as recommending things like Python and LINQ, you refer to quirks and issues around Unicode ...
You seem a bit confused in reading what I wrote. I mentioned quirks specifically in ElementTree, which is one of many Python libraries. To be specific, ET decides whether to encode a given node as UTF-8 or UTF-16 as an optimization (preserving
the encoding info). I think that's a lousy design choice, but in XML terms, it's not incorrect, nor does it affect most developers for most use cases.
I find it telling that you didn't also mention ET's quirks re: mixed content, but I'll mention that too. ET takes a linked-list-ish approach to presenting child content: the text before an element's first child hangs off the element itself, and the text after each child hangs off that child. I find that awkward for dealing with mixed content, but again it's not
incorrect, and it's still a hell of a lot easier than processing mixed content in JAXP or any other Java library I've seen.
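
To illustrate the mixed-content point, here is a minimal sketch using the standard library's xml.etree.ElementTree. The sample document and the helper name iter_mixed are made up for illustration; the only ET features relied on are the .text and .tail attributes and iteration over child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a paragraph whose text is interleaved with child elements.
doc = "<p>Hello <b>bold</b> and <i>italic</i> world</p>"
root = ET.fromstring(doc)


def iter_mixed(elem):
    """Yield the text chunks and child elements of elem in document order.

    Illustrative helper, not part of ElementTree itself.
    """
    # Text before the first child lives on the parent as .text
    if elem.text:
        yield elem.text
    for child in elem:
        yield child
        # Text following a child lives on that child as .tail
        if child.tail:
            yield child.tail


# Render elements as "<tag>" so the interleaving is visible.
parts = [p if isinstance(p, str) else "<%s>" % p.tag for p in iter_mixed(root)]
print(parts)
```

Note that " and " is reachable only as the tail of the b element, not as a child of p, which is what makes walking mixed content feel like following a linked list.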
That scares me !
Then you should be scared of using most languages, including Java, which have Unicode quirks built in at an even deeper level.
Python 3.3 probably has the best Unicode implementation of any language, at this point.
So sure, there is plenty to scare you. See my earlier point about not ascribing to XML considerations that are far more fundamental.
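
As a rough illustration of the kind of language-level quirk at issue, here is a small sketch assuming Python 3.3 or later, where a str is indexed by code point. The sample character is arbitrary. The UTF-16 count is what a language whose strings are sequences of UTF-16 code units (Java's String.length(), for example) would report for the same text.

```python
# One character outside the Basic Multilingual Plane (arbitrary sample).
s = "\U0001F600"

# Python 3.3+ counts code points.
code_points = len(s)

# Number of 16-bit code units when the same text is held as UTF-16.
utf16_units = len(s.encode("utf-16-le")) // 2

# Number of bytes when the same text is serialized as UTF-8.
utf8_bytes = len(s.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)
```

The same single character has three different "lengths" depending on which layer you ask, and none of that has anything to do with XML.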
Most junior to mid-level programmers I know don't understand what Unicode is enough to know if something is a quirk or not ...
Heck, most high-end programmers don't, but again, bytes (or rather characters) come long before XML. Heck, go look at how many Web pages out there have Unicode quirks (even if you do set aside alternate encoding systems, it's a frighteningly high number).
But I do agree this is not a problem unique to XML ... but it is exacerbated by the relative obscurity of XML compared to the Top Ten things
students are taught at school ... whatever those are nowadays, I don't think XML is among them, and if it is, it's taught badly.
Which is *precisely* my point.
... and in any case I entirely disagree that one can make a meaningful choice about XML tooling without a lot of very specialist knowledge about XML.
I submit for your pondering ... that you know way too much about XML, to the point of not realizing how daunting it appears to those who don't.
That's possible, but I return your submission with an annotation that I've done more than my fair share of training developers on XML. I know what we're up against.