OASIS Mailing List Archives

   Re: [xml-dev] Note from the Troll

Thanks for clarifying.  I hope you don't mind if I respond to some of
the points.  I'm in agreement with many of them, but not with the
overall conclusion that XML has no value.

On Sun, 2002-10-27 at 06:49, tblanchard@mac.com wrote:
> Lately, I'm working for a company that is exchanging HR information 
> with job boards (like monster and hot jobs) - which has its own working 
> groups trying to define HRXML.

That's an interesting problem space, especially for someone strongly
convinced of the value of the relational model.  In the past ten to
fifteen years, in my experience, many of the largest firms moved
personnel information out of databases and into LDAP.  LDAP is all sorts
of things, but it isn't very relational.  It's very easy to model as a
hierarchical database.  Or as XML.

On the other hand, LDAP has some significant limitations.  It isn't very
relational, for one.  *laugh*  The problem, in the HR space, is that
information fits neatly into hierarchies, except when it doesn't.  And
relations do a nice job, except for the rigor of column definitions.  So
that area, in particular, is a hard problem, one that probably requires
a synergy of technologies.  For all the hype surrounding the XMLization
of the current leading relational database products, we aren't there
yet.

> 1) XML Tools suck - they're little more than syntax coloring editors.

Hmm.  My favorites are, but I work for a company that has produced
strongly graphical editors.  Available on a Mac, no less (the graphics
used to describe various schemata have a tendency to appear in a variety
of books; we pass them around at work, when found).

On the whole, I tend to agree that tools aren't up to par as yet.  But
... different people need to do different things.  Cue rant from ERH on
the uselessness of tree view as an editing model.

> 2) The Hype is at the same level as the hype was for AI and it can't 
> possibly live up to it.  It should be written <genuflect>XML</genuflect>

All too true.  Are the hypesters identical to the developers?  I think
that that was true for the AI model.  I'm doubtful that it is the case
for XML.  The most outrageous claims seem to be made by PHBs.

> 3) The weight of the processing model is really really heavy.  As an 
> example, using URLs to reference DTD's causes all sorts of problems for 
> computers when they're off the network.  XML  parsing simply halts.  
> This is especially annoying when running something like BEA WebLogic on 
> your machine because you're doing a web app.  BEA stores config info in 
> XML which references some DTD at BEA and the server simply won't start 
> if that server isn't available.  You can argue this is a misuse of XML 
> - I think so - but its one of those things thats going to hurt people's 
> impressions of XML.

Hmm.  I think that much more can be said here, and that some of it can
tie in with points 5 and 6 below.  It no longer surprises me that W3C
recs tend to show the adverse effects of prolonged URI abuse.  One of
the canons is that URIs are the perfect addressing mechanism.  No, wait,
the perfect identification mechanism.  No, wait.  Oh, and sometimes URIs
are not URIs.  Massive confusion is created over whether the use of
URIs, in a particular context, is for identification, for comparison,
for location, or for decoupling.

In fact, I would come to the same conclusion here, but argue that the
problem is an *inadequate* processing model, not one that's too
heavyweight.  It isn't, on the whole, clear when you should retrieve a
URI, and when you ought to compare it to something else by the rules
governing URIs, and when you ought to compare it as a string, ignoring
the fact that it has the form of a URI.
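
To make the distinction concrete, here is a minimal Python sketch (not
taken from any spec; the function names and the example URIs are made up
for illustration).  It contrasts comparing two identifiers as plain
strings, which is how XML namespace names are compared, with comparing
them after a deliberately simplified URI normalization that lowercases
only the scheme and host.  Real URI equivalence rules go further than
this.

```python
from urllib.parse import urlsplit


def same_as_string(a: str, b: str) -> bool:
    """Character-for-character comparison; the URI form is ignored."""
    return a == b


def same_as_uri(a: str, b: str) -> bool:
    """Simplified URI comparison: scheme and host are case-insensitive.

    Assumption: other normalizations (default ports, percent-encoding,
    dot segments) are ignored here.
    """
    def key(uri: str):
        parts = urlsplit(uri)
        return (parts.scheme.lower(), parts.netloc.lower(),
                parts.path, parts.query, parts.fragment)
    return key(a) == key(b)


# Hypothetical identifiers differing only in the case of scheme and host.
first = "http://www.example.org/ns/hr"
second = "HTTP://WWW.EXAMPLE.ORG/ns/hr"

print(same_as_string(first, second))  # False
print(same_as_uri(first, second))     # True
```

Neither function retrieves anything.  Dereferencing the URI as a
location is the third option, and which of the three a given spec
intends is exactly what tends to be left unclear.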

Add linking (the XML version of relations, if you will), and life
becomes most unpleasant.  See the discussion between ERH, SSL, and UU on
order of processing of XInclude during XPath processing.  Eric van der
Vlist has proposed a model for specifying processing order, to address
the issue.

> 4) Schema is really an insane spec.  I mentioned just the data types - 
> too many too complicated.  Do we have to specify the number of bytes 
> for the ints?  Thats a physical issue and for this Smalltalker it 
> doesn't even make sense (Smalltalk handles arbitrary sized numbers).

Umm.  See [me] for rants on schema type cluelessness.  Argh.  In fact, I
don't much care about the existence of the various register-based
integers.  They're among the twenty-five derived base types.  The
derivations even make a certain amount of sense.  On the other hand,
there are *nineteen* primitive types.  That's string, boolean, number
... uh, another number.  And oh, yeah, another number.  Only different. 
And, umm, date, time, dateTime, and duration.  Oh, and another five or
so things that have to do with time or duration.  Not related to one
another.  Oh, and remember we were talking about how URIs are sometimes
just strings?  Well, in schema, they *aren't* strings.  No, I said I
wouldn't go on this rant.  Sorry.  This is a deeply flawed area, one that
needs very serious, very careful work.  What's worse is that the spec
was so delayed, and so anticipated, that most folks really, really want
to overlook its enormous hairy dangling warts and just get on with the
job.

> 5) Like C++, the average developer can't cope with the excessive 
> complexity XML introduces relative to its value.  The average 
> programmer doesn't really know the difference between nonnegative int 
> and positive int.  In fact, the schemas I'm getting from biz partners 
> (the couple that want to use XML because everyone is using it - and its 
> less than half) are AWFUL.

Ugh.  There is a solution to this problem; it's called RELAX NG.  But it
isn't a solution to the problem of primitive types, which hasn't been
addressed, as yet.

> XSLT files are maintainable with the same level of ease as densely 
> written perl.  Developers asked to modify them routinely rewrite them 
> because they can't figure out what the last guy was doing.

Hmm.  I haven't encountered that one.  I think some of the more
web-oriented here have reported similar things, though (my work doesn't
bring me into contact with that segment).  So the transform sets that
I've seen in use tend to be much more maintainable.

Come to think of it, though, it wouldn't surprise me to find that
quickie XSLT transforms would share the ease of confusion of quickie
perl hacks.  Similar problem spaces; perl addresses text, XSLT XML.

> loved it because the thing itself is the damnedest puzzle.  It 
> entertained and challenged their intellect to work with it.  I'm 
> beginning to suspect the same about XML.

*laugh*  I, for one, *hate* having the nasty little corners crop up.  I
end up having to explain (for instance) that attributes are *not* in the
default namespace when unadorned, even though similarly unadorned
elements are.  I'm not looking forward to having to explain why 0x0D is
in the whitespace production, even though it can't appear in an XML
stream ... I'll have to refer folks to the post describing that
particular Stupid Entity Trick, because I can't remember it.  And when
they ask why any spec would support something so extremely obscure, I'll
just have to shrug.  Smiling and wincing.
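
The attribute corner is easy to demonstrate.  A small sketch, assuming
Python's standard xml.etree.ElementTree module and a made-up document
(the namespace URI and the element and attribute names are hypothetical):
the unprefixed elements pick up the default namespace, while the
unprefixed attributes stay in no namespace at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace plus unprefixed attributes.
doc = '<order xmlns="urn:example:shop" id="42"><item sku="A1"/></order>'
root = ET.fromstring(doc)
item = root[0]

# ElementTree reports namespaced names as "{namespace-uri}local-name".
print(root.tag)     # {urn:example:shop}order
print(item.tag)     # {urn:example:shop}item

# The attribute keys carry no namespace part.
print(root.attrib)  # {'id': '42'}
print(item.attrib)  # {'sku': 'A1'}
```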

> 6) From the stand point of business process and enterprise architecture 
> - XML is an evolutionary step backwards.  Hierarchical databases were 
> abandoned for relational models long ago and systems made out of lots 

Err.  There's been one objection to this already.  I can add that there
are a number of spaces where hierarchical organization is taking share
from relational models (LDAP is one).  For long term storage of
information, XML actually does make sense, because it's easy to
untangle, in comparison to a number of alternatives.  Note that it is
not necessarily better than plists, or S-expressions, except that it's
in wider use, and that means that the knowledge of how to tease
information out has a better chance of remaining over the long term.

This is important for governments, for instance.  Last I heard, the 1960
US census is stored in a format that can be read by only two working
machines in the world (one of which is on display at the Smithsonian). 
Having trained as an historian in the long ago, I can say that that is
an unmitigated disaster.  Cost of transformation, if undertaken, will be
enormous.  If not undertaken, loss of information will be enormous.

Folks (including, from an apocryphal source, the IRS) store proprietary
formats in databases as BLOBs.  Better if those were XML.

XML's looser restrictions provide an anodyne to the limitations of
databases, and one that ought to, at some point, be taken up as a
synthesis.  Common databases do far better with primitive types, but
have far less ability to handle semi-structured information.  This, in
fact, is probably the HR problem.  There's personnel information that
ought to be loosely structured, the sorts of things that are well
described by DTD and RELAX NG without types.  There's other information
that has strong typing, and there's a real need to get at things fast,
to be able to index and look up in a variety of ways, and to avoid
redundancy of information.  Somewhere in a synthesis of hierarchy and
relation may lie the answer.  It isn't currently available, though, to
the best of my knowledge.

> I need to write the follow on to this piece but it will focus on point 
> 6 above plus the assertion that XML fails as both a markup language 
> (markup shouldn't require well-formedness) and as a serialization 

*ABSOLUTELY* disagreed.

Well-formedness is the thing that makes XML worthwhile.  SGML, intended
for writing in text editors, has all sorts of cute little tricks called
"markup minimization".  You can end a tag with </>.  If SGML had
primitive types for elements, you'd see minimization like <price/2.99/. 
And other Stupid Tag Minimization Tricks.  These are great for typing,
until you reach the point in the document that looks like this:
</></></></></>
Hope you don't have to extend one of those sections.  Whatever they
are.  Still worse if you have full minimization, where the end tag is
merely implied.  Does this element imply the end of the last three open
tags?  Or just of the last two?

Well-formedness removes ambiguity.  It does so at the expense of
terseness.  It's a good choice for a markup language.  It's a poor
choice for a data exchange format, where all the data is in very tiny
chunks, and the ambiguity does not arise.
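
As a rough illustration of what the rule buys, here is a sketch using
Python's standard xml.etree.ElementTree module (the helper name
is_well_formed and the sample documents are made up for this example).
An XML parser has to reject both the SGML-style empty end tag and the
omitted end tag, rather than guess which elements they were meant to
close.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every start tag has an explicit, matching end tag.
explicit = "<doc><sec><p>text</p></sec></doc>"

# SGML-style empty end tags: legal minimization in SGML, an error in XML.
minimized = "<doc><sec><p>text</></></>"

# Omitted end tags: the parser will not infer that </doc> closes <p> and <sec>.
omitted = "<doc><sec><p>text</doc>"

print(is_well_formed(explicit))   # True
print(is_well_formed(minimized))  # False
print(is_well_formed(omitted))    # False
```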

> format (too hierarchically oriented, verbose, and weirdly structured 
> relative to ER models).

*shrug*  ER models have their limitations as well.  More loosely
structured data than tables can easily model are commonly encountered. 
It's a good place to use XML.

> 7) While there is lots of heavyweight support for reading XML, there 
> isn't any help for writing it from various other data structures.

Hmmm.  I kinda need to think about that one, I guess.  We've been having
go-rounds on the subject of "serialization", at work.

> So there's my viewpoint.  Take it for what its worth.  Am I trolling?  

Well, I thought so.  Even I can claim to have "used" DTDs for eight
years, since I was writing HTML with doctype decls at the top in '94. 
Clearly, I was wrong to think so, but it was hard for me to see what the
needs were, behind the very strongly expressed antagonism.

> Because it runs against what I've found to be true in my own work.  
> Centralize business logic and ER/OO modeling to model your business 
> entities and processes.  This works well when the company has the 
> discipline to pick an implementation language and stick with it and 
> focuses all developers on this goal.

I was a grunt programmer at some of those companies.  Speaking from the
trenches (RPC, DCOM, CORBA), I don't think those models work.  In fact,
I think that they're seriously flawed by adopting Sun's marketing
slogan.  The network is not a computer.

> But of course, XML philosophy says the opposite.  Where is the business 
> rule repository in XML?

Where do you want it to be?  XML is data, not objects.  It isn't even
very good at linking, just at the moment, even though it was always
*supposed* to be.

> Have you considered that you're breathing your own exhaust?  You don't 

*laugh*  Such an image.  No, I'm being a hairy nuisance to various
committees, trying to convince them *not* to make any more nasty little
complexities that I'll end up having to explain.  Unconvincingly, since
there really isn't any excuse for monstrosities like gHorribleKludge.

However, I think that there are some underlying principles that are very
valuable, and that address a number of problems hard to deal with using
other tools.  You may not agree, if you see no value in
well-formedness.  Alaric doesn't agree, 'cause he doesn't see value in
text-only formats.  XML well-formedness and text-only format are very
important for easing exchange and encouraging long-term preservation of
information.  The looser-than-database-columns structure of schemas (not
meaning WXS, here, but DTD and RNG, primarily) provides a useful match
for many kinds of information that fit awkwardly into relational models
(this information often ends up as unindexable blobs).  Localization of
information is important when information is transient; XML documents
are often better solutions for the delivery of information than are
rowsets (particularly when the information may be loosely structured, as
noted in the previous point).  As a format for exchange of strongly
typed data, it's currently a mess and a failure (in my opinion), because
there is no type system (what's in WXS isn't a system).

Amy!
-- 
Amelia A. Lewis       amyzing@talsever.com      alicorn@mindspring.com
To be whole is to be part; true voyage is return.
                -- Laia Asieo Odo (Ursula K. LeGuin, "The Dispossessed")




 
