xml-dev - Re: [xml-dev] Note from the Troll

To: Amelia A Lewis <amyzing@talsever.com>
Subject: Re: [xml-dev] Note from the Troll
From: tblanchard@mac.com
Date: Mon, 28 Oct 2002 20:38:19 +0100
Cc: xml-dev@lists.xml.org
In-reply-to: <1035735331.18218.74.camel@marajen>

On Sunday, October 27, 2002, at 05:15  PM, Amelia A Lewis wrote:

> That's an interesting problem space, especially for someone strongly
> convinced of the value of the relational model.  In the past ten to
> fifteen years, in my experience, many of the largest firms moved
> personnel information out of databases and into LDAP.  LDAP is all sorts
> of things, but it isn't very relational.  It's very easy to model as a
> hierarchical database.  Or as XML.

LDAP (Lightweight Directory ACCESS PROTOCOL) somewhere along the way went
from being a unified view and API over existing datastores to being a
datastore in its own right.  That was a mistake.  I really expected to see
some techniques for writing adaptors that sat atop views in Oracle.

Besides which, LDAP focused on a completely different problem.  LDAP is 
all about identifying who you are - not who you were or who you might 
become next year or your capability profile or your salary history or 
how much money you owe who.

> All too true.  Are the hypesters identical to the developers?  I think
> that that was true for the AI model.  I'm doubtful that it is the case
> for XML.  The most outrageous claims seem to be made by PHBs.

The hypesters are (in no particular order)
1) WebServices tool vendors (Web Services remain the subject of persistent
hype with no definition; I expect this will eventually burn itself out,
just like e-commerce on cell phones).
2) WebApp server vendors
3) XML tool vendors
4) Clueless industry pundits (which is almost all of them)
5) JHeads (Java programmers who know little else).

>  Massive confusion is created over whether the use of
> URIs, in a particular context, is for identification, for comparison,
> for location, or for decoupling.

That's part of it.  Let's put it another way, though: there are too many
little mechanisms overall.  Writing an XML parser is a daunting task,
which seems astonishing to me considering it's just meant to be
structured data.  Look at the size of the Xerces parser's libraries.
I mean, come on.  You ought to be able to write a fully featured parser
in a couple hundred lines of code.

> What's worse is that the spec
> was so delayed, and so anticipated, that most folks really, really want
> to overlook its enormous hairy dangling warts and just get on with the
> job.

I'd say it ought to be scrapped wholesale and redone.  But of course
that's sort of happened with DTDs, and if this continues, credibility is
going to dive.

> Ugh.  There is a solution to this problem; it's called RELAX NG.  But it
> isn't a solution to the problem of primitive types, which hasn't been
> addressed, as yet.

OK, I've just read it (I wasn't aware there was another group having a
go at it).  It still seems too complicated, and credibility is definitely
diving.

When I write something to parse a chunk of XML and I want to enforce
some structure, I typically create a dictionary with element names as
keys and a list of valid containers (allowed parent elements) as values.
That works for nesting enforcement.  An additional bit of info for
cardinality and perhaps a type marker are all you really need.  Anything
more complicated than that, well, let them write code.
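A minimal sketch of that dictionary approach, using Python's standard-library xml.etree.ElementTree.  The element names (staff, employee, name, salary), the tuple layout (allowed parents, maximum occurrences under one parent, type marker), and the convention that None means "may be the document root" are all hypothetical choices for illustration, not a format taken from any spec or tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: element name -> (allowed parents, max occurrences
# under one parent or None for unbounded, type marker for text or None).
# A parent of None means the element may be the document root.
SCHEMA = {
    "staff":    ({None},       1,    None),
    "employee": ({"staff"},    None, None),
    "name":     ({"employee"}, 1,    str),
    "salary":   ({"employee"}, 1,    int),
}


def validate(root, schema):
    """Walk the tree and return a list of error messages (empty means valid)."""
    errors = []

    def check(elem, parent_tag):
        rule = schema.get(elem.tag)
        if rule is None:
            errors.append(f"unknown element <{elem.tag}>")
            return
        parents, _max_count, text_type = rule
        # Nesting: is this element allowed inside its container?
        if parent_tag not in parents:
            errors.append(f"<{elem.tag}> not allowed inside <{parent_tag}>")
        # Type marker: can the text content be converted to the declared type?
        if text_type is not None:
            try:
                text_type((elem.text or "").strip())
            except ValueError:
                errors.append(f"<{elem.tag}> text {elem.text!r} is not {text_type.__name__}")
        # Cardinality: count each child tag and compare with its maximum.
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
        for tag, n in counts.items():
            max_n = schema.get(tag, (None, None, None))[1]
            if max_n is not None and n > max_n:
                errors.append(f"<{tag}> appears {n} times in <{elem.tag}>, max {max_n}")
        for child in elem:
            check(child, elem.tag)

    check(root, None)
    return errors


good = ET.fromstring("<staff><employee><name>Ann</name>"
                     "<salary>50000</salary></employee></staff>")
bad = ET.fromstring("<staff><name>Bob</name><employee>"
                    "<salary>lots</salary><salary>1</salary></employee></staff>")
print(validate(good, SCHEMA))  # → []
print(validate(bad, SCHEMA))
```

The sketch only checks a maximum count; a minimum could be added as a fourth tuple field the same way.  Constraints the table cannot express, such as element ordering, are left to ordinary code.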

> Last I heard, the 1960
> US census is stored in a format that can be read by only two working
> machines in the world (one of which is on display at the Smithsonian).
> Having trained as an historian in the long ago, I can say that that is
> an unmitigated disaster.  Cost of transformation, if undertaken, will be
> enormous.  If not undertaken, loss of information will be enormous.

Yeah, but this is more a hardware/media problem than a software problem,
I'll wager.  I have a bunch of stuff on SyQuest drives too; I wonder how
I'll get that transferred.

> This, in
> fact, is probably the HR problem.  .... Somewhere in a synthesis of hierarchy and
> relation may lie the answer.  It isn't currently available, though, to
> the best of my knowledge.

The current solution is to use a free-form text indexer like Verity,
Autonomy, or the Google appliance to handle resumes and other documents,
and a relational DB for structured info.  Text indexers based on
interesting fuzzy-matching and Bayesian techniques are rapidly reducing
the need for markup in document management, I think.  Google is an
excellent example (and now you can get it in a box).
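To make the fuzzy-matching idea concrete, here is a toy inverted index in Python whose query words are expanded to near-miss vocabulary terms with difflib.get_close_matches from the standard library.  The class name FuzzyIndex, the 0.8 similarity cutoff, and the sample documents are hypothetical.  Commercial indexers like those named above work very differently internally; this sketch shows only fuzzy term matching and makes no attempt at Bayesian ranking.

```python
import difflib
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FuzzyIndex:
    """Toy inverted index: term -> set of document ids (hypothetical example)."""

    def __init__(self, cutoff=0.8):
        self.cutoff = cutoff              # minimum difflib similarity ratio
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents matching every query word, where
        each word may match any indexed term that is 'close enough'."""
        result = None
        for word in tokenize(query):
            # Expand the query word to similar indexed terms (catches typos).
            terms = difflib.get_close_matches(
                word, list(self.postings), n=5, cutoff=self.cutoff)
            docs = set()
            for term in terms:
                docs |= self.postings[term]
            # AND semantics: keep only documents that matched every word so far.
            result = docs if result is None else result & docs
        return sorted(result or set())


idx = FuzzyIndex()
idx.add("r1", "Senior Java developer, XML and LDAP experience")
idx.add("r2", "Oracle DBA with a relational modelling background")
print(idx.search("javva developer"))  # → ['r1']
print(idx.search("relational"))       # → ['r2']
```

Note that no markup is needed: the misspelled query "javva" still finds the first document because "java" is similar enough to pass the cutoff.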

Follow-Ups:
- Re: [xml-dev] Note from the Troll
  - From: Alaric Snell <alaric@alaric-snell.com>
- Advanced text searching vs XML??? (was Re: [xml-dev] Note from the Troll)
  - From: Mike Champion <mc@xegesis.org>

References:
- Re: [xml-dev] Note from the Troll
  - From: Amelia A Lewis <amyzing@talsever.com>