On Sunday, October 27, 2002, at 05:15 PM, Amelia A Lewis wrote:
> That's an interesting problem space, especially for someone strongly
> convinced of the value of the relational model. In the past ten to
> fifteen years, in my experience, many of the largest firms moved
> personnel information out of databases and into LDAP. LDAP is a lot
> of things, but it isn't very relational. It's very easy to model as a
> hierarchical database. Or as XML.
Somewhere along the way, LDAP (Lightweight Directory ACCESS PROTOCOL)
went from being a unified view and API over existing datastores to
being a datastore of its own. That was a mistake. I really expected to
see some techniques for writing adaptors that sat atop views in Oracle.
Besides which, LDAP focused on a completely different problem. LDAP is
all about identifying who you are, not who you were or who you might
become next year, or your capability profile, or your salary history,
or how much money you owe, and to whom.
> All too true. Are the hypesters identical to the developers? I think
> that that was true for the AI model. I'm doubtful that it is the case
> for XML. The most outrageous claims seem to be made by PHBs.
The hypesters are (in no particular order):
1) Web services tool vendors (there remains persistent hype with no
definition; I expect this will eventually burn itself out, just like
e-commerce on cell phones).
2) Web app server vendors
3) XML tool vendors
4) Clueless industry pundits (which is almost all of them)
5) JHeads (Java programmers who know little else)
> Massive confusion is created over whether the use of
> URIs, in a particular context, is for identification, for comparison,
> for location, or for decoupling.
That's part of it. Let's put it another way, though: there are too many
little mechanisms overall. Writing an XML parser is a daunting task,
which seems astonishing to me considering it's just meant to be
structured data. Look at the size of the libraries in the Xerces
parser. I mean, come on. You ought to be able to write a fully
featured parser in a couple hundred lines of code.
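To make the size argument concrete, here is a hypothetical sketch (Python, standard library only) of a parser for a deliberately small XML subset: elements, quoted attributes, text, comments and the five predefined entities. It is an illustration, not a conformant parser. It rejects or ignores the XML declaration, DTDs, entity declarations, CDATA sections, processing instructions, numeric character references, namespaces, encoding detection and duplicate-attribute checks, and those features are a large part of what a conformant parser like Xerces has to handle. The names `Element`, `parse` and `unescape` are assumptions made for this sketch.

```python
import re
from dataclasses import dataclass, field

NAME = r"[A-Za-z_][\w.\-]*"
QUOTED = r"""(?:"[^"]*"|'[^']*')"""

# One token per match: a comment, a start/end/empty tag, or a run of text.
TOKEN = re.compile(
    r"<!--.*?-->"
    rf"|<(/?)({NAME})((?:\s+{NAME}\s*=\s*{QUOTED})*)\s*(/?)>"
    r"|([^<]+)",
    re.S,
)
ATTR = re.compile(rf"""({NAME})\s*=\s*(?:"([^"]*)"|'([^']*)')""")
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}


def unescape(s):
    # Only the five predefined entities; numeric references are not handled.
    return re.sub(r"&(lt|gt|amp|quot|apos);", lambda m: ENTITIES[m.group(1)], s)


@dataclass
class Element:
    tag: str
    attrs: dict
    children: list = field(default_factory=list)  # Element or str


def parse(doc):
    """Parse the subset described above; raise ValueError if not well-formed."""
    document = Element("#document", {})
    stack = [document]
    pos = 0
    while pos < len(doc):
        m = TOKEN.match(doc, pos)
        if m is None:
            raise ValueError(f"not well-formed at offset {pos}")
        pos = m.end()
        closing, name, attrs, empty, text = m.groups()
        if text is not None:
            if text.strip():  # whitespace-only text is dropped
                stack[-1].children.append(unescape(text))
        elif name is None:
            pass  # comment: skipped
        elif closing:
            if attrs or empty or stack[-1].tag != name:
                raise ValueError(f"unexpected </{name}> at offset {m.start()}")
            stack.pop()
        else:
            values = {k: unescape(dq or sq) for k, dq, sq in ATTR.findall(attrs)}
            el = Element(name, values)
            stack[-1].children.append(el)
            if not empty:  # <x/> has no content, so it is not pushed
                stack.append(el)
    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1].tag}>")
    roots = [c for c in document.children if isinstance(c, Element)]
    if len(roots) != 1 or len(roots) != len(document.children):
        raise ValueError("expected exactly one root element")
    return roots[0]
```

For example, `parse('<person id="7"><name>Ada &amp; Co</name><note/></person>')` returns a `person` element whose first child holds the text `Ada & Co`. Even at this size the sketch says nothing about validation; it only checks that tags nest.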
> What's worse is that the spec
> was so delayed, and so anticipated, that most folks really, really want
> to overlook its enormous hairy dangling warts and just get on with the
I'd say it ought to be scrapped wholesale and redone. But of course
that's sort of what happened with DTDs, and if this continues,
credibility is going to dive.
> Ugh. There is a solution to this problem; it's called RELAX NG. But
> it isn't a solution to the problem of primitive types, which hasn't been
> addressed, as yet.
OK, I've just read it (I wasn't aware another group was taking a try
at it).
It still seems too complicated, and credibility is definitely diving.
When I write something to parse a chunk of XML and I want to enforce
some structure, I typically create a dictionary with element names as
keys and a list of valid containers as values. That works for nesting
enforcement. An additional bit of information for cardinality, and
perhaps a type marker, is all you really need. Anything more
complicated than that, well, let them write code.
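The dictionary approach just described can be sketched in Python as follows. This is a hypothetical illustration, not the author's code: the element names, the `SCHEMA` layout of (valid containers, (min, max) cardinality, type marker) and the `validate` function are all assumptions, and parsing is delegated to the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical schema: element name -> (valid containers, (min, max) occurrences
# per container, type marker). "#root" means "may be the document element";
# max None means unbounded; type None means no text check.
SCHEMA = {
    "person":  ({"#root"}, (1, 1), None),
    "name":    ({"person"}, (1, 1), str),
    "salary":  ({"person"}, (0, 1), int),
    "project": ({"person"}, (0, None), str),
}


def validate(elem, parent="#root", errors=None):
    """Return a list of error strings for elem and its descendants."""
    if errors is None:
        errors = []
    rule = SCHEMA.get(elem.tag)
    if rule is None:
        errors.append(f"unknown element <{elem.tag}>")
        return errors
    containers, _, kind = rule
    # Nesting: is this element allowed inside its parent?
    if parent not in containers:
        errors.append(f"<{elem.tag}> not allowed inside <{parent}>")
    # Type marker: the element's text must convert to the marked type.
    if kind is not None:
        text = (elem.text or "").strip()
        try:
            kind(text)
        except ValueError:
            errors.append(f"<{elem.tag}> text {text!r} is not {kind.__name__}")
    for child in elem:
        validate(child, elem.tag, errors)
    # Cardinality: check every element that may appear inside this one.
    counts = Counter(child.tag for child in elem)
    for tag, (allowed_in, (lo, hi), _) in SCHEMA.items():
        n = counts[tag]
        if elem.tag in allowed_in and (n < lo or (hi is not None and n > hi)):
            bound = "*" if hi is None else hi
            errors.append(f"<{elem.tag}> has {n} <{tag}>, expected {lo}..{bound}")
    return errors
```

Running `validate(ET.fromstring("<person><salary>lots</salary><project><name>C</name></project></person>"))` reports three problems: the non-integer salary, a `name` nested inside `project`, and the missing `name` under `person`. Anything beyond these three checks would, in the spirit of the paragraph above, go into ordinary code.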
> Last I heard, the 1960
> US census is stored in a format that can be read by only two working
> machines in the world (one of which is on display at the Smithsonian).
> Having trained as an historian in the long ago, I can say that that is
> an unmitigated disaster. Cost of transformation, if undertaken, will
> be enormous. If not undertaken, loss of information will be enormous.
Yeah, but this is a hardware media problem more than a software problem.
I have a bunch of stuff on SyQuest drives too; I wonder how I'll get
> This, in fact, is probably the HR problem. .... Somewhere in a
> synthesis of hierarchy and relation may lie the answer. It isn't
> currently available, though, to the best of my knowledge.
The current solution is to use a free-form text indexer like Verity,
Autonomy, or the Google appliance to handle resumes and other
documents, and a relational database for structured information. Text
indexers based on fuzzy matching and Bayesian techniques are, I think,
rapidly reducing the need for markup in document management. Google is
an excellent example (and now you can get it in a box).
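The hybrid arrangement in the last paragraph (structured facts in a relational database, free text in an indexer) can be sketched with the Python standard library. This is a toy under stated assumptions: `sqlite3` stands in for the relational store, and a plain word-to-ids inverted index stands in for a product like Verity or Autonomy, with no stemming, fuzzy matching or Bayesian ranking. The table, the resume snippets and the `search` function are hypothetical.

```python
import re
import sqlite3
from collections import defaultdict

# Structured HR facts live in a relational table (in-memory SQLite here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
db.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "Ada", 120000), (2, "Grace", 110000), (3, "Linus", 90000)])

# Free-form resume text goes into a toy inverted index: word -> set of ids.
# Hypothetical data; real indexers add stemming, fuzzy matching and ranking.
resumes = {
    1: "Built compilers and analytical engines; strong math background.",
    2: "Compiler pioneer, COBOL, Navy; debugging expert.",
    3: "Kernel hacker, version control, C programming.",
}
index = defaultdict(set)
for emp_id, text in resumes.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        index[word].add(emp_id)


def search(words, min_salary=0):
    """Names whose resume contains every word (at least one word required),
    filtered by a structured predicate evaluated in SQL."""
    hits = set.intersection(*(index.get(w.lower(), set()) for w in words))
    if not hits:
        return []
    marks = ",".join("?" * len(hits))  # one placeholder per matching id
    rows = db.execute(
        f"SELECT name FROM employee WHERE id IN ({marks}) AND salary >= ? ORDER BY name",
        [*sorted(hits), min_salary])
    return [name for (name,) in rows]
```

Because there is no stemming, `search(["compiler"])` finds only Grace and misses Ada's "compilers"; closing that kind of gap is what the fuzzier techniques mentioned above are for, while the salary filter stays an ordinary SQL query.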