OASIS Mailing List Archives
Re: [xml-dev] Re: XML As Fall Guy

There's a lot of good content in this thread, and I think it's important to continue, because it susses out one of the other aspects of such big waterfall productions (even when the shop involved is ostensibly using scrum :-).

"Unfortunate things happened."

In corporate speak, this is the closest you'll ever see to an admission of guilt. A simple assertion that errors occurred, without any subsequent explanation of who was responsible, what those errors were, why they were made, or why they weren't fixed. Admission of liability means that you assume that liability, which in turn means that your shareholders don't get their dividend checks this month and they stop investing in your company. That's also human nature, but one consequence of it is that people never learn from the mistakes.

As I said earlier, I have to be very circumspect myself in what I'm saying right now, as this has the potential to get into legal territory, so if it seems like I'm speaking in generalities, it's because I am trying to pick my words carefully.

Building a common data model framework at the outset is critical. The details of the individual components don't need to be worked out at this point, but understanding the relationships between components is a necessary first step before you can even begin architecting a solution, let alone start writing code. What's more, agreeing on how this information is going to get expressed, what set of rules you're going to use to determine the characteristics of that data model, is even more important. If one part of an organization is using NIEM, another using EDIFACT standards, and a third just making up models on the fly (using schemas that they found doing Google searches on the web), then you will have trouble when these silos have to communicate, because the framework of how you establish the models can dictate how the model is put together (and what you can express).
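To make the silo problem concrete, here is a minimal sketch of the idea: two silos with different vocabularies both normalize into one agreed-upon canonical model before they exchange anything. All field names here (the NIEM-flavored ones included) are invented for illustration, not taken from any real standard.

```python
# Hypothetical sketch: two silos normalize into one agreed-upon canonical model.
# Field names are illustrative only, not from NIEM, EDIFACT, or the post.

CANONICAL_FIELDS = {"person_id", "given_name", "family_name", "dob"}

def from_silo_a(rec):
    # Silo A uses NIEM-flavored element names (assumed for illustration).
    return {"person_id": rec["nc:PersonID"],
            "given_name": rec["nc:PersonGivenName"],
            "family_name": rec["nc:PersonSurName"],
            "dob": rec["nc:PersonBirthDate"]}

def from_silo_b(rec):
    # Silo B made its model up on the fly.
    first, last = rec["name"].split(" ", 1)
    return {"person_id": rec["id"], "given_name": first,
            "family_name": last, "dob": rec["birthday"]}

def validate(canonical):
    # The agreed framework: every record must carry the canonical fields.
    missing = CANONICAL_FIELDS - canonical.keys()
    if missing:
        raise ValueError(f"record missing canonical fields: {missing}")
    return canonical

a = validate(from_silo_a({"nc:PersonID": "A-1", "nc:PersonGivenName": "Ada",
                          "nc:PersonSurName": "Lovelace",
                          "nc:PersonBirthDate": "1815-12-10"}))
b = validate(from_silo_b({"id": "B-9", "name": "Ada Lovelace",
                          "birthday": "1815-12-10"}))
```

The point is not the adapters themselves but that both silos agreed on `CANONICAL_FIELDS` first; without that agreement there is nothing for either adapter to target.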

Agreement on global standards also applies to resource identification. This was not a complex project, but it included some critical wrinkles - it is distributed, meaning that you cannot count on internal identifiers generated by one database being uniform across the whole system. It involved creating proxies - I want to shop for insurance that fits my price range and eligibility requirements, but I do not want the government, my employer, or my neighbor to know who I am until I have committed to a plan (surprising how big a factor this one was). You have to identify plans and policies and providers and agencies and exchanges and children and employers and ... indeed, this was largely a resource management problem. Treat each of these as separate binaries working in their own cogs and you will run into problems when two different vendors implement these classes in different ways, then fail to communicate those differences in behavior to other members of that exchange. This means that you have to keep your code modeled in a dynamic system and your resources need to be treated abstractly. Good XSLT developers know this problem intrinsically, and also understand that unless you have a consistent global identification mechanism, you have no context on which to build.
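One way such a consistent global identification mechanism can work is with deterministic, name-based identifiers: any node that sees the same resource mints the same opaque URI, with no coordination required. The namespace URL, URN scheme, and key formats below are assumptions for the sketch, not anything from the actual project.

```python
import uuid

# Sketch of a global identification scheme: the same (type, natural key)
# pair yields the same stable URI on every node, so two vendors' databases
# agree on identity without sharing internal row IDs.
# Namespace and naming conventions here are invented for illustration.

EXCHANGE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/exchange")

def resource_uri(resource_type: str, natural_key: str) -> str:
    """Deterministic: same inputs produce the same URI everywhere."""
    rid = uuid.uuid5(EXCHANGE_NS, f"{resource_type}/{natural_key}")
    return f"urn:exchange:{resource_type}:{rid}"

# Two vendors, two databases, one identity:
u1 = resource_uri("plan", "ACME-GOLD-2013")
u2 = resource_uri("plan", "ACME-GOLD-2013")

# A random proxy can circulate while someone shops anonymously;
# the stable person URI is never exposed until they commit to a plan.
person = resource_uri("person", "ssn:123-45-6789")
proxy = f"urn:exchange:proxy:{uuid.uuid4()}"
```

Name-based (version 5) UUIDs give the determinism; random (version 4) UUIDs give the unlinkable proxies — both properties the post says the exchange needed at once.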

Information is never complete, and this means that you have to persist resources over time. Income needs to be verified with the IRS, immigration status needs to go to DHS, eligibility needs to be determined. You have to know whether a person already applied for a program and was rejected, need to know when that person becomes eligible for programs they weren't eligible for before, and so forth, and this means that over time you end up building a painting of that person in your data system. You can hide certain information, of course, and for privacy reasons you have to, but that makes the ability to maintain some handle to that person, and persistence on that person, more critical, not less. Building a messaging system without understanding that those messages are basically changing the state of knowledge about a given person, plan or requirement is similarly questionable. It's much like building railroad tracks and a network without worrying about little things like gauge widths or whether you're transporting shipping containers vs. nuclear waste.
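A minimal sketch of the "messages change the state of knowledge" view: each incoming message is a partial fact about a person, applied against a persistent record that keeps both the current picture and the full history. Class and field names are invented for illustration.

```python
from datetime import datetime, timezone

# Sketch (names invented): each message is a partial state change about a
# person; the store accumulates a current view plus a full audit history,
# so earlier determinations (e.g. a rejection) are never lost.

class PersonRecord:
    def __init__(self, person_uri: str):
        self.person_uri = person_uri
        self.state = {}     # current picture of the person
        self.history = []   # every message, kept for audit/versioning

    def apply(self, source: str, facts: dict):
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append({"source": source, "at": stamp, "facts": facts})
        self.state.update(facts)

rec = PersonRecord("urn:exchange:person:42")
rec.apply("application", {"income_claimed": 31000})
rec.apply("IRS", {"income_verified": True})
rec.apply("eligibility", {"medicaid_eligible": False})
rec.apply("eligibility", {"medicaid_eligible": True})  # circumstances changed
```

The current state reflects the latest eligibility determination, while the history still shows the earlier rejection — exactly the "did this person already apply and get rejected?" question the post raises.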

Understanding the distinction between data and application layers (and what those things really mean) is also important. I think one reason that programmers (especially Java programmers) have so much trouble with MarkLogic (or eXist, or increasingly a lot of NoSQL systems) is that there has been a steady shift in thinking over the last twenty years, away from the notion that databases only hold content and toward the notion that there are certain tasks that can be better accomplished by a dedicated application layer sitting above the actual data storage system than by ad-hoc programming processes outside the system. Data validation, transformations, content processing pipelines, rules-based action systems, reporting, auditing and versioning, notification, translation, packaging and depackaging, inferencing, content enrichment, serving web application content, and so forth ... all of these things are implemented in MarkLogic, and are increasingly becoming "standard" features for NoSQL systems in particular.

These are "application layer" things in the sense that they are not actually involved with the database activities of querying or indexing, but because they sit within a "database" server, there is a tendency among Java developers to want to ignore this facility and "roll" their own, because everyone knows that Java is better at all these things (not taking into account the fact that any Java process has to automatically add the expensive tasks of data serialization and parsing via a JAXB representation into objects that are heavily dependent upon an extant schema - one that often is not written with object serialization in mind and that may change daily in development realities).

Indeed, the worst possible way to use MarkLogic or most NoSQL data systems is unfortunately the way that most developers use them - for storing and retrieving static XML. If you never query your XML, if you never need to transform your XML, if you never need to worry about versioning or archiving or validating it, then you're better off just storing the XML in a file system. Of course, this puts the onus of doing all of this on you as the developer, and after a while complex Rube Goldberg-like systems spring up, because there are always exceptions and because you treat this XML differently from that XML (and you treat it differently than the developer in the next cubicle does). You place a high reliance upon schemas, despite the fact that schema creation is itself a remarkably fine art, the number of people working with XSD 1.1 and constraint modeling is still vanishingly small, and most effective schemas are polyphasic - validation changes based upon workflow context.
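The "polyphasic" idea can be sketched in a few lines: the same document is checked against different rule sets depending on where it sits in the workflow, rather than against one monolithic schema. The phase names and required fields below are invented for illustration.

```python
# Sketch of polyphasic validation: validation requirements tighten as a
# document moves through the workflow. Phases and rules are invented.

RULES = {
    "intake":       ["applicant_name"],
    "verification": ["applicant_name", "income", "ssn"],
    "enrollment":   ["applicant_name", "income", "ssn", "plan_id"],
}

def validate(doc: dict, phase: str) -> list:
    """Return the fields still missing for the given workflow phase."""
    return [field for field in RULES[phase] if field not in doc]

doc = {"applicant_name": "Ada Lovelace", "income": 31000}
# Valid at intake, incomplete for verification, further from enrollment:
intake_missing = validate(doc, "intake")            # []
verification_missing = validate(doc, "verification")  # ["ssn"]
```

A static XSD bakes one of these phases into the schema; the point of context-sensitive validation is that "valid" is a function of both the document and where it is in the process.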

By the way, I would say that the same thing applies to JSON. JSON has a different core data model, but from a processing standpoint, the value of a JSON database comes in its ability to query, validate and transform its content. MarkLogic is a pretty decent JSON store as well, by the way, though I'd like to see a CoffeeScript layer built into MarkLogic at some point, just so that it's more accessible to JavaScript developers.

Add to this the fact that when you take the above pieces out of the mix of Java development requirements, you remove a lot of the real need for Java developers in the first place. When your business model is predicated on employing large numbers of developers who can be charged at inflated rates, because it is a time-and-materials government contract that you believe will not be heavily scrutinized because it advances an administration priority, MarkLogic can be seen as a real threat.

Now, I'm a consultant, but I believe that you create value (and get more work) by delivering value quickly and thoroughly, something that I believe my consultancy generally follows as well. It's why I look for tools that allow me to provide that value, so that I can concentrate on the harder problems of developing cohesive information strategies for my clients. MarkLogic for me is one of those tools, and for all that I can be occasionally critical of specific decisions MarkLogic makes, I still would recommend them unreservedly because they have solved so many of the problems that frankly are data-centric or data-stream-centric, including a lot of the system integration problems that seem endemic to every organization I've worked with.

I think on this list in particular there is a tendency to ask - was XML at fault in a project like this? If you do not understand data modeling or data design or effective requirements gathering, if you don't understand XML or data interchange or NoSQL databases, and if you have a vested interest in not knowing these things, then yes, XML was at fault - though by that same logic Java was at fault.


Kurt Cagle
Invited Expert, XForms Working Group, W3C
Managing Editor, XMLToday.org

On Wed, Nov 27, 2013 at 10:28 AM, <cbullard@hiwaay.net> wrote:
I look for low hanging fruit.  For example, we see this big IT project where they "say" they want to integrate systems.  What is the n of integration?  IOW, start with existing forms.  Do they really mean to put all of the data these forms capture into one honking relational system, OR do they want to capture the same data and put it somewhere?

1.  They really want the big bad relational system.  Big dollars.  Much business rule capture, much sorting and crunching down the names to fields and data types, much discovery of when is B = A OR B == A, much how do I decide when a name is a person or a person is a name, etc.  We know the drills.  Charge X.

2.  They only want to get this data into the same place and get rid of the paper.  AHA!  Write XML docs that replicate forms, output PDF, tell PDF it is a form and let Acrobat Pro do its voodoo that it does so well, clean up the names and check the XML export (just in case I need it but I might not), write a web page to download forms/PDFs, write a web page to upload forms.  Write a few more pages for finding forms.  (Never said a word about validating, did they? Good.   If they do, I'll renegotiate and start working on the names in those data bags they've been collecting.)

IOW, if you don't have to do it all upfront, don't.  Plan to do some of it later and make sure it can be done.   Some people don't want to replace paper; they want to quit killing as many trees.  They don't want to reengineer all of their processes.  They want to quit carrying paper back and forth from a flight line and stuffing in a desk and trying to find it later.  They have these neat little tablets but they don't need them to think for them.  They just don't want to scribble.

The 99% problem is a problem of requirements.  The better you are at this, the better you can talk to the folk who don't understand the details by talking to them about what they do know, and then making damn sure you can turn around and tell the folks in the basement precisely what to build.

And if the customer begins to play wrong rock right rock, take a contract lawyer to the meeting.


Quoting Rick Jelliffe <rjelliffe@allette.com.au>:

The enquiry into the Queensland debacle, where a $4 million project spiralled into maybe a $400 million mess, found that apart from personalities and capabilities and execution, the problem was the "shared services" myth.

This says that if you have a few dozen disparate systems doing much the same thing, you should centralize. The trouble being, in reality, that if the disparate systems all represent variant functionality, with not much in common, your customization effort can be much bigger than your platform effort. You still have to find and replicate all those differences: to me it is a classic case where a little bit of waterfall would be prudent: reverse engineer the current system before you design the next! At least then you know where you are up to...

The government "shared services" efforts here in Australia have all been pulled back: the premise so often being wrong that the failure was predictable.

As Gareth says, the US problem sounds similar: the problem being to cope with multiplicity not commonality, if I can paraphrase Kurt's early post.

On 27/11/2013 4:32 PM, "Gareth Oakes" <goakes@gpslsolutions.com> wrote:

> On 27/11/2013 7:37 am, "cbullard@hiwaay.net" <cbullard@hiwaay.net> wrote:
> Simon sez: "Whatever the underlying story, I suspect we'll be dealing
> with the reverberations from this for a while."

Agree with this. The one thing for sure is that MarkLogic now has a big
headache and a lot of damage control to worry about :)

I think you'll find that now the spotlight is on those responsible for the
Healthcare.gov project, they'll be grasping at any straw they can to spread
the blame around.  I'll bet that in the application they are using it for,
MarkLogic is a sound technology choice.

My conclusion: this smacks of a typical "big business" technology project failure. We had an analogous problem over here in Queensland, leaving a $1.2B hole (search: Queensland Health payroll disaster). The somewhat amusing upshot being that IBM is currently banned from doing new business with the Queensland government.

I wonder how much better these types of projects could go if a more incremental (Agile) approach were taken. The beauty of XML is that if you set the systems up right, it can give you amazing flexibility and power.

Gareth Oakes
Chief Architect, GPSL


XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php





Copyright 1993-2007 XML.org. This site is hosted by OASIS