Re: [xml-dev] Four fine text-based data formats ... liberate yourself from one (silo) data format

I've been in the data management business for a Very Long Time.  I 
might not have "seen it all", but I've certainly seen a lot of it.

One thing that I've seen over and over and over is that with every 
new data management technology comes a whole group of people who 
insist that it solves the entire problem set -- and does so without 
any metadata at all!

And, I've seen over and over and over that significant projects that 
have no metadata associated with their data really do not understand 
their data very well at all.  The absence of metadata merely allows 
them to make more and more serious mistakes that get them deeper and 
deeper into trouble.  Those projects often fail, and even those that 
do not tend to be difficult (at best) to maintain.

When I first started talking to JSON people, the key point that every 
single one of them told me was that "JSON doesn't have, or need, any 
schema at all!"  In fact, I was told repeatedly, JSON is infinitely 
adaptable to whatever data you discover you want to include and you 
don't have to change anything else, because "it's all self-documenting".

In my considered opinion derived from considerable experience, the 
greatest value of schemata is that they force one to actually think 
about one's data in a truly analytic manner.  (An old Dilbert comic 
comes to mind, the one in which the pointy-haired boss tells his 
staff that he's going upstairs to find out the project's 
requirements, but they should start coding until he returns.)  Now, 
of course, I fully understand that requirements evolve as a project 
progresses, that a need for new, different, and less data is often 
discovered (sometimes several times during a project), and that the 
understanding of the data has to be updated in people's minds, in the 
documentation, etc.  But writing a schema in some formal way (whether 
SQL, XML, or anything else) is an extremely good exercise to help 
understand the data very clearly and unambiguously.  It helps get 
everybody on the same page.

The second most valuable thing I've found in schemata is that they 
keep everybody -- producers and consumers of data alike -- 
honest.  Once a schema has been developed, and if it is actually used 
for validating data, then producers can't slip in extra data, leave 
out required data, etc. without the consumers becoming instantly 
aware of that fact.  Additionally, consumers of data can't complain 
to the producers that the data they're providing isn't complete or 
doesn't have all the right characteristics.  In a real sense, it 
forms a contract among parties involved with specific data.  And that 
is worth a fortune in both development time and in legal bills after 
a project goes into customer production.
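
A minimal sketch of that "contract" idea, in Python.  It is not XML 
Schema or DTD validation; it is a hand-rolled stand-in that only 
checks which child elements are present.  The element names, the 
REQUIRED and OPTIONAL sets, and the contract_violations function are 
all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract: which child elements a <customer> record
# must carry and which it may carry.  Anything else is a violation.
REQUIRED = {"CustomerNumber", "Name"}
OPTIONAL = {"Phone"}


def contract_violations(xml_text):
    """Return a list of human-readable problems; empty means the record conforms."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    problems = []
    # Producers may not leave out required data ...
    for name in sorted(REQUIRED - present):
        problems.append(f"missing required element <{name}>")
    # ... and may not slip in extra data.
    for name in sorted(present - REQUIRED - OPTIONAL):
        problems.append(f"unexpected element <{name}>")
    return problems


good = "<customer><CustomerNumber>1</CustomerNumber><Name>A</Name></customer>"
bad = "<customer><customerNum>1</customerNum><Name>A</Name></customer>"

print(contract_violations(good))
print(contract_violations(bad))
```

The second record is reported twice over: once for the missing 
<CustomerNumber> and once for the unexpected <customerNum>.  A real 
schema language would also check types, ordering, and attributes.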

In my various jobs, I deal with some extraordinarily complex 
schemas.  I can't tell you the number of times I've written something 
like <customerNumber> when I needed to write <CustomerNumber>, or 
<customerNum> or <custNum> or some such variation; or when I've 
generated data with subelements that should have been 
attributes.  The result, of course, is that processing the data with 
a tool such as XSLT or XQuery just gets it wrong, often 
silently.  I'm sure you've all encountered similar problems.  It is 
especially problematic when one is integrating several parts of a 
large project developed by different subgroups.
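
The silent failure described above can be sketched with Python's 
standard xml.etree.ElementTree module standing in for XSLT or XQuery 
(an assumption made for brevity; the lookup behaviour is analogous, 
not identical).  The <order> document and its element names are 
hypothetical.

```python
import xml.etree.ElementTree as ET

# The producer wrote <CustomerNumber> as a subelement.
order = ET.fromstring(
    "<order><CustomerNumber>12345</CustomerNumber></order>"
)

# The consumer misspells the name (lowercase "c").  No error is
# raised; the lookup simply finds nothing.
misspelled = order.findtext("customerNumber")

# The consumer expects an attribute where the producer used a
# subelement.  Again no error, just an empty result.
as_attribute = order.get("CustomerNumber")

# Only the exact name, looked up as a subelement, finds the value.
correct = order.findtext("CustomerNumber")

print(misspelled, as_attribute, correct)
```

Neither mistake raises an exception, so the wrong result can travel a 
long way downstream before anyone notices it.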

As clumsy and cumbersome as XML Schema tends to be, and as crude as 
DTDs can be, I have personally found that they save me vastly more 
time in the long run than they take to create, debug, and use for 
(machine) validation of data.

Hope this helps,

At 3/25/2013 05:24 PM, Michael Kay wrote:
>(Liam)>So when you rail against XML and say how much more 
>comfortable you are with JSON, it sounds to me as if you're really 
>railing against those shoe-wearing bureaucratic civil engineers who 
>wanted first-this-then-that. I agree that's a problem, although if 
>several different organizations are involved it's often necessary 
>politically if not technically, and the schema there can help (as 
>David and others suggest) as a sort of documentation.
>
>Actually I would contend that with XML it's much easier to handle 
>data with a loose or evolving schema than it is with JSON. For 
>example, changing a name-and-address format to allow multiple middle 
>names or multiple phone numbers with different roles is more likely 
>to be do-able without breaking existing applications in XML than it is in JSON.
>
>This of course is assuming you are processing the XML with 
>appropriate languages rather than with a language optimized to handle JSON...
>
>In fact, one could argue that many of the difficulties that arise 
>when processing XML arise precisely because of this flexibility 
>that's designed into the notation. If you want to make XML 
>processing easier in conventional languages, for example with Java 
>data binding, the first thing you do is to lock down the schema.
>
>Michael Kay
>XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>to support XML implementation and development. To minimize
>spam in the archives, you must subscribe before posting.
>[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>subscribe: xml-dev-subscribe@lists.xml.org
>List archive: http://lists.xml.org/archives/xml-dev/
>List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
   Chair, ISO/IEC JTC1/SC32 and W3C XML Query WG    Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Alternate email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA  Personal email: SheltieJim at xmission dot com
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =


Copyright 1993-2007 XML.org. This site is hosted by OASIS