Re: [xml-dev] Four fine text-based data formats ... liberate yourself from one (silo) data format
- From: Jim Melton <jim.melton@oracle.com>
- To: xml-dev@lists.xml.org
- Date: Tue, 26 Mar 2013 14:28:50 -0600
I've been in the data management business for a Very Long Time. I
might not have "seen it all", but I've certainly seen a lot of it.
One thing that I've seen over and over and over is that with every
new data management technology comes a whole group of people who
insist that it solves the entire problem set -- and does so without
any metadata at all!
And, I've seen over and over and over that significant projects that
have no metadata associated with their data really do not understand
their data very well at all. The absence of metadata merely allows
them to make more and more serious mistakes that get them deeper and
deeper into trouble. Those projects often fail, and even those that
do not tend to be difficult (at best) to maintain.
When I first started talking to JSON people, the key point that every
single one of them told me was that "JSON doesn't have, or need, any
schema at all!" In fact, I was told repeatedly, JSON is infinitely
adaptable to whatever data you discover you want to include and you
don't have to change anything else, because "it's all self-documenting".
In my considered opinion derived from considerable experience, the
greatest value of schemata is that they force people to actually think
about their data in a truly analytic manner. (An old Dilbert comic
comes to mind, the one in which the pointy-haired boss tells his
staff that he's going upstairs to find out the project's
requirements, but they should start coding until he returns.) Now,
of course, I fully understand that requirements evolve as a project
progresses, that a need for new, different, and less data is often
discovered (sometimes several times during a project), and that the
understanding of the data has to be updated in people's minds, in the
documentation, etc. But writing a schema in some formal way (whether
SQL, XML, or anything else) is an extremely good exercise to help
understand the data very clearly and unambiguously. It helps get
everybody on the same page.
The second most valuable thing I've found in schemata is that they
keep everybody -- producers and consumers of data alike --
honest. Once a schema has been developed, and if it is actually used
for validating data, then producers can't slip in extra data, leave
out required data, etc. without the consumers becoming instantly
aware of that fact. Additionally, consumers of data can't complain
to the producers that data that validates against the schema is
incomplete or lacks the right characteristics. In a real sense, the schema
forms a contract among parties involved with specific data. And that
is worth a fortune in both development time and in legal bills after
a project goes into customer production.
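
As an illustration of the "contract" idea above (not part of the original message), here is a minimal sketch in Python. It is a hand-rolled stand-in for a real schema validator such as an XML Schema or DTD processor, and the element names (customer, customerNumber, name, phone, loyaltyTier) and the function validate_customer are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract for a <customer> record: which child elements
# are required and which are merely permitted. A real project would
# express this in a schema language rather than in Python sets.
REQUIRED = {"customerNumber", "name"}
OPTIONAL = {"phone"}

def validate_customer(xml_text):
    """Return a list of contract violations (an empty list means valid)."""
    root = ET.fromstring(xml_text)
    if root.tag != "customer":
        return [f"unexpected root element: {root.tag}"]
    seen = {child.tag for child in root}
    errors = []
    # A producer left out something the contract requires.
    for name in sorted(REQUIRED - seen):
        errors.append(f"missing required element: {name}")
    # A producer slipped in something the contract does not mention.
    for name in sorted(seen - REQUIRED - OPTIONAL):
        errors.append(f"unexpected element: {name}")
    return errors

good = "<customer><customerNumber>42</customerNumber><name>Ann</name></customer>"
bad = "<customer><name>Ann</name><loyaltyTier>gold</loyaltyTier></customer>"

print(validate_customer(good))
print(validate_customer(bad))
```

The point of the sketch is only that both sides can run the same check: the producer before sending, the consumer on receipt, so a violation surfaces immediately instead of deep inside later processing.
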
In my various jobs, I deal with some extraordinarily complex
schemas. I can't tell you the number of times I've written something
like <customerNumber> when I needed to write <CustomerNumber>, or
<customerNum> or <custNum> or some such variation; or when I've
generated data with subelements that should have been
attributes. The result, of course, is that processing the data with
a tool such as XSLT or XQuery just gets it wrong, often
silently. I'm sure you've all encountered similar problems. It is
especially problematic when one is integrating several parts of a
large project developed by different subgroups.
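
The silent failure described above can be reproduced with a small sketch (not from the original message). It uses Python's standard xml.etree.ElementTree rather than XSLT or XQuery, and the sample document and the helper customer_numbers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the producer used <CustomerNumber> (capital C)
# and put the SKU in an attribute rather than a subelement.
doc = ET.fromstring(
    "<order>"
    "<CustomerNumber>42</CustomerNumber>"
    "<item sku='A1'/>"
    "</order>"
)

def customer_numbers(root, tag):
    """Collect the text of every direct child element named `tag`."""
    return [el.text for el in root.findall(tag)]

# Correct spelling finds the value.
right = customer_numbers(doc, "CustomerNumber")

# A one-letter case difference raises no error; it simply matches nothing.
wrong = customer_numbers(doc, "customerNumber")

# Same problem with attribute versus subelement: asking for a <sku>
# child of <item> quietly yields None, while the attribute holds the data.
sku_as_element = doc.findtext("item/sku")
sku_as_attribute = doc.find("item").get("sku")

print(right, wrong, sku_as_element, sku_as_attribute)
```

Nothing here throws an exception, which is exactly the hazard: without validation against a schema, the mismatch only shows up as missing data downstream.
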
As clumsy and cumbersome as XML Schema tends to be, and as crude as
DTDs can be, I have personally found that they save me vastly more
time in the long run than they take to create, debug, and use for
(machine) validation of data.
Hope this helps,
Jim
At 3/25/2013 05:24 PM, Michael Kay wrote:
>(Liam)>So when you rail against XML and say how much more
>comfortable you are with JSON, it sounds to me as if you're really
>railing against those shoe-wearing bureaucratic civil engineers who
>wanted first-this-then-that. I agree that's a problem, although if
>several different organizations are involved it's often necessary
>politically if not technically, and the schema there can help (as
>David and others suggest) as a sort of documentation.
>
>
>Actually I would contend that with XML it's much easier to handle
>data with a loose or evolving schema than it is with JSON. For
>example, changing a name-and-address format to allow multiple middle
>names or multiple phone numbers with different roles is more likely
>to be do-able without breaking existing applications in XML than it is in JSON.
>
>This of course is assuming you are processing the XML with
>appropriate languages rather than with a language optimized to handle JSON...
>
>In fact, one could argue that many of the difficulties that arise
>when processing XML arise precisely because of this flexibility
>that's designed into the notation. If you want to make XML
>processing easier in conventional languages, for example with Java
>data binding, the first thing you do is to lock down the schema.
>
>Michael Kay
>Saxonica
========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144
Chair, ISO/IEC JTC1/SC32 and W3C XML Query WG Fax : +1.801.942.3345
Oracle Corporation Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive Alternate email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA Personal email: SheltieJim at xmission dot com
========================================================================
= Facts are facts. But any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================