OASIS Mailing List Archives

   Re: [xml-dev] The triples datamodel -- was Re: [xml-dev] SemanticWeb per


On Fri, 2004-06-11 at 13:15, Elliotte Rusty Harold wrote:
> At 12:42 AM +0200 6/11/04, Henrik Martensson wrote:
> >
> >What I am arguing is that:
> >* it is not likely that anyone can foresee all possible variations
> >   and build software flexible enough to handle them
> >* even in those cases where it is possible, it is often not
> >   cost efficient
> I agree with your first point. It isn't possible or feasible to 
> foresee all possible variations. However I'm not suggesting we do 
> that. I'm suggesting that you deal with new variations as they arise 
> rather than trying to anticipate them.  You don't need to process 
> everything people might send you, just what they actually do send 
> you. The first few weeks with such a system do involve a lot of time 
> writing code to process one new format after another, but matters do 
> stabilize to a manageable level fairly quickly.

No, they do not. I (usually) do not work with systems where content
generation is automated, as you do. In the last XML project I worked
on, the customer decided to do the DTD design themselves. They kept up
a steady stream of more or less random changes for over 18 months. There
were more than 80 versions of one of the DTDs involved. At that point I
quit the project and left the company I worked for. My friends who
haven't (yet) left tell me they are still keeping it up.

The refusal of both the customer and my former employer to institute
change-control measures has cost both companies millions. I quit, and
the project manager developed heart problems (from being forced to work
overtime while seriously ill). The consequences of allowing more or less
random markup changes will affect every maintenance project for the next
fifteen years, maybe more.

I may be mistaken, but I believe you assume that every content author
will be fairly stable (markup-wise, that is), so that changes occur
mainly when a new author enters the arena. This is not the case. The
authors vary a great deal among themselves, of course, but they tend to
have certain traits in common:

* They have little or no training for their job
* They are of above-average intelligence
* They are bored

As a result, they come up with wildly inventive solutions to problems,
both real and imagined. To keep from being bored, they experiment, and
they just keep on doing it. There is no such thing as stabilization as
time goes on.

I once made a list of markup changes (in a non-XML system) that about
half a dozen authors at one site had taken it upon themselves to make.
It was 140 pages long. There was not one single change in there that
actually solved a problem. Every single change was either a rename of
existing markup, or new markup for something that wasn't supposed to be
marked up in the first place.

> When a new format is discovered after the initial burn-in period, it 
> normally indicates a significant new addition or deletion of 
> information, not just an arbitrary random change; and it's probably 
> something that you want to think about. By preventing communicators 

See above.

> from sending you new markup you are preventing them from adapting to 
> significant changes in the domain. You are limiting what they are 
> allowed to tell you, and thereby limiting what you can know.

As I have written before, I consider good communications between content
authors and developers to be an absolute necessity. However, there are
better ways to communicate a need than to make a change in the markup
and wait for something to explode.

For one thing, it may be several years from the change to the explosion.
It is not uncommon for data to be stored for decades, until it is
discovered that something is seriously wrong with it. And it's not just
information that is stored. One company I worked with published product
catalogs for a foreign market _in_the_wrong_language_ for fifteen years
before anyone discovered it.

Another thing is that when the explosion happens, well, it may be pretty
serious. Some damage can be repaired, some can't be.

Have you noticed that we are going round in circles? You have made
claims about me not listening to authors before, practising bad
programming habits, etc. I have refuted them, and you just keep making
the same claims all over again, without one shred of evidence to back
you up.

You keep ignoring what I write, and attribute opinions to me that I do
not have. Actually, you attribute opinions and practices to me that I am
very outspoken against. You have also claimed that the software I write
is inflexible and inherently brittle. Frankly, you don't have the
slightest idea how I write software, under what conditions it works,
or what makes it break. This is not a good way to make an argument.

Over the course of this thread I have provided several examples of what
can and will happen with the kind of projects I work with when markup
creativity runs amok. You have several opinions that I do not agree
with, like the idea that large groups of human authors would not
instigate a high rate of change in the markup over an extended period of
time. I have provided several real world examples of authors doing just
that, but you haven't come up with a single example supporting your
position.

Earlier, you claimed that unknown information could usually be safely
ignored. Again, I provided a counter example (admonitions).
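To make the admonition counter-example concrete, here is a minimal sketch (the element names, sample document, and extractor are invented for illustration, not taken from any project in this thread). A receiver that treats unknown markup as "safe to ignore" silently drops a safety caution the author has added:

```python
import xml.etree.ElementTree as ET

# Hypothetical procedure document: the author has introduced a <caution>
# element that the receiving software has never seen before.
DOC = """
<procedure>
  <step>Open the valve.</step>
  <caution>Vent the line first, or the seal will blow.</caution>
  <step>Read the gauge.</step>
</procedure>
"""

KNOWN = {"step"}

def extract(xml_text):
    root = ET.fromstring(xml_text)
    # Unknown elements are skipped -- exactly the "safely ignored" policy.
    return [e.text for e in root if e.tag in KNOWN]

print(extract(DOC))  # the caution has silently disappeared
```

The output contains only the two steps; nothing signals that a safety admonition was lost, which is why ignoring unknown information is not generally safe.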

You also wrote that an agreement between sender and receiver was not
necessary, because software can usually infer what is actually meant.
Again, I have provided examples, showing that it may not be all that
easy, and that contracts must, and do, exist, even though they are
sometimes implicit. I think I also mentioned some of the problems I have
encountered with markup of procedures, and how messing up the tagging
affects downstream processing and reusability in DMS systems.

All my examples have one thing in common: you completely ignored them. I
can't help wondering if it is because you find real-world examples
detrimental to your arguments. I suppose you might ignore them because
you do not believe them. Either way, there is not much point in
continuing the discussion.

I don't doubt that the techniques you use work very well under the
conditions you work with. Also, I believe that you are very good at
using them. On the rare occasions when I work with similar problems
under similar conditions, I do things in ways that are probably not too
different from the way you work. For example, I often work with
well-formed XML when chaining a series of transformations. I take care to
write software that is loosely coupled. (I am a strong believer in the
use of design patterns, and the Law of Demeter.) I design DTDs to be as
flexible as possible, never constraining them more than I have to. When
I extract information from a document, I don't build in unnecessary
dependencies on structure. When the same kind of processing applies to
many elements, I prefer distinguishing the elements by class
(properties, context, whatever they have in common) instead of
hardcoding element names. I write automated unit tests, and acceptance
tests when circumstances allow it, etc. Nothing strange about that,
except one thing: you insist that I don't, despite us never having
worked together, and you never having seen any of the code I've written.
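As a rough illustration of distinguishing elements by class rather than by hardcoded element names (the markup, the "class" attribute, and the rendering rule here are all invented for the sketch): processing keys off a shared attribute, so a new element that carries the same class is handled without any code change.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two differently named admonition elements share
# a common class attribute.
SAMPLE = """
<doc>
  <warning class="admonition">Mind the gap.</warning>
  <caution class="admonition">Hot surface.</caution>
  <para>Ordinary text.</para>
</doc>
"""

def render(elem):
    # Dispatch on the class attribute, not on the tag name, so
    # <warning>, <caution>, and any future admonition element all
    # get the same treatment.
    if elem.get("class") == "admonition":
        return f"*** {elem.text.strip()} ***"
    return elem.text.strip() if elem.text else ""

root = ET.fromstring(SAMPLE)
for child in root:
    print(render(child))
```

If an author later invents, say, a `<danger class="admonition">` element, this code still renders it correctly, whereas a dispatch table keyed on element names would drop or mishandle it.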

I do not think we are getting much further. As I've written before, all
I wanted to do was to refute the idea that the tools and techniques you
use are applicable in general to XML processing. I believe I have done
that through arguments and examples. You do not buy my opinions at all.
I believe that yours are valid for the things you work with, but not
generally applicable.

Let's move on to other topics.




Copyright 2001 XML.org. This site is hosted by OASIS