xml-dev - Re: [xml-dev] The triples datamodel -- was Re: [xml-dev] SemanticWeb per


To: bry@itnisk.com
Subject: Re: [xml-dev] The triples datamodel -- was Re: [xml-dev] SemanticWeb permathread, iteration n+1
From: Henrik Martensson <henrik.martensson@bostream.nu>
Date: Fri, 11 Jun 2004 00:42:45 +0200
Cc: XML Developer List <xml-dev@lists.xml.org>
In-reply-to: <1086678931.40c567935a569@horde.scannet.dk>
References: <IKEOLCDFPBBPPAHGNKKOKEAAEMAA.howardk@fatdog.com> <p06010208bce75195c8e6@[192.168.254.88]> <40c9aa34.248723415@smtp.bjoern.hoehrmann.de> <p0601020dbce76af8bfc8@[192.168.254.88]> <40C243E6.9060602@alaric-snell.com> <1086511960.3929.23.camel@localhost.localdomain> <p06010206bce8a5118be2@[192.168.254.88]> <40C31EFC.2090203@koberg.com> <p06010202bce8dc0073d2@[192.168.254.88]> <40C33FD3.3050308@comcast.net> <40C35952.1080209@koberg.com> <40C36038.6020900@comcast.net> <1086642052.3611.17.camel@localhost.localdomain> <40C52D2F.7080005@comcast.net> <40C5411D.5050905@zenucom.com> <891ACD72-B90B-11D8-B347-000A95CCC59E@xegesis.org> <1086676833.3719.154.camel@localhost.localdomain> <1086678931.40c567935a569@horde.scannet.dk>

On Tue, 2004-06-08 at 09:15, bry@itnisk.com wrote:
> > 
> > Imagine a document formatting system that just ignores unknown tagging,
> > the way Elliott proposes. Now imagine that an author invents a new
> > admonition tag for a particular market. (The U.S. and Japan have special
> > requirements, so European manuals published in those countries would
> > need some way to distinguish admonitions that must be processed
> > differently than in other countries.) As a result, the market specific
> > part of the document will either be omitted entirely (worst case) 
> 
> I don't get this, this has always seemed to me to be a strength not a weakness.
> The market specific part of the document will either be omitted entirely in
> markets to which it is not relevant right? I think what you're arguing here is

Only if the software processing it knows what to omit, and when. If the
market specific markup is created by an author who has little or no
connection with the software developers, how can the software know how
to process the new markup correctly?

> about interop, that is to say I have extended a standard with market relevant
> information for my application, it works great in my application, if another

No. What Elliotte argues is that it should be generally true that if I
extend someone's standard, yours, for example, then it is up to your
software to figure out what I meant.

For example, if you, the markup and systems designer, have markup like
this:

<name>Henrik Mårtensson</name>

and I, a technical author, extend that:

<name nationality="Swedish">
  <firstname>Dag</firstname>
  <middlename>Henrik</middlename>
  <lastname>Mårtensson</lastname>
</name>

now it is up to your software to figure out that even though there are
now three names, it is only appropriate to use two of them in most
situations. It should also figure out that since I am Swedish, it is
likely that I do not want to be called by my first name, but by my
middle one. It must also be able to figure out that if I do this:

<name nationality="SE">
  <lastname>Mårtensson</lastname>
  <firstname>Dag</firstname>
  <middlename>Henrik</middlename>
</name>

it is still a Swedish name, and when the name is formatted, the order of
the names should be changed.
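
To make the problem concrete, here is a minimal sketch in Python of a
formatter written against the original markup. The function names and
the REORDERED constant are made up for illustration, and the two
strategies shown are only assumptions about what "ignoring unknown
markup" might mean in practice; they are not anyone's actual software.

```python
import xml.etree.ElementTree as ET

ORIGINAL = "<name>Henrik Mårtensson</name>"

EXTENDED = """<name nationality="Swedish">
  <firstname>Dag</firstname>
  <middlename>Henrik</middlename>
  <lastname>Mårtensson</lastname>
</name>"""

REORDERED = """<name nationality="SE">
  <lastname>Mårtensson</lastname>
  <firstname>Dag</firstname>
  <middlename>Henrik</middlename>
</name>"""


def format_name_ignoring_unknown(xml_text):
    """Use only the text directly inside <name>; skip unknown children."""
    elem = ET.fromstring(xml_text)
    return (elem.text or "").strip()


def format_name_flattening(xml_text):
    """Drop unknown tags but keep their text, in document order."""
    elem = ET.fromstring(xml_text)
    return " ".join("".join(elem.itertext()).split())


if __name__ == "__main__":
    for doc in (ORIGINAL, EXTENDED, REORDERED):
        print(repr(format_name_ignoring_unknown(doc)),
              repr(format_name_flattening(doc)))
```

With the extended markup, the first strategy loses the name entirely
and the second prints all three names in whatever order the author
typed them. Neither produces "Henrik Mårtensson" without code that
already knows about the new elements and the naming convention.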

What I am arguing is that:
* it is not likely that anyone can foresee all possible variations
  and build software flexible enough to handle them
* even in those cases where it is possible, it is often not
  cost-efficient

We are not just discussing automatically generated XML. We are also
discussing XML documents authored by humans. Part of the argument is
about how much unpredictability a human (or large groups of humans) will
introduce if allowed to arbitrarily add new markup.

> application that knows nothing about my applications extensions needs to deal
> with the market relevant information then there can be a problem but I'm not
> sure for whom the problem actually is, is it for the people using my

Part of the problem is that there will be hundreds, maybe thousands, of
people inventing new markup for every person who actually writes code to
process that markup. Most corporations have many more authors than XML
DMS developers.

Also, the information receivers tend to be those who have to bear the
costs of the markup extensions. (Extending markup can be used as a
powerful weapon, as proven by Microsoft and others.)

Elliotte argues that it is fairly easy to handle such situations at the
developer end. I argue that it is very difficult.

> applications data? If it is data brought into my application from another source
> and then extended I would say no, go get the damn data from the original source
> not second hand. If it is data generated by my application then the question is,
> if you're gonna rely on a particular application why do you not find out if it
> extends things in any way. 

The "application" is usually a human, in my part of the XML universe. If
allowed to, they will make their own extensions.

In one case I was involved in, an author wrote his own DTD with more
than 200 elements, and refused to use anything else. Of course, it
wasn't possible to support his private DTD in all processing
applications in the company, or even build a filter system just for him,
so everything he wrote was essentially useless. His position was that
everyone else (60,000+ people) ought to start using his DTD.

> 
> Anyway I could go on with all sorts of ways that I think the problems are
> unclear, the thing is that this way of handling document extension has proven to
> be particularly useful for xml applications. I see papers and tutorials
> published all the time about it, one particular popular motif being 'extend x
> with rdf'!!! The argument is being made that this method of extension is so
> efficient in regards to other methods that it should be defaulted to. Your
> argument against it doesn't seem clear enough to me to drop a powerful method. 

I am not arguing that anyone should. Different methods are suitable in
different circumstances.

RDF isn't necessarily much use when writing a user manual. This does not
imply that RDF is useless, or that it is useless when publishing the
manual, or even when searching for information to include in the manual.
Far from it.

> 
> >or
> > formatted the wrong way (best case), probably just like any other block
> > of text. Either way, if an accident happens, the company that publishes
> > the manual would be liable to pay damages.
> >
> see when you're talking about doing something like publishing a manual then I
> think you're arguing about the organisation that has the data has extended the
> data with market relevant information and then they're too incompetent to change
> their publishing methods. If you can't change the handling of the data in your
> own applications then don't extend the data, that's my theory. 

Not quite. First of all, the thing I am against is arbitrary extensions
of complex documents by individual authors.

When an organisation extends, or changes, an XML schema, it is a bit
different. (Well, not always...):
* An organisation has access to XML expertise. An author usually has
  very little or no training in XML development. (Most have little or
  no training in structured authoring, or any other kind of authoring.
  Writing well requires a lot of expertise, but companies skimp on
  this. In many cases it would be more efficient to train the
  writers than to build custom software, but no one wants to pay
  for training humans.)
* An organisation usually organises a project to determine what
  changes are necessary, and how to implement them as effectively
  as possible, with minimal disturbance to all processing systems.
  It also tries to identify those systems that will have to change,
  updates the systems, tests the new schemas, and times everything
  so that a changeover has minimal impact.
  An individual author usually just implements something, without
  knowing or caring about the overall impact on the systems
  involved.

Then of course, a key idea with XML is that if you mark up the
information well, with descriptive tagging, then you can change the way
the data is processed without changing the tagging. In most cases this
is true.
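
As a rough illustration of that idea, the Python sketch below keeps the
markup fixed and varies only the processing. The <warning> element, the
market codes and the prefixes are all hypothetical; real market
requirements are far more involved than a text prefix.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: says what the text is, not how it should look.
DOC = "<warning>Disconnect power before opening the cover.</warning>"

# Hypothetical per-market presentation rules, owned by the processing
# system rather than by the author.
MARKET_PREFIX = {
    "EU": "Warning: ",
    "US": "WARNING! ",
}


def render_warning(xml_text, market):
    """Render the same <warning> element according to the target market."""
    elem = ET.fromstring(xml_text)
    return MARKET_PREFIX[market] + (elem.text or "").strip()


if __name__ == "__main__":
    for market in ("EU", "US"):
        print(render_warning(DOC, market))
```

The document is untouched when a market changes its rules; only the
table in the processing system is updated, by people who know what
else depends on it.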

My experience is that well over two thirds of all markup change requests
in corporate projects are unnecessary. The desired functionality can be
implemented much more cheaply and efficiently without changing the
markup. In many cases, the functionality already exists, it is just that
the users get no training, so they do not know about it.

> 
>  
> > With XML, all hell still breaks loose when the format is changed. XML is
> > no different from other formats in this respect.
> > 
> well sometimes when I see all hell break loose when the format is changed I find
> that is because the system was built without a clear idea of how format changes
> should be approached in a particular xml dialect, sometimes this is because the
> dialect itself has not clear idea how format changes should be approached. Most
> often it is because the developers have no idea that there is such a thing as
> different models for how to handle changes, one of which, the most common is the
> one under discussion, that model presupposes that unknown markup is ignored but
> subtrees of known markup are not ignored, that means that any extension to the
> system has to take that model into consideration. I consider that as something
> that should be blamed on the developer, I can see how we might argue that this
> is something that developers should not be blamed for because it is just too
> much to expect them to be able to take into consideration in their hectic
> work-schedule but given that it is something that I have learned to take into
> consideration and pay attention to when I build applications I don't feel like
> giving anyone else a pass on it (especially as I consider it to be something
> that makes application building easier with xml data). 

When dealing with technical documents, there are basically two kinds of
new markup that may appear in a document:

* Useless crud
* Markup that is present for a good reason

When there is a reason for having the new markup, it is always the same:
the new markup is there to enable the processing systems to change their
behavior in some way they could not otherwise do. (Or in some manner
that would be very difficult without the markup.) This means that
important markup must not be ignored by processing systems. It is meant
to have an impact on their behavior. Consequently, a processing system
that ignores the markup (and possibly the content), will do something
bad. (Or fail to do something good.) Bad ranges from trivial
inconveniences to fortunes lost, to people killed, depending on the
circumstances. Great monetary loss or people killed have been risks in
every XML project I have ever worked on, so from my perspective, this is
normal. (Well, no, not my own hobby projects, though one or two nearly
killed me...)

Useless crud could be safely ignored, of course, but everyone that gets
their chance to leave a mark in the world through their very own XML tag
sincerely believes that it is the one that will be the difference
between making it or breaking it for their company, so how will a piece
of software be able to differentiate? It can't. That is why it must
treat so many anomalies as something that requires human attention.
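
A minimal sketch of that policy in Python, assuming a made-up
vocabulary (manual, para, warning) and made-up function names: the
processor does not try to guess what an unknown element means, it
stops and reports it so that a human can decide.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of elements the processing system knows how to handle.
KNOWN_ELEMENTS = {"manual", "para", "warning"}


def find_unknown_elements(xml_text, known=KNOWN_ELEMENTS):
    """Return the sorted names of all elements not in the known vocabulary."""
    root = ET.fromstring(xml_text)
    return sorted({elem.tag for elem in root.iter() if elem.tag not in known})


def process(xml_text):
    """Format the document, or refuse if it contains unknown markup."""
    unknown = find_unknown_elements(xml_text)
    if unknown:
        raise ValueError(
            "needs human review, unknown elements: " + ", ".join(unknown)
        )
    return "formatted"  # stand-in for the real formatting step


if __name__ == "__main__":
    doc = ("<manual><para>Text</para>"
           "<us-warning>Hot surface</us-warning></manual>")
    try:
        process(doc)
    except ValueError as err:
        print(err)
```

The software cannot tell whether us-warning is crud or a safety
requirement, so the only safe default in this sketch is to stop.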

As for blaming downstream developers for mistakes made by content
authors upstream, I disagree. You are certainly right when you write
that developers should make their software as flexible and robust as
possible, but then again, many developers lack the training to do that. 

In my experience, only about one developer in five knows about more than
the basics of object oriented programming, design patterns, TDD,
refactoring, versioning systems, or any other techniques and tools that
you and I may take for granted.

You can't blame individual developers for this. They are adapted to the
requirements of large corporations and consultancy firms, i.e. they are
cheap. If they were more skilled, they would be more expensive, and
would get fired. (They would also be effective enough to more than
compensate for their higher wages, but I have never seen a customer
factor that in for a "generic" programmer, only for a few experts.)

To make matters worse, having one good programmer on a team isn't
enough. Competent project managers are even rarer. To produce good
quality software, everyone must be well above average, and the team
manager must be competent. This combination does not occur often. (In
that particular market segment, that is. There are of course companies
that are chock full of competent developers. Somehow, they seem to be
very rare in the XML documentation business. Salespeople with good golf
handicaps are common though.)

Please note that I am not trying to prescribe to anyone else how to deal
with their problems. I believe that different problems require different
solutions. I don't believe that Elliotte's techniques are bad. On the
contrary, I believe he is very good at what he does, and uses techniques
appropriate to the task. What I do not buy is that those techniques and
strategies would necessarily be appropriate to the very different set of
problems that I am dealing with. I do not put an overabundance of faith
in golden hammers (including XML itself).

/Henrik
