OASIS Mailing List Archives

   Re: [xml-dev] XML-appropriate editing data structures


On Friday 09 April 2004 17:22, Elliotte Rusty Harold wrote:
> At 4:36 PM +0200 4/9/04, Henrik Martensson wrote:
> >Allowing authors to add tags not supported by the DTD (and therefore not
> >by the processing systems that are designed to support what the DTD
> >supports, nothing less, nothing more) renders the use of XML
> >meaningless.
> A common fallacy, and one I'm not surprised to hear from you given
> your avowed SGML background.

Fallacy? This depends on what the system's supposed to do, and what the 
writer's allowed to do in that context. One of the main points of having a 
DTD in the first place is to provide a set of structural rules to be 
followed. Believe it or not, there is actually a point in restricting what 
the author's allowed to do.

Disagree? Fine, give 'em Word instead.

> The fundamental difference between XML
> and SGML, and the one that means XML is not just a subset of SGML,
> is not the restricted syntax: it is that XML separates the notion of
> well-formedness from validity. In XML, well-formedness is required.
> Validity isn't, and normally shouldn't be.

This depends on what you do with XML. I always considered the decision to 
introduce the well-formed, schema-less document in XML to be dangerous, as 
way too many people interpreted this as "validity is rarely necessary".

Sometimes it is. Sometimes it isn't. When you're dealing with XML in authoring 
systems, it usually is, whether the system is based on SGML or XML.
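
To make the distinction concrete, here is a minimal sketch in Python using only 
the standard xml.etree.ElementTree module. The ALLOWED set and the sample 
documents are hypothetical, and the name check is only a stand-in for real DTD 
validation, which also constrains content models and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary standing in for what a DTD would declare.
ALLOWED = {"procedure", "title", "step"}


def is_well_formed(text):
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def uses_only_allowed(text, allowed=ALLOWED):
    """Return True if every element name is in the allowed vocabulary.

    This checks element names only; it is not full DTD validation.
    """
    root = ET.fromstring(text)
    return all(el.tag in allowed for el in root.iter())


# Well-formed, but contains an element the vocabulary does not define.
doc = "<procedure><title>Oil change</title><mynote>Hot!</mynote></procedure>"
# Not well-formed: <title> is never closed.
broken = "<procedure><title>Oil change</procedure>"

print(is_well_formed(doc), uses_only_allowed(doc))
print(is_well_formed(broken))
```

A generic XML parser accepts the first document without complaint; only the 
second check notices the invented element.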

> A processing system that 
> can only handle what the DTD mandates is flawed.

If there's a DOCTYPE, and that DOCTYPE points to a DTD, and that DTD isn't 
followed, and the system disallows this, the processing system most certainly 
isn't flawed. It's doing its job.

> Instead a processing 
> system needs to ask if the document provides what it needs to do its
> job, not if it adheres to some arbitrary schema. And if the document
> doesn't provide what it needs, then the system should notice and
> handle the case gracefully, perhaps alerting a human, perhaps not,
> depending on circumstances.

Personally, I'm through with trying to handle tag abuse gracefully. Validation 
errors of the kind where a resourceful author attempts to introduce new and 
exciting markup in a service instruction that is essential to the correct 
servicing of a product that some poor customer paid tens of thousands of 
dollars for should result in 220 Volts through the keyboard.

Of course, in a way, that _is_ alerting a human...

> The DocBook XSL stylesheets, for example, still work quite well with
> invalid markup. Indeed, XPath and XSLT 1.0 based systems generally do
> work quite well with documents regardless of validity because those
> languages were designed without the assumption of validity.

But this is very different from having the authoring system accept markup not 
allowed by the DTD. Many XSL stylesheets handle invalid markup well enough; 
it's partly in the nature of XSL. But that's not the real point here.

> Sadly, 
> the same cannot be said for many other systems that were needlessly
> tightly coupled to particular formats. But here it is the systems
> that are broken. The X stands for extensible. When objects are
> encountered that are not represented in the current DTD/schema it is
> the right and duty of the author to invent new markup to describe
> what is actually there.

I take it that you're not in the technical writing business?!

This is a very dangerous assumption to make. If the writers are free to invent 
new markup whenever they feel that the current (allowed) markup doesn't cover 
what they are describing, you're soon facing a situation where different 
markup is used to describe semantically identical situations. This costs 
money. Lots of money. Yes, it's possible to create a system that will 
"gracefully" alert something or someone of the new markup, and it's possible 
to even treat it reasonably well (in other words, publish the thing without 
breaking anything too badly) but it will cost money.
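
As a rough sketch of what such "graceful" alerting could look like, the 
following Python snippet (standard library only) reports every element that 
falls outside a hypothetical allowed vocabulary. The element names and the two 
sample documents are invented for illustration; they show two writers marking 
up the same hazard with different, equally unsupported elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical vocabulary standing in for the elements the DTD declares.
ALLOWED = {"procedure", "title", "step", "warning"}


def unknown_elements(text, allowed=ALLOWED):
    """Count occurrences of each element name not in the allowed vocabulary."""
    root = ET.fromstring(text)
    return Counter(el.tag for el in root.iter() if el.tag not in allowed)


# Two writers describe the same hazard with two different invented elements.
doc_a = "<procedure><step>Drain oil</step><danger>Hot surface</danger></procedure>"
doc_b = "<procedure><step>Drain oil</step><caution>Hot surface</caution></procedure>"

for name, doc in (("a", doc_a), ("b", doc_b)):
    for tag, count in unknown_elements(doc).items():
        print(f"document {name}: unknown element <{tag}> used {count} time(s)")
```

Detecting the new elements is the cheap part. Someone still has to decide 
that both mean the same thing as the existing warning element, and that is 
where the cost lies.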

If the writer feels that the available markup is insufficient, he/she should 
alert the editor ASAP. In all likelihood, it's a question of 
misinterpretation of the DTD, in which case the editor should be able to point 
the writer in the right direction. In the worst case, there might be a need 
for an updated DTD and, sometimes, updated processing software, but this is 
relatively rare.

> Systems that cannot deal sensibly with the 
> extensibility of XML are broken.

Allowing arbitrary new elements is not extensibility. (And it's not very 
sensible either, IMHO.)





Copyright 2001 XML.org. This site is hosted by OASIS