Re: [xml-dev] ID/IDREF is evil

From: Peter Flynn <peter@silmaril.ie>
To: xml-dev@lists.xml.org
Date: Sun, 16 Feb 2014 20:22:17 +0000

On 02/03/2014 10:05 PM, Costello, Roger L. wrote:
> Hi Folks,
> 
> In this message I will attempt to persuade you:
> 
> 1. Do not use the ID/IDREF capability.

I have yet to see a valid argument as to why I should not use it.

> 2. Use a layering approach: 
> 
> 	(a) Layer 1: express your XML as a context-free grammar.

I need persuading that this is something that should take precedence
over the document representing accurately the author's or editor's
intentions.

> 	(b) Layer 2: express context-sensitive rules using Schematron.
>
> 3. The ID/IDREF capability is a context-sensitive rule.
>
> Now for my argument:
> 
> First, let me persuade you that by using ID/IDREF you have introduced context-sensitive rules into your XML. Consider this XML, which does not use ID/IDREF:
> 
> <Book>
>       <Title>Principles of Programming</Title>
>       <Author>M. A. Jackson</Author>
> </Book>

So far, so good.

> To show XML's rule nature, let's express it like so:
> 
> Book 	--> Title Author
> Title 	--> string
> Author 	--> string

Or, better, avoiding reinvention of the wheel:

<!ELEMENT Book (Title,Author)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>

(Hold your nose, Mike :-)

> That's a context-free grammar. 

Sort of.

> Now let's add an ID/IDREF:

But you don't: you only add an IDREF.

> <Book seller="Amazon">
>       <Title>Principles of Programming</Title>
>       <Author>M. A. Jackson</Author>
> </Book>
> 
> Assume that @seller is of type IDREF. 
>
> I don't show the corresponding ID attribute.

You probably should, if you want to maintain your argument.
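
For illustration, here is a minimal sketch of what the example might look
like with the corresponding ID shown. The Catalogue wrapper and the Seller
element are assumptions invented for this sketch (the original message
shows neither), as is the helper name declared_attribute_types. The sketch
uses Python's standard expat binding only to read back the attribute types
declared in the internal subset; it does not validate anything.

```python
import xml.parsers.expat

# Hypothetical document: Catalogue and Seller are assumptions added so that
# the IDREF on Book has an ID to point at. They are not in the original post.
DOC = """<!DOCTYPE Catalogue [
<!ELEMENT Catalogue (Seller+,Book+)>
<!ELEMENT Seller (#PCDATA)>
<!ATTLIST Seller id ID #REQUIRED>
<!ELEMENT Book (Title,Author)>
<!ATTLIST Book seller IDREF #IMPLIED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
]>
<Catalogue>
  <Seller id="Amazon">Amazon</Seller>
  <Book seller="Amazon">
    <Title>Principles of Programming</Title>
    <Author>M. A. Jackson</Author>
  </Book>
</Catalogue>
"""


def declared_attribute_types(xml_text):
    """Return {(element, attribute): declared type} from the internal subset."""
    types = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, required):
        # Called once per attribute declared in an <!ATTLIST ...> declaration.
        types[(elname, attname)] = type_

    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return types


if __name__ == "__main__":
    for (element, attribute), type_ in declared_attribute_types(DOC).items():
        print(f"{element}/@{attribute} is declared as {type_}")
```

With both halves of the pair declared, it is at least clear which
attribute is the ID and which is the IDREF that must resolve to it.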

> Let's express that XML using grammar rules. The rule for the Book
> element depends on the existence of a corresponding ID attribute; if
> there is none, the Book rule is invalid. So we may express Book's
> rule like so:
> 
> Book Amazon --> Title Author

This is wandering off-target. No-one uses an IDREF attribute on the root
element of what is essentially a standalone document in that way (there
are plenty of other reasons, see the TEI Guidelines).

What you seem to be expressing here is an effectivity. There are other
ways of achieving this.

> Read that as:
> 	
> 	In the context of an Amazon symbol 
> 	the Book element may be replaced 
> 	by Title and Author.

I'm afraid I have lost you here: that seems to be a non sequitur.

> In other words, our grammar tells us that this a valid string 
> 	Principles of Programming M. A. Jackson
> only if the symbol "Amazon" exists.
> See the context-sensitivity? Book is context-sensitive due to the ID/IDREF.

I think I can see what you are driving at, but it looks artificial.

> Any time you use ID/IDREF in your XML document you have introduced a
> context-sensitive rule into your XML document.

Any time I add *any* metadata to a document I introduce
context-sensitivity. The art is to do so without breaking reusability of
the document.

> "So what?" you ask.
> 
> Well, here's so what:
> 	
> 	All known parsing algorithms for context-sensitive
> 	grammars are either very inefficient or very complex.
> 
> 	Reasoning about context-sensitive grammars is difficult.
> 
> 	Proofs about context-sensitive grammars is difficult.
> 
> 	Take cue from compiler developers: they separate
> 	context-sensitive processing into a separate pass.
> 
> So don't use ID/IDREF. 

I think we may have different understandings of "context-sensitive".

> Of course, that doesn't mean you will never have data that has
> intra-data dependencies. What it means is that you should modularize
> your grammar rules: express your context-free rules in your XML
> document and express your context-sensitive rules (intra-data
> dependencies) in Schematron.

This sounds like an advertisement (which is no bad thing: Schematron is
excellent for doing this). But I fail to see why you believe ID/IDREF to
be evil. If I have business rules which require one or more objects in a
document to hold a verifiable relationship with some other (single)
object in that document, an ID/IDREF gets me the ride for free.
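
To make concrete what the "free ride" amounts to, here is a rough sketch
of the referential check a validating parser performs on IDREF values,
written by hand as a second pass over an already-parsed tree. The
attribute names (id, seller), the Catalogue and Seller elements, and the
function name dangling_idrefs are all assumptions for illustration. The
sketch only checks that each reference resolves; it does not check that ID
values are unique, which a real validator would also do.

```python
import xml.etree.ElementTree as ET


def dangling_idrefs(xml_text, id_attr="id", idref_attr="seller"):
    """Return the IDREF values that do not match any ID in the document.

    The attribute names are hypothetical; a validating parser would take
    them from the DTD instead of from parameters.
    """
    root = ET.fromstring(xml_text)
    # First pass: collect every declared identifier.
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    # Second pass: report references that point at nothing.
    return [
        el.get(idref_attr)
        for el in root.iter()
        if el.get(idref_attr) is not None and el.get(idref_attr) not in ids
    ]


# Hypothetical documents: one where the reference resolves, one where it does not.
GOOD = """<Catalogue>
  <Seller id="Amazon"/>
  <Book seller="Amazon">
    <Title>Principles of Programming</Title>
    <Author>M. A. Jackson</Author>
  </Book>
</Catalogue>"""

BAD = GOOD.replace('<Seller id="Amazon"/>', '<Seller id="Other"/>')

if __name__ == "__main__":
    print(dangling_idrefs(GOOD))
    print(dangling_idrefs(BAD))
```

With a DTD and a validating parser, none of this code needs to be written
or maintained, which is the point being made above.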

There are plenty of circumstances where ID/IDREF is inadvisable, but
your example doesn't seem to be one of them, unless I have misunderstood
something.

Mike's argument that it requires a DTD is begging the question. There's
nothing inherently wrong or evil about DTDs, which is more than can be
said for other expressions of document structure :-)

On 02/04/2014 01:39 AM, Arjun Ray wrote:
> Declaring ID attributes in an internal subset was quite obviously a 
> non-starter. We should have bit the bullet then and invented some
> new syntax. (I actually thought of several back then, but held my
> peace.)

Sadly, I think we all thought of several, even before the emergence of
the W3C Schema, one of which would have been an extended DTD format.
Fortunately we now have RNG, which provides what DTD 2.0 could have been.

///Peter
Follow-Ups:
- Re: [xml-dev] ID/IDREF is evil
  - From: Michael Kay <mike@saxonica.com>
References:
- ID/IDREF is evil
  - From: "Costello, Roger L." <costello@mitre.org>