ID/IDREF is evil

From: "Costello, Roger L." <costello@mitre.org>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Mon, 3 Feb 2014 22:05:47 +0000

Hi Folks,

In this message I will attempt to persuade you of three things:

1. Do not use the ID/IDREF capability.

2. Use a layering approach:

(a) Layer 1: express your XML as a context-free grammar.

(b) Layer 2: express context-sensitive rules using Schematron.

3. The ID/IDREF capability is a context-sensitive rule.

Now for my argument:

First, let me persuade you that by using ID/IDREF you have introduced context-sensitive rules into your XML. Consider this XML, which does not use ID/IDREF:

<Book>
<Title>Principles of Programming</Title>
<Author>M. A. Jackson</Author>
</Book>

To expose the rule-like structure of that XML, let's express it as grammar rules:

Book --> Title Author
Title --> string
Author --> string

That's a context-free grammar.
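To make the context-free property concrete, here is a minimal Python sketch (not part of the original argument) that checks an element against the three productions above. The names GRAMMAR and conforms are illustrative assumptions, and "string" is modelled simply as "no child elements". Note that each check looks only at an element and its own children, never at anything elsewhere in the document.

```python
import xml.etree.ElementTree as ET

# Productions from the message: each element name maps to the ordered
# list of child element names it must contain. An empty list stands in
# for "string" (text content itself is not inspected in this sketch).
GRAMMAR = {
    "Book": ["Title", "Author"],
    "Title": [],
    "Author": [],
}

def conforms(element):
    """Return True if element and all its descendants match GRAMMAR."""
    expected = GRAMMAR.get(element.tag)
    if expected is None:
        # No production exists for this element name.
        return False
    children = list(element)
    if [child.tag for child in children] != expected:
        return False
    # Each child is checked on its own, with no knowledge of its surroundings.
    return all(conforms(child) for child in children)

book = ET.fromstring(
    "<Book><Title>Principles of Programming</Title>"
    "<Author>M. A. Jackson</Author></Book>"
)
print(conforms(book))
```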

Now let's add an ID/IDREF:

<Book seller="Amazon">
<Title>Principles of Programming</Title>
<Author>M. A. Jackson</Author>
</Book>

Assume that @seller is of type IDREF. The element carrying the matching ID attribute is not shown.

Let's express that XML using grammar rules. The rule for the Book element depends on the existence of a corresponding ID attribute; if there is none, the Book rule is invalid. So we may express Book's rule like so:

Book Amazon --> Title Author

Read that as:

In the context of an Amazon symbol
the Book element may be replaced
by Title and Author.

In other words, our grammar tells us that this is a valid string

Principles of Programming M. A. Jackson

only if the symbol "Amazon" exists.

See the context-sensitivity? Book is context-sensitive due to the ID/IDREF.

Any time you use ID/IDREF in your XML document you have introduced a context-sensitive rule into your XML document.

"So what?" you ask.

Well, here's so what:

All known parsing algorithms for context-sensitive
grammars are either very inefficient or very complex.

Reasoning about context-sensitive grammars is difficult.

Proofs about context-sensitive grammars are difficult.

Take a cue from compiler developers: they move
context-sensitive processing into a separate pass.

So don't use ID/IDREF.

Of course, that doesn't mean you will never have data that has intra-data dependencies. What it means is that you should modularize your grammar rules: express your context-free rules in your XML document and express your context-sensitive rules (intra-data dependencies) in Schematron. That's a nice, clean separation-of-concerns. That's a modular data design.
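The layering can be sketched in Python as two passes. This is only an illustration of the separation, not Schematron itself: the second pass does the kind of whole-document check that a Schematron assertion would express. The Catalog and Seller elements and the attribute name "id" are hypothetical, since the message does not show where the ID attribute lives, and the attribute roles are fixed by name here because ElementTree does not read DTD attribute types.

```python
import xml.etree.ElementTree as ET

ID_ATTR = "id"          # assumption: the attribute playing the ID role
IDREF_ATTR = "seller"   # from the message: @seller plays the IDREF role

def dangling_references(root):
    """Pass 2: return every IDREF value that matches no ID in the document."""
    # Collecting the IDs requires visiting the whole document first;
    # this is the context that a context-free rule cannot see.
    known_ids = {
        el.get(ID_ATTR) for el in root.iter() if el.get(ID_ATTR) is not None
    }
    return [
        el.get(IDREF_ATTR)
        for el in root.iter()
        if el.get(IDREF_ATTR) is not None
        and el.get(IDREF_ATTR) not in known_ids
    ]

# Pass 1 is the parse itself: ET.fromstring only checks the structure.
good = ET.fromstring(
    '<Catalog><Seller id="Amazon"/>'
    '<Book seller="Amazon"><Title>Principles of Programming</Title>'
    '<Author>M. A. Jackson</Author></Book></Catalog>'
)
bad = ET.fromstring(
    '<Catalog><Book seller="Amazon"><Title>Principles of Programming</Title>'
    '<Author>M. A. Jackson</Author></Book></Catalog>'
)

print(dangling_references(good))
print(dangling_references(bad))
```

Both documents get through the first pass. Only the second pass, which sees all the IDs at once, can tell them apart.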

Let's recap:

1. ID/IDREF introduces context-sensitive rules into your XML grammar wherever there is an ID attribute and wherever there is an IDREF attribute.

2. Don't use ID/IDREF.

3. Modularize your rules: express context-free rules in XML and express context-sensitive rules in Schematron.

Comments?

/Roger

Follow-Ups:
- Re: [xml-dev] ID/IDREF is evil
  - From: Peter Flynn <peter@silmaril.ie>
- Re: [xml-dev] ID/IDREF is evil
  - From: Piotr Bański <bansp@o2.pl>
- RE: ID/IDREF is evil
  - From: "Cox, Bruce" <Bruce.Cox@USPTO.GOV>
- Re: [xml-dev] ID/IDREF is evil
  - From: Steve Newcomb <srn@coolheads.com>
- Re: [xml-dev] ID/IDREF is evil
  - From: Michael Kay <mike@saxonica.com>
- Re: [xml-dev] ID/IDREF is evil
  - From: Michael Sokolov <msokolov@safaribooksonline.com>
