How to extend XML’s syntax

Hi Folks,

Does the XML language provide ways to extend its syntax?

Allow me to explain. Consider Common Lisp. It provides a mechanism (macros) to extend the language's syntax. For example, the language does not provide a while loop, but you, the programmer, can extend the language to include one:

(while (test) body)

The parser/compiler understands this new syntax, without any changes to the parser/compiler.

In this example the problem is "How can I include a while loop in the Common Lisp language?" and the solution is to extend the language with a new syntactic construct.

Contrast that solution with one that simulates a while loop by using an existing loop syntax and then documenting that it has "while loop semantics." That approach solves the problem through semantics rather than syntax.

By enabling the language's syntax to be extended, Common Lisp enables problems to be solved via syntax rather than semantics. And that is a very good thing:

    One of the great themes of computer science over the last
    sixty years has been the long-running campaign to move more
    and more things out of the "must be checked by eyeball" /
    semantics area, and into the "can readily be checked by
    machine" / syntax area. [Michael Sperberg-McQueen]

That got me to wondering: Does the XML language provide mechanisms to extend its syntax?

Before discussing that, we must first agree on what "XML syntax" is. XML syntax is the stuff that XML processors understand: a left angle bracket symbol ( < ) denotes the start of a tag, </ denotes the start of an end tag, ='...' denotes an attribute value, <?xml denotes the start of the XML declaration, and so forth. These symbols denote the same thing to every XML processor. They are used by the processor to break up (tokenize) XML documents. They are the XML syntax.

Are there mechanisms for extending the XML syntax?

I can think of only one: user-defined XML entities. E.g.,

<!DOCTYPE Document [
<!ENTITY hello "Hello, world">
]>
<Document>He said, &hello;</Document>

That entity declaration creates a new symbol (hello) that every XML parser recognizes as denoting the string "Hello, world".
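To check that claim against one real parser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (which is built on expat). The variable names are mine and purely illustrative. It parses the document above and then tries the same reference without the declaration, on the assumption that an undeclared entity is rejected as an error.

```python
import xml.etree.ElementTree as ET

# The document from the example above, with the entity declared
# in the internal DTD subset.
DOC_WITH_ENTITY = """<!DOCTYPE Document [
<!ENTITY hello "Hello, world">
]>
<Document>He said, &hello;</Document>"""

# The same reference with no declaration at all.
DOC_WITHOUT_ENTITY = "<Document>He said, &hello;</Document>"

# The parser expands &hello; while parsing, so the application only
# ever sees the replacement text.
root = ET.fromstring(DOC_WITH_ENTITY)
expanded = root.text
print(expanded)

# Without the declaration, &hello; is not part of the syntax and the
# parser refuses the document.
try:
    ET.fromstring(DOC_WITHOUT_ENTITY)
    undeclared_rejected = False
except ET.ParseError:
    undeclared_rejected = True
print(undeclared_rejected)
```

The point of the second half is that the new symbol exists only because the document declared it; the parser itself was not changed, which is the same property as the Lisp while example.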

Are there other ways to extend the XML syntax?

If you were King of the World, what mechanisms would you incorporate into a data format to enable its syntax to be extended?

Can you provide a real-world use case that shows how an extended syntax would enable a problem to be solved in a more natural and elegant manner?

/Roger