Re: Schemas: Different Strokes From Different Folks
• From: ht@markup.co.uk (Henry S. Thompson)
• To: John Cowan <johnwcowan@gmail.com>
• Date: Thu, 19 Jan 2017 17:28:25 +0000

```
John Cowan <johnwcowan@gmail.com> writes:

> On Wed, Jan 18, 2017 at 4:17 PM, Henry S. Thompson <ht@markup.co.uk> wrote:
>
>> I understood Murata to have proved for RELAX NG that given two XML
>> languages, each (weakly) generated by a RELAX NG schema, the union _of
>> the languages_ would be generated by the union of the schemata.
>>
>
> Indeed, the proof is trivial.  Let a be the root pattern of schema A, and b
> be the root pattern of schema B.  Then <choice><ref name="a"/><ref
> name="b"/></choice> matches any document that matches either a or b or
> both.  And since there are no constraints on the subpatterns of a choice
> pattern other than those inherited from the context of the choice pattern
> (which here is null), all such patterns are valid.

Consider the following two CF-PSGs:

1) S -> X
   X -> Y z
   Y -> a

2) S -> Y
   Y -> b

(1) generates L1={a z}
(2) generates L2={b}

(1) U (2) generates {a, b, a z, b z}, which is a proper superset of L1 U L2.

It is straightforward to port this example to RELAX NG:

1) start = X
   X = element x { Y, element z { empty } }
   Y = element a { empty }

2) start = Y
   Y = element b { empty }

with a parallel effect.

So what you've offered above doesn't constitute the desired proof.

The proof that context-free _languages_ are closed under union (see
for example (first one Google offered me) [1]) involves a relabelling
step for non-terminal symbols, precisely to avoid this 'capture'
problem.  A similar step would obviously be possible for RELAX NG.

So the class of RELAX NG _languages_ is closed under union, but it
probably doesn't really make sense to say the _formalism_ is closed
under union.

ht
--
Henry S. Thompson, Markup Systems Ltd.
Cavers Garden Farm, Denholm; by Hawick; TD9 8LN
+44 (0) 7866 471 388
Fax: (44) 131 651-1426, e-mail: ht@markup.co.uk
URL: http://www.markup.co.uk/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
```
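
The capture problem and the relabelling fix described in the message can be checked mechanically. What follows is only a minimal sketch in Python, not anything taken from the thread: the helper names (`language`, `naive_union`, `rename`, `safe_union`) and the `_1`/`_2` suffixes are made up for illustration. It assumes the grammars are non-recursive (so the languages are finite) and that no existing symbol already ends in one of those suffixes.

```python
from itertools import product

def language(grammar, start="S"):
    """Enumerate the (finite) language of a non-recursive CFG.

    `grammar` maps each non-terminal to a list of right-hand sides,
    each a tuple of symbols. Any symbol that is not a key of `grammar`
    is treated as a terminal.
    """
    def expand(symbol):
        if symbol not in grammar:
            return {(symbol,)}
        results = set()
        for rhs in grammar[symbol]:
            # Combine every expansion of every symbol on this right-hand side.
            for combo in product(*(expand(s) for s in rhs)):
                results.add(tuple(tok for part in combo for tok in part))
        return results
    return {" ".join(tokens) for tokens in expand(start)}

def naive_union(g1, g2):
    """Pool the productions of both grammars with no renaming."""
    merged = {}
    for g in (g1, g2):
        for nt, rules in g.items():
            merged.setdefault(nt, []).extend(rules)
    return merged

def rename(grammar, suffix):
    """Relabel every non-terminal of `grammar` by appending `suffix`."""
    mapping = {nt: nt + suffix for nt in grammar}
    return {
        mapping[nt]: [tuple(mapping.get(s, s) for s in rhs) for rhs in rules]
        for nt, rules in grammar.items()
    }

def safe_union(g1, g2, start="S"):
    """Relabel both grammars apart, then add a fresh start rule choosing one."""
    merged = {**rename(g1, "_1"), **rename(g2, "_2")}
    merged[start] = [(start + "_1",), (start + "_2",)]
    return merged

# The two grammars from the message.
G1 = {"S": [("X",)], "X": [("Y", "z")], "Y": [("a",)]}
G2 = {"S": [("Y",)], "Y": [("b",)]}

print(sorted(language(G1)))                   # ['a z']
print(sorted(language(G2)))                   # ['b']
print(sorted(language(naive_union(G1, G2))))  # ['a', 'a z', 'b', 'b z']
print(sorted(language(safe_union(G1, G2))))   # ['a z', 'b']
```

Pooling the productions lets the rules for Y from one grammar leak into the other's derivations, which is where strings such as `b z` come from. Relabelling the non-terminals first keeps the two sets of rules apart, so the result is exactly L1 U L2.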

