John Cowan <johnwcowan@gmail.com> writes:
> On Wed, Jan 18, 2017 at 4:17 PM, Henry S. Thompson <ht@markup.co.uk> wrote:
>> I understood Murata to have proved for RELAX NG that given two XML
>> languages, each (weakly) generated by a RELAX NG schema, the union _of
>> the languages_ would be generated by the union of the schemata.
>
> Indeed, the proof is trivial. Let a be the root pattern of schema A, and b
> be the root pattern of schema B. Then <choice><ref name="a"/><ref
> name="b"/></choice> matches any document that matches either a or b or
> both. And since there are no constraints on the subpatterns of a choice
> pattern other than those inherited from the context of the choice pattern
> (which here is null), all such patterns are valid.
<academicInterestOnly>
Consider the following two CF-PSGs:
1) S -> X
   X -> Y z
   Y -> a
2) S -> Y
   Y -> b
(1) generates L1={a z}
(2) generates L2={b}
(1) U (2), i.e. the union of their rule sets, generates {a, a z, b, b z},
which is a proper superset of L1 U L2.
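For anyone who wants to check this mechanically, here is a minimal
Python (3.9+) sketch. The function and variable names are my own,
hypothetical ones; it assumes "union of grammars" means simply pooling
the two rule sets with no renaming of non-terminals, and it only
handles non-recursive grammars, whose languages are finite and can be
enumerated outright.

```python
from itertools import product

# A grammar maps each non-terminal to a list of right-hand sides, each a
# tuple of symbols. Any symbol that is not a key is treated as a terminal.
Grammar = dict[str, list[tuple[str, ...]]]

def language(grammar: Grammar, start: str = "S") -> set[str]:
    """Enumerate the finite language of a non-recursive CFG."""
    def expand(symbol: str) -> set[tuple[str, ...]]:
        if symbol not in grammar:  # terminal
            return {(symbol,)}
        out: set[tuple[str, ...]] = set()
        for rhs in grammar[symbol]:
            # Concatenate every combination of expansions of the RHS symbols.
            for parts in product(*(expand(s) for s in rhs)):
                out.add(tuple(t for part in parts for t in part))
        return out
    return {" ".join(w) for w in expand(start)}

def naive_union(g1: Grammar, g2: Grammar) -> Grammar:
    """Pool the two rule sets without renaming any non-terminal."""
    merged: Grammar = {}
    for g in (g1, g2):
        for lhs, rhss in g.items():
            merged.setdefault(lhs, []).extend(rhss)
    return merged

g1: Grammar = {"S": [("X",)], "X": [("Y", "z")], "Y": [("a",)]}
g2: Grammar = {"S": [("Y",)], "Y": [("b",)]}

L1, L2 = language(g1), language(g2)
L12 = language(naive_union(g1, g2))
print(sorted(L1), sorted(L2), sorted(L12))  # ['a z'] ['b'] ['a', 'a z', 'b', 'b z']
print(L12 > (L1 | L2))                      # True: a proper superset
```

The shared name Y is what does the damage: S -> X from (1) now reaches
Y -> b from (2), and S -> Y from (2) now reaches Y -> a from (1).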
It is straightforward to port this example to RELAX NG:
1) start = X
   X = element x { Y, element z { empty } }
   Y = element y { element a { empty } }
2) start = Y
   Y = element b { empty }
with a parallel effect.
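The parallel effect can be modelled the same way. The sketch below is a
deliberately tiny, hypothetical model of the compact-syntax schemas, not
a RELAX NG validator: patterns are Python tuples, a name defined in both
schemas is merged into a choice of the two definitions (roughly what
combine="choice", or |= in the compact syntax, does), so the merged
start becomes X | Y, as in your <choice> construction. It also assumes
the element wrapping Y's content in schema (1) is named y.

```python
from itertools import product

# Hypothetical toy model of the schemas (not a validator):
#   ("ref", name)              : reference to a named pattern
#   ("elem", tag, [patterns])  : element whose content is that sequence;
#                                an empty list plays the role of `empty`
#   ("choice", [patterns])     : alternatives
# A document tree is (tag, (child_tree, ...)).

def trees(defs, pattern):
    """Enumerate every tree matched by `pattern` (non-recursive schemas only)."""
    kind = pattern[0]
    if kind == "ref":
        return trees(defs, defs[pattern[1]])
    if kind == "choice":
        result = set()
        for alt in pattern[1]:
            result |= trees(defs, alt)
        return result
    _, tag, content = pattern
    # Each child position can be filled by any tree its pattern matches.
    return {(tag, kids) for kids in product(*(trees(defs, p) for p in content))}

def to_xml(tree):
    """Serialise a tree as compact XML text."""
    tag, kids = tree
    if not kids:
        return f"<{tag}/>"
    return f"<{tag}>" + "".join(to_xml(k) for k in kids) + f"</{tag}>"

def documents(defs):
    return {to_xml(t) for t in trees(defs, defs["start"])}

def merge(s1, s2):
    """Pool both schemas' defines; a name defined in both becomes a choice."""
    merged = dict(s1)
    for name, pat in s2.items():
        merged[name] = ("choice", [merged[name], pat]) if name in merged else pat
    return merged

schema1 = {
    "start": ("ref", "X"),
    "X": ("elem", "x", [("ref", "Y"), ("elem", "z", [])]),
    "Y": ("elem", "y", [("elem", "a", [])]),  # assumed wrapper name: y
}
schema2 = {
    "start": ("ref", "Y"),
    "Y": ("elem", "b", []),
}

for doc in sorted(documents(merge(schema1, schema2))):
    print(doc)
```

This prints four documents, of which only <x><y><a/></y><z/></x> and
<b/> are in L1 U L2.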
So what you've offered above doesn't constitute the desired proof.
The proof that context-free _languages_ are closed under union (see,
for example, [1], the first one Google offered me) involves a relabelling
step for non-terminal symbols, precisely to avoid this 'capture'
problem. A similar step would obviously be possible for RELAX NG.
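A minimal sketch of that relabelling step for grammars, in Python 3.9+,
again restricted to non-recursive grammars so the languages can be
enumerated, and with hypothetical helper names of my own: suffix every
non-terminal of each grammar with a tag so the two name sets are
disjoint, then add a fresh start rule S -> S_1 | S_2. This is the
textbook construction in outline, not necessarily the exact form it
takes in [1].

```python
from itertools import product

# A grammar maps each non-terminal to a list of right-hand sides (tuples of
# symbols); any symbol that is not a key is treated as a terminal.
Grammar = dict[str, list[tuple[str, ...]]]

def language(grammar: Grammar, start: str = "S") -> set[str]:
    """Enumerate the finite language of a non-recursive CFG."""
    def expand(symbol: str) -> set[tuple[str, ...]]:
        if symbol not in grammar:  # terminal
            return {(symbol,)}
        out: set[tuple[str, ...]] = set()
        for rhs in grammar[symbol]:
            for parts in product(*(expand(s) for s in rhs)):
                out.add(tuple(t for part in parts for t in part))
        return out
    return {" ".join(w) for w in expand(start)}

def rename_apart(grammar: Grammar, tag: str) -> Grammar:
    """Relabel every non-terminal N as N_tag; terminals are left alone."""
    def r(sym: str) -> str:
        return f"{sym}_{tag}" if sym in grammar else sym
    return {r(lhs): [tuple(r(s) for s in rhs) for rhs in rhss]
            for lhs, rhss in grammar.items()}

def union_grammar(g1: Grammar, g2: Grammar, start: str = "S") -> Grammar:
    """Rename both grammars apart, then add start -> start_1 | start_2.
    Assumes both input grammars use `start` as their start symbol."""
    r1, r2 = rename_apart(g1, "1"), rename_apart(g2, "2")
    merged: Grammar = {**r1, **r2}  # key sets are now disjoint
    merged[start] = [(f"{start}_1",), (f"{start}_2",)]
    return merged

g1: Grammar = {"S": [("X",)], "X": [("Y", "z")], "Y": [("a",)]}
g2: Grammar = {"S": [("Y",)], "Y": [("b",)]}

u = union_grammar(g1, g2)
print(sorted(language(u)))                        # ['a z', 'b']
print(language(u) == language(g1) | language(g2))  # True
```

Because Y_1 and Y_2 are now distinct, neither grammar's rules can
capture the other's non-terminals.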
So the class of RELAX NG _languages_ is closed under union, but it
probably doesn't really make sense to say the _formalism_ is closed
under union.
</academicInterestOnly>
ht