RE: [xml-dev] A single, all-encompassing data validation language - good or bad for the marketplace?
- From: Amelia A Lewis <amyzing@talsever.com>
- To: xml-dev@lists.xml.org
- Date: Tue, 07 Aug 2007 20:01:01 -0400
On 2007-08-07 00:11:44 -0400 Michael Kay <mike@saxonica.com> wrote:
>> I do not find it difficult to imagine a grammar that can specify these
>> constraints; using the set-notation formalisms common in discussions of
>> automata, it's relatively straightforward (handling the various gregorian
>> exceptions gets increasingly difficult and verbose as one grows more and
>> more precise, but it is not inherently beyond the scope of a grammar ... is
>> it?).
>
> I've no idea what the theoretical limits of what you can do with a grammar
> are (I don't even know what the accepted definition of "grammar" is), but I
Err. Murata-san has a very nice paper that characterizes the sorts of
grammars embodied by several available schema languages for XML, including
W3C XML Schema (which is characterized as a "restrained-competition"
grammar).
Are you speaking of "grammar" used colloquially? Because--disclaimer,
I am a humanities-trained sort just now investigating what amounts to
Computer Science 101, via texts on formal languages, automata, and
discrete mathematics--there does seem to be a fairly clear definition.
A grammar of a particular type (that is, with particular rules for
what it can define) defines a particular class of languages, and in
turn is strongly associated with a class of automata.
DTDs define a grammar for a regular language, and I believe that this
was the design point of W3C XML Schema as well--a language that could
be parsed by a deterministic finite automaton (WXS missed that, as a
consequence of complex interactions, as Murata-san points out in the
paper on taxonomy of schema languages). RNG has regular hedge grammar
as its design point (and hedge automata).
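To make that design point concrete (this is a toy of my own, nothing from any spec): a regular language is precisely one a deterministic finite automaton can recognize with a fixed transition table and no auxiliary storage at all. Here's a DFA accepting binary numerals divisible by three:

```python
# Sketch: a DFA over {0, 1} accepting binary numerals divisible by 3.
# States 0, 1, 2 track (value read so far) mod 3; state 0 is start and accept.
DELTA = {
    (0, '0'): 0, (0, '1'): 1,
    (1, '0'): 2, (1, '1'): 0,
    (2, '0'): 1, (2, '1'): 2,
}

def accepts(s: str) -> bool:
    state = 0
    for ch in s:
        state = DELTA[(state, ch)]  # one table lookup per symbol, no memory
    return state == 0

print(accepts('110'))  # 6 is divisible by 3 -> True
print(accepts('101'))  # 5 is not -> False
```

Constant memory, one pass: that efficiency is the whole attraction of staying regular.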
Context free grammars define a class of languages that includes
regular languages (every regular grammar is a context free grammar),
but with more power, and are associated with pushdown automata. It's
very likely that most W3C XML Schema parsers are implemented in this
fashion, rather than as deterministic (or equivalently,
nondeterministic) finite automata. If that *is* the case, then it
seems (to me) rather sensible to make use of the power of that style
of automata more fully, rather than providing a sort of "escape"
mechanism to break out of the automaton (which effectively puts you
into Turing machine territory, with an implementation problem of
unknown complexity).
> think I have a feel for the practical limits of when grammar ceases to be
> the most convenient way of stating the rules. I see people sometimes doing
> things with regular expressions that in my view are well beyond that limit.
That's a particular case of grammars. Regular language == regular
expressions == regular grammar == deterministic and nondeterministic
finite automata. Try and apply regular expressions to paren matching,
and joy will not be your reward. You can't express "string over Sigma
followed immediately by its reversal". But if W3C XML Schema has
already passed the border from regular grammar into the land of
context free grammar, why not investigate the limits of a CFG, before
introducing yet another, unrelated technology?
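The paren-matching point can be shown in a few lines (again my own toy sketch): arbitrarily nested brackets need exactly the one thing a finite automaton lacks, unbounded memory, and the minimal sufficient form of that memory is a single stack, which is what makes the language context free rather than regular:

```python
# Sketch: a one-stack (pushdown-style) recognizer for balanced brackets,
# a context-free language that no finite automaton can recognize.
PAIRS = {')': '(', ']': '[', '}': '{'}

def balanced(s: str) -> bool:
    stack = []
    for ch in s:
        if ch in '([{':
            stack.append(ch)            # push an opener
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False            # closer with no matching opener
    return not stack                    # everything opened was closed

print(balanced('([{}])'))   # True
print(balanced('([)]'))     # False: crossed pairs
```

No bound on nesting depth means no finite set of states suffices; the stack grows exactly as deep as the input demands.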
In fact, while I was teaching myself this stuff, I borrowed the brain
of a mathematician friend of mine (hey, hardly been used, you kn--
owww!). She pointed out (and the various texts eventually explained,
somewhat more verbosely) that you can gain quite a bit of power by
equipping a finite state machine with a pushdown stack (that is, going
from regular language to context free, from nfa/dfa to pushdown
automata), so I naturally immediately suggested having *two* ... but
was informed that such an automaton is functionally equivalent to a
Turing machine. Determination of its capabilities is *much* more
complex, I gather.
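Her point is easy to make concrete (another toy of my own devising): give the machine two stacks and it recognizes a^n b^n c^n, which is provably not context free. One stack counts the a's against the b's, the other counts the b's against the c's:

```python
# Sketch: a deterministic two-stack machine for { a^n b^n c^n : n >= 0 },
# a language beyond the reach of any single-stack pushdown automaton.
def accepts_anbncn(s: str) -> bool:
    s1, s2 = [], []      # the second stack is what lifts us past context-free
    phase = 'a'          # input must arrive as an a-block, b-block, c-block
    for ch in s:
        if ch == 'a' and phase == 'a':
            s1.append('a')               # count the a's
        elif ch == 'b' and phase in 'ab':
            phase = 'b'
            if not s1:
                return False             # more b's than a's
            s1.pop()
            s2.append('b')               # count the b's
        elif ch == 'c' and phase in 'bc':
            phase = 'c'
            if not s2:
                return False             # more c's than b's
            s2.pop()
        else:
            return False                 # out-of-order symbol
    return not s1 and not s2             # all three counts agreed

print(accepts_anbncn('aaabbbccc'))  # True
print(accepts_anbncn('aabbc'))      # False
```

That little jump in power is exactly why the capability analysis gets so much harder: two stacks are already a Turing machine in disguise.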
But these are ... surely they must be ... issues that the 1.1 W3C XML
Schema working group has considered, are they not? With the example
of RNG, and the formalisms underlying it, and the literature that
increasingly surrounds the characterization of XML as a regular hedge
language (consequently a language most comfortably defined by a
regular hedge grammar, though not *necessarily* practically most
effectively implemented in a hedge automaton), surely the original
design point of W3C XML Schema 1.0, of a regular language which could
be efficiently processed by a finite automaton (in fact, by a top-down
deterministic finite automaton, which is the justification of UPA if
I'm not mistaken) was *at least* discussed? The whole question of
determinism (the unique particle attribution rule) *ought* to be on
the WXS 1.1 WG's radar; it's a sore point with users of the technology
(as are co-occurrence constraints, which are being addressed, albeit in
a fashion I find rather worthy of remark).
> Why stretch one technology to its limits when you've crossed into a domain
> where other technologies do it better?
Uh ... like moving from top-down deterministic finite automata to
pushdown automata?
I'm sorry, but grammar defines language and is associated with a class
of automata; WXS 1.0 had a very clear (if, in my opinion, mistaken)
design goal in this area. WXS 1.1 has already introduced an exception
to the UPA (one implicit in WXS 1.0, as I understand it, but the
formal acknowledgement is an advance, or potentially is so), so ...
either the NFA/DFA is equipped with a sort of "exception" machinery of
indeterminate complexity, or a different class of automata is
indicated as the design point.
At the risk of repeating myself, I'm just learning stuff that they
(supposedly?) teach CS undergraduates, so perhaps I've gone off the
rails. If so, I'd be happy to hear in what fashion I've done so (and
my employer would prolly be happy if I get unrealistic ideas out of my
tiny little head).
Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
The flesh is strong. The spirit stronger. So shed your skin, baby.
Let it through. Come on over.
-- Amy Ray