OASIS Mailing List Archives
RE: [xml-dev] A single, all-encompassing data validation language - good or bad for the marketplace?

On 2007-08-07 00:11:44 -0400 Michael Kay <mike@saxonica.com> wrote:
>> I do not find it difficult to imagine a grammar that can specify these
>> constraints; using the set-notation formalisms common in discussions of
>> automata, it's relatively straightforward (handling the various Gregorian
>> exceptions gets increasingly difficult and verbose as one grows more and
>> more precise, but it is not inherently beyond the scope of a grammar
>> ... is it?).
> 
> I've no idea what the theoretical limits of what you can do with a grammar
> are (I don't even know what the accepted definition of "grammar" is), but I
Err.  Murata-san has a very nice paper that characterizes the sorts of 
grammars embodied by several of the available schema languages for XML, 
including W3C XML Schema (which is characterized as a 
"restrained-competition" grammar).

Are you speaking of "grammar" used colloquially?  Because--disclaimer, 
I am a humanities-trained sort just now investigating what amounts to 
Computer Science 101, via texts on formal languages, automata, and 
discrete mathematics--there does seem to be a fairly clear definition. 
  A grammar of a particular type (that is, with particular rules for 
what it can define) defines a particular class of languages, and in 
turn is strongly associated with a class of automata.

DTD content models define regular languages (over sequences of child 
elements), and I believe that this was the design point of W3C XML 
Schema as well--a language that could be parsed by a deterministic 
finite automaton (WXS missed that mark, as a consequence of complex 
interactions, as Murata-san points out in the paper on the taxonomy of 
schema languages).  RNG has the regular hedge grammar as its design 
point (and hedge automata).
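To make the "regular language" point concrete: a DTD content model such 
as (title, author+, chapter*) is just a regular expression over 
child-element names, so checking one element's children reduces to 
finite-automaton matching.  A minimal Python sketch (the element names 
are made up for illustration):

```python
import re

# The content model (title, author+, chapter*) rendered as a regular
# expression over a space-separated sequence of child-element names.
content_model = re.compile(r"title (author )+(chapter )*$")

def valid(children):
    """Check a sequence of child-element names against the model."""
    return content_model.match(" ".join(children) + " ") is not None

print(valid(["title", "author", "chapter"]))  # accepted
print(valid(["author", "title"]))             # rejected: wrong order
```

Anything expressible this way stays within the reach of a DFA; the 
interesting question is what happens once a schema language steps 
outside it.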

Context-free grammars define a class of languages that includes the 
regular languages (every regular grammar is a context-free grammar), 
but with more power, and are associated with pushdown automata.  It's 
very likely that most W3C XML Schema parsers are implemented in this 
fashion, rather than as deterministic (or, equivalently in power, 
nondeterministic) finite automata.  If that *is* the case, then it 
seems (to me) rather sensible to make use of the power of that style 
of automaton more fully, rather than providing a sort of "escape" 
mechanism to break out of the automaton (which effectively puts you 
into Turing machine territory, with an implementation problem of 
unknown complexity).

> think I have a feel for the practical limits of when grammar ceases to be
> the most convenient way of stating the rules. I see people sometimes doing
> things with regular expressions that in my view are well beyond that limit.

That's a particular case of grammars.  Regular language == regular 
expressions == regular grammar == deterministic and nondeterministic 
finite automata.  Try to apply regular expressions to paren matching, 
and joy will not be your reward.  You can't express "a string over 
Sigma followed immediately by its reversal".  But if W3C XML Schema 
has already passed the border from regular grammar into the land of 
context-free grammar, why not investigate the limits of a CFG before 
introducing yet another, unrelated technology?
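The paren-matching point, sketched in Python: balanced parentheses are 
the textbook context-free, non-regular language, and the "pushdown" 
needed to recognize it can be as small as a counter (a stack over a 
one-symbol alphabet):

```python
def balanced(s: str) -> bool:
    """Recognize balanced parentheses (the Dyck language) with a
    counter -- a pushdown automaton whose stack alphabet has one symbol."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1       # push
        elif ch == ")":
            depth -= 1       # pop
            if depth < 0:    # pop on empty stack: reject
                return False
    return depth == 0        # accept only with an empty stack

print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```

No regular expression can do this for unbounded nesting: a finite 
automaton has no way to remember an arbitrary depth.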

In fact, while I was teaching myself this stuff, I borrowed the brain 
of a mathematician friend of mine (hey, hardly been used, you kn-- 
owww!).  She pointed out (and the various texts eventually explained, 
somewhat more verbosely) that you gain quite a bit of power by 
equipping a finite state machine with a pushdown stack (that is, going 
from regular languages to context-free, from NFA/DFA to pushdown 
automata), so I naturally immediately suggested having *two* ... but 
was informed that such an automaton is functionally equivalent to a 
Turing machine.  Determination of its capabilities is *much* more 
complex, I gather.
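The two-stack trick can be sketched directly (a toy Python sketch, with 
made-up tape contents): keep the tape to the left of the head on one 
stack and the head cell plus everything to its right on the other, and 
moving the head is just popping one stack onto the other.

```python
BLANK = "_"

def move_right(left, right):
    """Shift the head one cell right: the current head cell (top of the
    right stack) moves onto the left stack; extend the tape with a blank
    if we run off the end."""
    left.append(right.pop() if right else BLANK)
    if not right:
        right.append(BLANK)

def move_left(left, right):
    """Shift the head one cell left: symmetric to move_right."""
    right.append(left.pop() if left else BLANK)

# Tape "ab" with the head on 'a' (top of the right stack is the head cell):
left, right = [], ["b", "a"]
move_right(left, right)
print(left, right)  # head is now on 'b'
```

Add a finite control that reads and rewrites the top of the right stack 
and you have, in effect, a Turing machine, which is exactly why the 
capabilities of such a machine are so much harder to pin down.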

But these are ... surely they must be ... issues that the W3C XML 
Schema 1.1 working group has considered, are they not?  There is the 
example of RNG, and the formalisms underlying it, and the literature 
that increasingly surrounds the characterization of XML as a regular 
hedge language (consequently a language most comfortably defined by a 
regular hedge grammar, though not *necessarily*, in practice, most 
effectively implemented in a hedge automaton).  Surely the original 
design point of W3C XML Schema 1.0--a regular language which could be 
efficiently processed by a finite automaton (in fact, by a top-down 
deterministic finite automaton, which is the justification of the UPA, 
if I'm not mistaken)--was *at least* discussed?  The whole question of 
determinism (the unique particle attribution rule) *ought* to be on 
the WXS 1.1 WG's radar; it's a sore point with users of the technology 
(as are co-occurrence constraints, which are being addressed, albeit 
in a fashion I find rather worthy of remark).
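A toy illustration of the determinism point, using Python regexes as 
stand-ins for content models (my analogy, not the spec's): a model like 
ab|ac violates unique attribution, since after reading "a" you cannot 
tell which particle it belongs to without lookahead.  A backtracking 
engine copes anyway; a one-pass deterministic automaton, the WXS 1.0 
design point, cannot, which is what the UPA rules out.

```python
import re

# Nondeterministic model: the leading "a" could belong to either branch.
ambiguous = re.compile(r"ab|ac")
print(bool(ambiguous.fullmatch("ac")))      # True -- via backtracking

# The UPA-conforming equivalent factors out the common prefix, so every
# symbol is attributable to exactly one particle as it is read.
deterministic = re.compile(r"a(b|c)")
print(bool(deterministic.fullmatch("ac")))  # True -- no backtracking needed
```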

> Why stretch one technology to its limits when you've crossed into a domain
> where other technologies do it better?

Uh ... like moving from top-down deterministic finite automata to 
pushdown automata?

I'm sorry, but a grammar defines a language and is associated with a 
class of automata; WXS 1.0 had a very clear (if, in my opinion, 
mistaken) design goal in this area.  WXS 1.1 has already introduced an 
exception to the UPA (one implicit in WXS 1.0, as I understand it, but 
the formal acknowledgement is an advance, or potentially is so), so ... 
either the NFA/DFA is equipped with a sort of "exception" machinery of 
indeterminate complexity, or a different class of automata is indicated 
as the design point.

At the risk of repeating myself, I'm just learning stuff that they 
(supposedly?) teach CS undergraduates, so perhaps I've gone off the 
rails.  If so, I'd be happy to hear in what fashion I've done so (and 
my employer would prolly be happy if I get unrealistic ideas out of my 
tiny little head).

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
The flesh is strong.  The spirit stronger.  So shed your skin, baby.
Let it through.  Come on over.
                 -- Amy Ray


