[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Type-assignment
- From: Murata Makoto <mura034@attglobal.net>
- To: "K.Kawaguchi" <k-kawa@bigfoot.com>
- Date: Fri, 02 Feb 2001 13:36:29 +0900
K.Kawaguchi wrote:
> > > It's not in general easy, unless you restrict the grammar. For example,
> > > consider the following TREX pattern:
> >
> > Right. Kawaguchi-san's another algorithm determines if a RELAX module is
> > already such a restricted grammar.
>
> I don't think so. It is true that a restricted grammar results in
> FASTER algorithm, but it's also true that type-assignment is still
> tractable and in fact quite simple even for unrestricted grammars.
Well, this is a subjective issue. You modified my algorithm so
as to record possible interpretations. You think the algorithm is
"quite simple", and James thinks it is not easy. I do not think that
it is difficult, but also do not think it is very easy.
1. Ambiguity
If the grammar is guaranteed to be unambiguous, the algorithm does not
become simpler but the type-assignment is always unique.
Any ambiguous RELAX module can be disambiguated, I guess. At least,
any tree regular language can be captured by an unambiguous tree regular
grammar. (Hmm, subset construction certainly creates deterministic
and thus unambiguous hedge automata. But are there any algorithms which
do not significantly rewrite RELAX modules? Interesting but might
not be really useful.)
There are many ambiguity in computer science. For example, a regular
expression a*a* is ambiguous. So is (a | a). Perl and other tools
happily accept such regular expressions. They also have \1, \2 , etc.
so as to reference to matches to sub exprssions. For example, if we
compare "aab" against \(a*\)b, the value of \1 is "aa". What happens
if we compare "aaa" against (\(a*\)*)? This expression is not
complicated and different implementations probably behave the same.
But if we create more complicated ambiguous regular expressions, different
implementations choose different interpretations (i.e., type assignment)
and behave differently. Hmm, do we have to care?
2. Effortless
If the grammar is guaranteed to be "effortless" (in your terminology),
the algorithm can be simplified and the type-assignment is always unique.
I have called this algorithm "semi-deterministic top-down with one lookahead".
Simply put, you can determine *the* non-terminal for a node when you visit
the node by using the type and content model of the parent element.
Some RELAX modules cannot be converted to "effortless" RELAX modules.
Given two such modules, its intersection can be always captured by an
"effortless" module but its union cannot be always captured. That is,
this class is not closed under union (and difference).
K.Kawaguchi wrote:
> See my previous post to Norm, which describes it step by step. It works
> with TREX.
I could not find it. Which message are you talking about? But we have
common understanding.
Cheers,