OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Type-assignment

K.Kawaguchi wrote:

> > > It's not in general easy, unless you restrict the grammar.  For example,
> > > consider the following TREX pattern:
> > 
> > Right.  Kawaguchi-san's another algorithm determines if a RELAX module is 
> > already such a restricted grammar.
> I don't think so.  It is true that a restricted grammar results in
> FASTER algorithm, but it's also true that type-assignment is still
> tractable and in fact quite simple even for unrestricted grammars.

Well, this is a subjective issue.  You modified my algorithm so 
as to record possible interpretations.  You think the algorithm is 
"quite simple", and James thinks it is not easy.  I do not think that 
it is difficult, but also do not think it is very easy.  

1. Ambiguity

If the grammar is guaranteed to be unambiguous, the algorithm does not 
become simpler but the type-assignment is always unique.

Any ambiguous RELAX module can be disambiguated, I guess.  At least, 
any tree regular language can be captured by an unambiguous tree regular 
grammar.  (Hmm, subset construction certainly creates deterministic 
and thus unambiguous hedge automata.  But are there any algorithms which 
do not significantly rewrite RELAX modules?  Interesting but might 
not be really useful.)

There are many ambiguity in computer science.  For example, a regular 
expression a*a* is ambiguous.  So is (a | a).  Perl and other tools 
happily accept such regular expressions.  They also have \1, \2 , etc. 
so as to reference to matches to sub exprssions.  For example, if we 
compare "aab" against \(a*\)b, the value of \1 is "aa".  What happens 
if we compare "aaa" against (\(a*\)*)?  This expression is not 
complicated and different implementations probably behave the same.  
But if we create more complicated ambiguous regular expressions, different 
implementations choose different interpretations (i.e., type assignment) 
and behave differently.  Hmm, do we have to care?  

2. Effortless

If the grammar is guaranteed to be "effortless" (in your terminology), 
the algorithm can be simplified and the type-assignment is always unique.  
I have called this algorithm "semi-deterministic top-down with one lookahead".  
Simply put, you can determine *the* non-terminal for a node when you visit 
the node by using the type and content model of the parent element.

Some RELAX modules cannot be converted to "effortless" RELAX modules.

Given two such modules, its intersection can be always captured by an 
"effortless" module but its union cannot be always captured.  That is, 
this class is not closed under union (and difference).

K.Kawaguchi wrote:

> See my previous post to Norm, which describes it step by step. It works
> with TREX.

I could not find it.  Which message are you talking about?  But we have 
common understanding.