OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: Non-deterministic content model

Actually ( (a,b) | (b,a) ) is not ambiguous.  ( I assume this is what
you meant by "( a, b | b, a)".. )   I'm not sure about the term
'1-ambiguous' though, so you may be including other stipulations, which
I am not aware of.  A content model is just a regular expression.  There
exists a proof that regular expressions are isomorphic to NDFAs.  There
also exists a proof that NDFAs are isomorphic to DFAs, which is what I
am talking about when I say, every non-deterministic content model can
be (re)written as a deterministic content model.  non-deterministic
content models are content models whose direct conversion to a finite
automata, results in a NDFA.  A deterministic content model is one whose
direct conversion is a DFA.  I say NDFAs and DFAs are isopophic because
any given DFA is a valid NDFA, and a equivalent DFA can be constructed
for any NDFA.

You seem to have a misconcieved understanding of 'ambiguous'.  

> Consider another regular expression -- (a*, b*, a*) -- it is 
> ambiguous,
> and it has no 1-unambiguous content model (consider any of 
> the existing
> operators) -- (actually please double check this example -- i am quite
> sure it is not 1-unambiguous).

( a*, b*, c*) is _not_ ambiguous.  At no point it processing a matching
sequence, is the parser unsure what to do next given the current token.

> Consider regular expressions -- the regular expression (a, b 
> | b, a) can
> *never* be written as a 1-unambiguous regular expression -- 
> it can however
> be written as (a & b) -- when it will be 1-unambiguous.

I'm not sure exactly how you meant (a, b | b, a) to be parsed.  Either
	( ( a, b) | (b, a) )
or	( a, (b | b), a)
The doesn't equate with (a & b), so I will assume you refer to the
first.  The first _is_ a 1-unambiguous regular expressions.  Consider
processing the 2 valid input streams: "ab" and "ba"
- "ab": 'a' matches first-set in first group of the or group.  All that
is left is to match the  following 'b' in (a,b).
- "ba": 'b' matches first-set in second group of the or group.  All that
is left is to match the following 'a' in (b,a).
No ambiguities.

> she leaves it as an open problem: given a regular language, 
> whether there
> is a 1-unambiguoud model group (with the SGML DTD operator & 
> in addition
> to the regular expression operators).

Note that she is talking about "1-unambiguous model groups", not
"1-unambiguous regular expressions".  There is a very significant
difference. The example you give above of: ( ( a, b) | (b, a) ), can not
be expressed as a 1-unambiguous model group, w/o the '&' operator, but
it can be expressed as a 1-unambiguous regular expression.

My original statement still holds.  You can convert an arbitrary
ambigous content model to 
an non-ambiguous...  but it might involve the introduction of additional
groups... thus it's avoidance of the open problem to which you refered.

Derek Denny-Brown