[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: Non-deterministic content model
- From: Derek Denny-Brown <derekdb@microsoft.com>
- To: Murali Mani <mani@CS.UCLA.EDU>
- Date: Wed, 13 Jun 2001 12:39:24 -0700
Actually ( (a,b) | (b,a) ) is not ambiguous. ( I assume this is what
you meant by "( a, b | b, a)".. ) I'm not sure about the term
'1-ambiguous' though, so you may be including other stipulations, which
I am not aware of. A content model is just a regular expression. There
exists a proof that regular expressions are isomorphic to NDFAs. There
also exists a proof that NDFAs are isomorphic to DFAs, which is what I
am talking about when I say, every non-deterministic content model can
be (re)written as a deterministic content model. non-deterministic
content models are content models whose direct conversion to a finite
automata, results in a NDFA. A deterministic content model is one whose
direct conversion is a DFA. I say NDFAs and DFAs are isopophic because
any given DFA is a valid NDFA, and a equivalent DFA can be constructed
for any NDFA.
You seem to have a misconcieved understanding of 'ambiguous'.
> Consider another regular expression -- (a*, b*, a*) -- it is
> ambiguous,
> and it has no 1-unambiguous content model (consider any of
> the existing
> operators) -- (actually please double check this example -- i am quite
> sure it is not 1-unambiguous).
( a*, b*, c*) is _not_ ambiguous. At no point it processing a matching
sequence, is the parser unsure what to do next given the current token.
> Consider regular expressions -- the regular expression (a, b
> | b, a) can
> *never* be written as a 1-unambiguous regular expression --
> it can however
> be written as (a & b) -- when it will be 1-unambiguous.
I'm not sure exactly how you meant (a, b | b, a) to be parsed. Either
( ( a, b) | (b, a) )
or ( a, (b | b), a)
The doesn't equate with (a & b), so I will assume you refer to the
first. The first _is_ a 1-unambiguous regular expressions. Consider
processing the 2 valid input streams: "ab" and "ba"
- "ab": 'a' matches first-set in first group of the or group. All that
is left is to match the following 'b' in (a,b).
- "ba": 'b' matches first-set in second group of the or group. All that
is left is to match the following 'a' in (b,a).
No ambiguities.
> she leaves it as an open problem: given a regular language,
> whether there
> is a 1-unambiguoud model group (with the SGML DTD operator &
> in addition
> to the regular expression operators).
Note that she is talking about "1-unambiguous model groups", not
"1-unambiguous regular expressions". There is a very significant
difference. The example you give above of: ( ( a, b) | (b, a) ), can not
be expressed as a 1-unambiguous model group, w/o the '&' operator, but
it can be expressed as a 1-unambiguous regular expression.
My original statement still holds. You can convert an arbitrary
ambigous content model to
an non-ambiguous... but it might involve the introduction of additional
groups... thus it's avoidance of the open problem to which you refered.
Derek Denny-Brown