OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help



   Report on the elimination of SGML AND groups (fwd)

[ Lists Home | Date Index | Thread Index ]
  • From: Jani Jaakkola <jjaakkol@cs.Helsinki.FI>
  • To: xml-dev@ic.ac.uk
  • Date: Tue, 19 May 1998 17:47:11 +0300 (EEST)

I'm sure that there are people in this list, who might find this
report interesting. However, be prepared for some theoretically
oriented computer science. You have been warned.

---------- Forwarded message ----------
Date: Tue, 19 May 1998 12:44:54 +0300 (EET DST)
From: Pekka Kilpelainen <kilpelai@cs.helsinki.fi>
To: Anne.Brueggemann-Klein@informatik.tu-muenchen.de, dwood@cs.ust.hk,
    cmsmcq@tigger.cc.uic.edu, marcy@world.std.com,
Cc: Helena.Ahonen@cs.helsinki.fi, Barbara.Heikkinen@cs.helsinki.fi,
    Oskari.Heinonen@cs.helsinki.fi, Jani.Jaakkola@cs.helsinki.fi,
    Pekka.Kilpelainen@cs.helsinki.fi, Greger.Linden@cs.helsinki.fi,
    Jyrki.Niemi@cs.helsinki.fi, Kimmo.Paasiala@cs.helsinki.fi
Subject: Report on the elimination of SGML AND groups

Dear colleagues, FYI: I have written a report, where I analyze the 
possibilities and the lengtehening effect of replacing AND groups of 
SGML content models by equivalent XML model groups. 
The report is available as gnu-zipped Postscript through its 
abstract page at the address 

	http://www.cs.helsinki.fi/~kilpelai/C-1998-12.html .

I include at the bottom the abstract of the report.

I would be thankful for any comments. 

Yours, Pekka Kilpelainen

Pekka Kilpelainen, University of Helsinki, Dept. of Computer Science
      Email: Pekka.Kilpelainen@cs.helsinki.fi 
      phone: +358 9 7084 4227, fax: +358 9 7084 4441

SGML & XML Content Models

Pekka Kilpeläinen 

University of Helsinki, Department of Computer Science
Report C-1998-12, May 1998
16 pages

The SGML and XML standards use a variation of  regular expressions
called content models for modeling the markup structures of document
elements. SGML content models may include so called AND groups, which
are excluded from XML. An AND group, which is a sequence of
subexpressions separated by an &-operator, denotes the sequential
catenation of its subexpressions in any possible order. If one wants to
shift from SGML to XML in document production, one has to translate SGML
content models to corresponding XML content models.

The allowed content models in both SGML and XML are restricted by a
requirement of determinism, which means that a parser recognizing
document element contents has to be able to decide without lookahead,
which content model token to match with the current input token, while
processing the document from left to right.  It is known that not all
SGML content models can be expressed as an equivalent XML content model.
It is also known that transforming an SGML content model into an
equivalent XML content model may cause an exponential growth in the
length  of the content model.  We discuss methods of eliminating AND
groups and analyze the circumstances where they can be applied. We
derive a tight bound of $e n!$ on the number of symbols in the result of
eliminating an AND group of $n$ symbols, where $e = 2.71828...$is the
base of natural logarithms.  We present the analysis in a pedagogical
manner, emphasizing mathematical methods which are typical to the
analysis of algorithms.  We also show that minimal deterministic
automata for recognizing an AND group of $n$ distinct element names
contain $2^{n}$ states and $n 2^{n-1}$ transitions, excluding the
failure state and transitions leading to it.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS