OASIS Mailing List Archives
SGML's determinism rules and SHORTREF (was: Re: [xml-dev] RE: The remarkable similarities between XSLT and Flex/Lex)

Marcus Reichardt <u123724@gmail.com> writes:

>> On 26.06.2022 at 15:01, Roger L Costello <costello@mitre.org> wrote:
>> 
>> There was also a strange
>> unnecessary constraint on these expressions called "ambiguity", which
>> *everybody* who wrote SGML software needed to understand, and so the idea of
>> applying formal language techniques to SGML was inevitable.

> Hmm, your anonymous source should've expanded on that; without proof
> or anything the claim of "unnecessary-ness" is void.

> SGML has tag inference (can infer arbitrarily many start- and
> end-element tags) and moreover can expand short reference delimiters
> (arbitrary tokens not used as markup delimiters, including newlines and
> tabs/spaces) by context-dependent replacement text (start- and
> end-element tags, typically). Determinism in content models helps a lot to
> make this even work, ...

That's an interesting account; I think I would not be the only reader of
xml-dev who would be interested to see a more concrete argument that the
content-model determinism (confusingly called "ambiguity") rules of SGML
help at all with tag inference or with the management of SHORTREF
mappings.

Can you expound?  I'm sure a lot of us would be glad to learn more.

One reason I ask is that I have been curious for twenty or thirty years
about the rationale for those rules, and none of the members of WG 8
whom I have had the opportunity to ask about it has mentioned anything
to do with markup inference or short references.  Now, it's entirely
possible for some members of a working group to be unaware of technical
arguments important for other WG members, so perhaps I have merely been
unlucky in asking only people who didn't know or care about this
particular piece of technical background.

Another reason is that the essence of the determinism rules is that each
element in the input matches an identifiable token in the
content model.  But since SHORTREF mappings are associated with element
types and not with content-model tokens, I don't see how it can
matter, for SHORTREF recognition, which token in a content model is
being matched, or whether there is more than one.
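For readers who have not met the rule in automata-theoretic dress: a content model is "deterministic" (1-unambiguous, in the later terminology) when each symbol in the input can be matched to a unique position in the expression without lookahead, which is equivalent to the Glushkov automaton of the model being deterministic.  A minimal sketch of the check over a toy content-model AST (the node encoding and function names here are my own, purely illustrative, not anything from 8879):

```python
from itertools import count

# Toy content-model AST (illustrative encoding, not from ISO 8879):
#   ('sym', name)    an element name        ('alt', l, r)   l | r
#   ('seq', l, r)    l , r                  ('star', e)     e*
#   ('opt', e)       e?

def _mark(n, ctr):
    """Give every symbol occurrence a unique position (Glushkov marking)."""
    if n[0] == 'sym':
        return ('sym', n[1], next(ctr))
    if n[0] in ('seq', 'alt'):
        return (n[0], _mark(n[1], ctr), _mark(n[2], ctr))
    return (n[0], _mark(n[1], ctr))

def _nullable(n):
    if n[0] == 'sym': return False
    if n[0] == 'seq': return _nullable(n[1]) and _nullable(n[2])
    if n[0] == 'alt': return _nullable(n[1]) or _nullable(n[2])
    return True  # star, opt

def _first(n):
    """Set of (name, position) pairs that can begin a match of n."""
    if n[0] == 'sym': return {(n[1], n[2])}
    if n[0] == 'seq':
        return _first(n[1]) | (_first(n[2]) if _nullable(n[1]) else set())
    if n[0] == 'alt': return _first(n[1]) | _first(n[2])
    return _first(n[1])

def _last(n):
    """Set of (name, position) pairs that can end a match of n."""
    if n[0] == 'sym': return {(n[1], n[2])}
    if n[0] == 'seq':
        return _last(n[2]) | (_last(n[1]) if _nullable(n[2]) else set())
    if n[0] == 'alt': return _last(n[1]) | _last(n[2])
    return _last(n[1])

def _follow(n, table):
    """Fill table: position -> set of positions that may come next."""
    if n[0] == 'seq':
        _follow(n[1], table); _follow(n[2], table)
        for _, p in _last(n[1]):
            table.setdefault(p, set()).update(_first(n[2]))
    elif n[0] == 'alt':
        _follow(n[1], table); _follow(n[2], table)
    elif n[0] == 'star':
        _follow(n[1], table)
        for _, p in _last(n[1]):
            table.setdefault(p, set()).update(_first(n[1]))
    elif n[0] == 'opt':
        _follow(n[1], table)

def deterministic(model):
    """True iff no state of the Glushkov automaton has two
    out-transitions on the same element name (1-unambiguity)."""
    m = _mark(model, count())
    def clash(posset):
        names = [name for name, _ in posset]
        return len(names) != len(set(names))
    if clash(_first(m)):
        return False
    table = {}
    _follow(m, table)
    return not any(clash(s) for s in table.values())
```

On this account (a, b) | (a, c) is rejected (on seeing an 'a' the parser cannot tell which token it matched), while the equivalent a, (b | c) is accepted; likewise the classic (a*, a) is rejected.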

> ... and as I hear, the same thing is being
> re-introduced as "Invisible XML", giving you a facility that was
> already there in 1986 ;)

I think it's fair to say that invisible XML and the SHORTREF and DATATAG
(and markup inference) constructs of ISO 8879 have a certain rough
similarity.  But "same thing" sounds to me like an exaggeration.  It has
been a long time since I read 8879, but I do not remember anything like
an explanation of how to take an arbitrary context-free grammar in EBNF
form and formulate an equivalent set of declarations in a DTD.  I don't
remember any discussion of the expressive power of SHORTREF, let alone a
claim that it will handle arbitrary context-free grammars.  

> Of course, with XML-style fully tagged markup, more relaxed models
> could be used, hence RelaxNG.

> What is the argument here? That a Thompson construction for finite
> automata (as opposed to Glushkov/Antimirov derivatives) is more
> convenient for programmers since an off-the-shelf regexp lib can be
> used?

No, I don't think that is the argument Roger Costello was making or
reporting, nor is it one that I have heard anyone make.

One argument I have heard people make, and have made myself, is that the
determinism rules seem ad hoc and unconnected to the basics of automata
theory.  It was not until well after ISO 8879 became an international
standard that there was a coherent statement of what they meant in terms
of automata theory.  The fact that routine techniques from automata
theory could not always be applied (unless they could -- figuring out
which is which is not easy, working from the text of 8879) certainly did
make it harder for people to build conforming SGML processors, and
certainly did affect the number of SGML processors available.

The determinism rules reduce the expressive power of content-model
notation vis-a-vis regular expressions: although they look like
regular expressions, content models are strictly less expressive than
regular expressions.  (The classic witness is (a|b)*a(a|b), which
denotes a regular language that no deterministic expression can
denote.)

Any design involves tradeoffs, but for costs like these I would want to
see some substantial advantages.  I first started looking for them 25 or
30 years ago, and in that time I have not yet been persuaded that they
have any compensating advantages at all except that they simplify life
for programmers who can't figure out how to make backtracking work and
cannot be bothered to learn how to determinize a finite state automaton.
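For what it's worth, the determinization in question is the textbook subset construction; a sketch in Python (the NFA encoding and names are mine, purely illustrative):

```python
from collections import deque

def determinize(alphabet, delta, start, finals):
    """Subset construction: turn an NFA into an equivalent DFA.

    delta maps (state, symbol) -> set of successor states; start is
    the set of initial NFA states; finals the set of accepting NFA
    states.  Each DFA state is a frozenset of NFA states.
    """
    start_set = frozenset(start)
    dfa_delta, seen, work = {}, {start_set}, deque([start_set])
    while work:
        S = work.popleft()
        for c in alphabet:
            # The DFA successor is the union of NFA successors.
            T = frozenset(t for q in S for t in delta.get((q, c), ()))
            dfa_delta[(S, c)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    dfa_finals = {S for S in seen if S & set(finals)}
    return dfa_delta, start_set, dfa_finals

# Example: an NFA for (a|b)*a, nondeterministic on 'a' in state 0.
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
delta, q0, F = determinize('ab', nfa, {0}, {1})
```

The resulting DFA for this example has just two states ({0} and {0,1}), and running it requires no backtracking at all.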

The claim that the determinism rules make SHORTREF work would probably
count as a coherent advantage, if it can be substantiated, though it
would not at this date be a very compelling one for those who gave up on
SHORTREF some time in the early 90s.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

