OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: The one element schema language

Rick Jelliffe  wrote:

> And just to be (constructively) bolshie about it, I have come up with a new
> schema language "Hook" based on a new paradigm "partial ordering" which only
> uses one element.  See http://www.ascc.net/xml/hook  for the idea.

Very interesting!  It looks like it meets all of the stated goals.

Here's my attempt at a formal definition of hook-validity:

[ Note: this is based on yesterday's draft -- Rick added some
  features to the language while I wasn't looking :-)
  Also, what follows doesn't address the 'friendly', 'short', or
  'top' features; defining these formally seems straightforward. ]

In the "generalized binary tree" representation of an XML document,
every node has at most two outgoing edges, the FIRST-CHILD and

    DEFINITION: The _hook_ of a node N is the (unique) sequence of nodes
    on the path from the root node to N in the generalized binary tree
    representation of the document.

    DEFINITION: A _hook schema_ is an (ordered) sequence of
    (unordered) sets of NCNames, specified by the following grammar:

	hook-schema	::= items ;
	items		::= item | item items ;
	item		::= NCName | '[' ncnames ']' ;
	ncnames		::= NCName | NCName ncnames ;

As a first attempt to define validity, we'll make a

    TENTATIVE ASSUMPTION 1: a particular NCName may appear in at most
    one item in any hook schema.

Given a hook schema H which satisfies this assumption, we can define a
function '# : NCName -> Integer' which maps an NCName to the number of
the item in H in which the NCName appears, or -1 if the NCName does
not appear in H.  Under this assumption:

    TENTATIVE DEFINITION 1: a document D is _hook-valid_ according to H
    if and only if for every hook 'N1, N2, ... NN' in D, the
    sequence of integers
    	#(local-name(N1)), #(local-name(N2)), ... #(local-name(NN))
    is monotonically nondecreasing.

Or equivalently:

    TENTATIVE DEFINITION 2: a document D is _hook-valid_ according to H
    if and only if for every node N in D,
    	#(local-name(N)) <= #(local-name(FIRST-CHILD(N))
    	#(local-name(N)) <= #(local-name(NEXT-SIBLING(N))

    where we extend the definitions of local-name and # as follows:
    	local-name(NIL) = "#EMPTY"
	#("#EMPTY") = +infinity
    to account for leaves and last siblings.

Under TENTATIVE ASSUMPTION 1, a hook-schema actually induces
a "strict weak order" (in the C++ STL sense) on NCNames.  (A
strict weak order is stronger than a partial order but weaker
than a total order; a relation '<<' on S is a strict weak order
if there is a homomorphism from (S, <<) to (Z, <) where (Z, <)
is a totally ordered set).

However, TENTATIVE ASSUMPTION 1 does not hold in general since
it's contrary to Rick's spec, which explicitly allows NCNames to
appear more than once in a hook schema.  Discarding this assumption,
'#' becomes a relation rather than a function.  Writing 'n # i'
to mean that NCName 'n' appears in the i'th set of items in H,
we have:

    DEFINITION: A document D is _hook-valid_ according to H if
    and only if for all nodes X and Y in D,

    	Y = FIRST-CHILD(X) or Y = NEXT-SIBLING(X) implies
	that there exists i, j such that local-name(X) # i,
	local-name(Y) # j, and i <= j.

This suggests an efficient implementation: since we only need
to compare the _smallest_ i against the _largest_ j, define
imin(H, N) to be the number of the first item in hook schema H
in which NCName N appears, and imax(H, N) as the number of
the last such item.  'imin' and 'imax' can be implemented as hash tables,
and computed in a single pass directly from the hook schema.
Validation can be performed SAX-style:

    variable iprev : integer = -1;
    while not end-of-document():
	case next-event() of
		    if (imin(H,n)) < iprev then error "Invalid"
		    iprev := imax(H,n)
		    iprev := imax(H,n)

(Rick -- does this look like what you had in mind?  I suspect I'm missing
something here, since your example hook schema for XHTML Basic would
actually reject many documents that are DTD-valid with the above
validation algorithm.  It does seem to work for the other examples
you gave (PurchaseOrder, RSS, and Schematron) though.)

Regarding expressive power:

    HYPOTHESIS: Every hook schema is equivalent to a
    tree-local/string-local grammar.

    PROOF: (shouldn't be difficult...)

    HYPOTHESIS: There exist tree-local/string-local languages
    which cannot be expressed as hook schemas.

    PROOF: (the XHTML Basic 'body' element might work as
    a counterexample...)

By way of comparison: TREX and RELAX are equivalent to
tree-regular/string-regular grammars, and XML DTDs are equivalent
to tree-local/string-regular grammars.

I'll have to think about the recently-introduced ';' and '.'
features some more...

--Joe English