[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: The one element schema language
- From: Joe English <jenglish@flightlab.com>
- To: xml-dev@lists.xml.org
- Date: Tue, 06 Feb 2001 09:00:53 -0800
Rick Jelliffe wrote:
> And just to be (constructively) bolshie about it, I have come up with a new
> schema language "Hook" based on a new paradigm "partial ordering" which only
> uses one element. See http://www.ascc.net/xml/hook for the idea.
Very interesting! It looks like it meets all of the stated goals.
Here's my attempt at a formal definition of hook-validity:
[ Note: this is based on yesterday's draft -- Rick added some
features to the language while I wasn't looking :-)
Also, what follows doesn't address the 'friendly', 'short', or
'top' features; defining these formally seems straightforward. ]
In the "generalized binary tree" representation of an XML document,
every node has at most two outgoing edges, the FIRST-CHILD and
NEXT-SIBLING.
DEFINITION: The _hook_ of a node N is the (unique) sequence of nodes
on the path from the root node to N in the generalized binary tree
representation of the document.
DEFINITION: A _hook schema_ is an (ordered) sequence of
(unordered) sets of NCNames, specified by the following grammar:
hook-schema ::= items ;
items ::= item | item items ;
item ::= NCName | '[' ncnames ']' ;
ncnames ::= NCName | NCName ncnames ;
As a first attempt to define validity, we'll make a
TENTATIVE ASSUMPTION 1: a particular NCName may appear in at most
one item in any hook schema.
Given a hook schema H which satisfies this assumption, we can define a
function '# : NCName -> Integer' which maps an NCName to the number of
the item in H in which the NCName appears, or -1 if the NCName does
not appear in H. Under this assumption:
TENTATIVE DEFINITION 1: a document D is _hook-valid_ according to H
if and only if for every hook 'N1, N2, ... NN' in D, the
sequence of integers
#(local-name(N1)), #(local-name(N2)), ... #(local-name(NN))
is monotonically nondecreasing.
Or equivalently:
TENTATIVE DEFINITION 2: a document D is _hook-valid_ according to H
if and only if for every node N in D,
#(local-name(N)) <= #(local-name(FIRST-CHILD(N))
and
#(local-name(N)) <= #(local-name(NEXT-SIBLING(N))
where we extend the definitions of local-name and # as follows:
local-name(NIL) = "#EMPTY"
#("#EMPTY") = +infinity
to account for leaves and last siblings.
Under TENTATIVE ASSUMPTION 1, a hook-schema actually induces
a "strict weak order" (in the C++ STL sense) on NCNames. (A
strict weak order is stronger than a partial order but weaker
than a total order; a relation '<<' on S is a strict weak order
if there is a homomorphism from (S, <<) to (Z, <) where (Z, <)
is a totally ordered set).
However, TENTATIVE ASSUMPTION 1 does not hold in general since
it's contrary to Rick's spec, which explicitly allows NCNames to
appear more than once in a hook schema. Discarding this assumption,
'#' becomes a relation rather than a function. Writing 'n # i'
to mean that NCName 'n' appears in the i'th set of items in H,
we have:
DEFINITION: A document D is _hook-valid_ according to H if
and only if for all nodes X and Y in D,
Y = FIRST-CHILD(X) or Y = NEXT-SIBLING(X) implies
that there exists i, j such that local-name(X) # i,
local-name(Y) # j, and i <= j.
This suggests an efficient implementation: since we only need
to compare the _smallest_ i against the _largest_ j, define
imin(H, N) to be the number of the first item in hook schema H
in which NCName N appears, and imax(H, N) as the number of
the last such item. 'imin' and 'imax' can be implemented as hash tables,
and computed in a single pass directly from the hook schema.
Validation can be performed SAX-style:
variable iprev : integer = -1;
while not end-of-document():
case next-event() of
START-TAG(n):
if (imin(H,n)) < iprev then error "Invalid"
iprev := imax(H,n)
END-TAG(n):
iprev := imax(H,n)
(Rick -- does this look like what you had in mind? I suspect I'm missing
something here, since your example hook schema for XHTML Basic would
actually reject many documents that are DTD-valid with the above
validation algorithm. It does seem to work for the other examples
you gave (PurchaseOrder, RSS, and Schematron) though.)
Regarding expressive power:
HYPOTHESIS: Every hook schema is equivalent to a
tree-local/string-local grammar.
PROOF: (shouldn't be difficult...)
HYPOTHESIS: There exist tree-local/string-local languages
which cannot be expressed as hook schemas.
PROOF: (the XHTML Basic 'body' element might work as
a counterexample...)
By way of comparison: TREX and RELAX are equivalent to
tree-regular/string-regular grammars, and XML DTDs are equivalent
to tree-local/string-regular grammars.
I'll have to think about the recently-introduced ';' and '.'
features some more...
--Joe English
jenglish@flightlab.com