From: "Ronald Bourret" <firstname.lastname@example.org>
> In thinking about using the machine readable parts of a RDDL document at
> run time, I think schemas are very useful if they can be used in a
> modular fashion.
I think this mention of "run time" is important, because there is no single
"time of running" for dynamic documents being passed around systems.
Or, better put, a run time may be split into many phases, and at each
phase a different set of constraints applies.
A schema language will tend to specify as required all the
constraints that are supposed (by foresight and designer's fiat) to be
true at every phase, and to specify as optional the constraints
that may fail to hold at some phase.
This is a fundamental flaw of schema languages (except for Schematron,
which has a specific phases mechanism). So schema languages will
typically have a workaround mechanism: two identifiers, a persistent
public one to let you know the genus and a specific one to let you
know which schema to use in the particular phase. (This is, of course,
the SGML Formal Public Identifier versus System Identifier split,
which, for all intents and purposes, we can see in the schemaLocation
attribute of XML Schemas too.)
RDDL's flaw, too, is that it does not provide any built-in mechanism
for supporting phases AFAIK. One can indeed have multiple RDDL
files for the different phases, and name them by putting them at
different locations. But, as with XML Schemas, DTDs, Examplotron,
and RELAX NG (corrections to this welcome!), there is no
way to manage the different variants, or even to say that
one is a variant of another.
For publishing, the CATALOG format has been developed to allow
all the different path remappings at a particular phase to be bundled
together, but there is still no notion of phases. For publishing, where
there is often a division of labour in the markup team, the lack of
phases has made specialization more difficult: the table queen cannot
say "just validate the tables, don't give me validation errors
about the metadata--we know we have not completed that yet!"
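To make the table queen's request concrete, here is a minimal sketch of phase-selective validation, loosely modelled on the idea behind Schematron phases (a named phase activates only some groups of rules). It is plain Python rather than Schematron syntax, and the element names (`table`, `title`, `meta`, `author`), pattern names and phase names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule groups ("patterns"), each a list of (message, check) pairs.
PATTERNS = {
    "tables": [
        ("every table has a title",
         lambda doc: all(t.find("title") is not None for t in doc.iter("table"))),
    ],
    "metadata": [
        ("document has an author",
         lambda doc: doc.find("meta/author") is not None),
    ],
}

# Hypothetical phases: each names the patterns that are active in it.
PHASES = {
    "tables-only": ["tables"],
    "final": ["tables", "metadata"],
}


def validate(doc, phase):
    """Run only the patterns active in `phase`; return failure messages."""
    errors = []
    for pattern in PHASES[phase]:
        for message, check in PATTERNS[pattern]:
            if not check(doc):
                errors.append(f"{pattern}: {message}")
    return errors


doc = ET.fromstring("<doc><meta/><table><title>Rates</title></table></doc>")
```

Here `validate(doc, "tables-only")` returns `[]`, so the unfinished metadata produces no noise, while `validate(doc, "final")` returns `['metadata: document has an author']`.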
If we consider schemas as a software engineering technique, as
specialized languages for black-box testing of a pipeline of processes,
then without some phases mechanism, it may be impractical to
validate at each incremental step. We can make up a different
DTD for each step, but then we need to rewrite the DOCTYPE
declaration, and if we change a content model that is invariant
throughout the pipeline, we will need to change every one of those DTDs.
XML Schemas has more targeted mechanisms for extension,
restriction, and importing than DTDs' parameter entities, which
provide a single mechanism that covers a zillion cases, so these
should make life a little easier for deriving individual schemas
for different stages of a pipeline. But it is still clunky, because the
constraints are not gathered together and named by their phase.
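A rough sketch of what "gathered together and named by their phase" could look like for a pipeline of augmenting processes: the invariants are written once, each later phase is derived by adding the constraints that start to hold there, and the document is validated after every step. The `order`, `price` and `approval` elements, the phase names and the steps are all hypothetical, and this is plain Python, not any schema language's actual derivation mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules. INVARIANTS hold at every phase of the pipeline;
# each later phase is derived by adding the constraints that start to
# hold there, so a change to an invariant is made in one place only.
INVARIANTS = [("root is <order>", lambda d: d.tag == "order")]
PRICED = INVARIANTS + [("has <price>", lambda d: d.find("price") is not None)]
APPROVED = PRICED + [("has <approval>", lambda d: d.find("approval") is not None)]
PHASES = {"received": INVARIANTS, "priced": PRICED, "approved": APPROVED}


def check(doc, phase):
    """Return the names of the rules for `phase` that the document fails."""
    return [name for name, rule in PHASES[phase] if not rule(doc)]


def add_price(doc):
    # Augmenting step: adds a new element to the document.
    ET.SubElement(doc, "price").text = "100"
    return doc


def add_approval(doc):
    # Another augmenting step.
    ET.SubElement(doc, "approval")
    return doc


# Each hypothetical pipeline step names the phase its output belongs to.
PIPELINE = [("priced", add_price), ("approved", add_approval)]


def run(doc):
    """Validate the input, then validate again after every step."""
    report = {"received": check(doc, "received")}
    for phase, step in PIPELINE:
        doc = step(doc)
        report[phase] = check(doc, phase)
    return report
```

For example, `run(ET.fromstring("<order><item/></order>"))` reports no failures at any phase, whereas checking a bare `<order/>` against the `approved` phase reports both missing elements.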
I think a lot of the discussion about whether namespaces are
enough to process documents misses the fact that processes
can augment documents with new infoset items as well as
passively swallow the infoset. It also misses that there may be
house rules about which elements are required or optional:
I know of a banking-sector case where every institution
uses the same namespace and elements in an application,
but each bank requires a different selection of elements;
you have to validate each document against each bank's
schemas to know if it contains the right information items.
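The banking case can be sketched as one shared vocabulary with per-institution requirement sets: the namespace alone says nothing about whether a given document is acceptable to a given bank. The namespace URI, bank names and element names below are invented for illustration; a real deployment would use each bank's actual schema rather than a set of names.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:payments"  # hypothetical namespace shared by every bank

# Hypothetical house rules: same vocabulary, different required selections.
BANK_REQUIRED = {
    "bank_a": {"payer", "payee", "amount"},
    "bank_b": {"payer", "payee", "amount", "reference"},
}


def present_local_names(doc):
    """Local names of the root's children that are in the shared namespace."""
    prefix = "{" + NS + "}"  # ElementTree stores tags as {namespace}local
    return {child.tag[len(prefix):] for child in doc if child.tag.startswith(prefix)}


def missing_for_each_bank(doc):
    """For each bank, list the required elements the document lacks."""
    present = present_local_names(doc)
    return {bank: sorted(required - present) for bank, required in BANK_REQUIRED.items()}


sample = f'<payment xmlns="{NS}"><payer/><payee/><amount>10</amount></payment>'
```

`missing_for_each_bank(ET.fromstring(sample))` returns `{'bank_a': [], 'bank_b': ['reference']}`: the same document, in the same namespace, passes one bank's rules and fails another's.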
When there are augmenting processes,
any schema (or schema umbrella) that does not support phases
can only capture the document as a system of variants and
invariants that hold for the total pipeline or some particular
point or range of the pipeline. Typically this will be the
form deemed suitable for public exchange.
So, back to RDDL: a document type or namespace may need a PUBLIC RDDL,
declaring the end-to-end variants and invariants, or the invariants at
a particular point for optimal public interchange, but there also
need to be phase-specific, system-specific RDDLs.
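One way to picture a PUBLIC RDDL alongside phase-specific ones is a lookup that prefers a phase-specific resource and falls back to the public, end-to-end one, much as a catalog remaps a public identifier to a system-specific location. The registry, namespace and URLs here are hypothetical; RDDL itself defines no such phase lookup, which is the point being made above.

```python
# Hypothetical registry: for each namespace, the public (end-to-end) RDDL
# under the key None, plus optional phase-specific ones. URLs are invented.
RESOURCES = {
    "urn:example:orders": {
        None: "http://example.org/orders/public.rddl",
        "priced": "http://example.org/orders/internal/priced.rddl",
    },
}


def resolve(namespace, phase=None):
    """Prefer a phase-specific RDDL; fall back to the public one."""
    entries = RESOURCES[namespace]
    return entries.get(phase, entries[None])
```

A processor that knows it is working at the `priced` phase gets the internal document; any other phase, or a caller that names no phase, gets the public one.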
The TAG group, when thinking about namespaces, may find
it useful to be very clear about whether their statements apply to
end-to-end or public uses of namespaces, or to system-specific
or phase-dependent uses of namespaces.
In general, let us guess that in about 60% of cases,
a namespace+name is enough to know how to process an element.
In 20% of cases we will need to know the parent.
And in 20% of cases, we need to know the value
of an attribute too. In maybe 1% of cases (lists), we need to
know whether it is the first element or not.
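The four levels of context can be shown in a small dispatcher that picks a processing rule for each element. The element names, attribute values and rule names are hypothetical, and the tags are left unnamespaced for brevity (with ElementTree, a namespaced name would appear as `{uri}local`). Only the last branch needs nothing beyond the name; the others need the parent, an attribute value, or the element's position.

```python
import xml.etree.ElementTree as ET


def handler_for(elem, parent, index):
    """Pick a hypothetical processing rule, using more context only when needed."""
    tag = elem.tag
    if tag == "title":
        # Parent-dependent: a <title> inside <table> is a caption.
        return "table-caption" if parent is not None and parent.tag == "table" else "heading"
    if tag == "list":
        # Attribute-dependent: the value of @type changes the processing.
        return "ordered-list" if elem.get("type") == "ordered" else "bullet-list"
    if tag == "item":
        # Position-dependent: the first child is treated specially.
        return "first-item" if index == 0 else "item"
    # The common case: the name alone decides.
    return tag


def walk(elem, parent=None, index=0, out=None):
    """Depth-first walk collecting the rule chosen for each element."""
    out = [] if out is None else out
    out.append(handler_for(elem, parent, index))
    for i, child in enumerate(elem):
        walk(child, elem, i, out)
    return out


sample = ('<doc><title/><table><title/></table>'
          '<list type="ordered"><item/><item/></list></doc>')
```

Walking `sample` yields `['doc', 'heading', 'table', 'table-caption', 'ordered-list', 'first-item', 'item']`. In procedural code, the attribute and position tests are easy to overlook because they sit inside the branches rather than in any declared schema.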
The latter two cases are often hidden in procedural code, so people
can easily think (e.g. Tim BL's comments) that only namespace +
name + parent is enough to process an element. (Indeed, this
is reinforced by XML Schemas' lack of support for content models
that depend on attribute values.)