> peter murray-rust wrote:
>
> I am struggling with how to continue to formalize the semantics of
> Chemical Markup Language (CML).
I would be happier if the word "semantics" were used with more restraint in
the XML community, even though the term "formal semantics" is well established
in computer science theory. "Content" sounds more accurate to me.
> * compound documents (e.g. scientific publications) composed of a
> range of markup languages (XHTML, SVG, MathML, CML, etc.). Many
> publishers are now actively starting to adopt this approach.
Interesting. Do you have a reference?
> A key
> approach is that data and text are mixed ("datument") so that we can
> transmit data in primary publications. Machines can now start to
> understand scientific publications.
I would say "to analyze".
> This is sufficiently broad that it is impossible to create a
> traditional XSD schema which allows for all uses.
As I see it, the problems are complexity and flexibility. I think that the
whole XML approach was not really designed for dealing with complex
applications in a flexible way.
> Moreover there are no user communities who
> require all CML functionality at once and so we assume that
> particular groups will use subsets of the language.
Yes, a modular (molecular) approach looks better.
> XSD syntax (basically the stuff I can understand), limited to:
> * definition of elements containing explicit complexType and
> references to element children
> * definitions of types
> * definition of attributes
> The specification is used for the following:
> * validation of documents
> * (complete semantic) documentation of the language (IOW the
> specification should be a machine-understandable description of the
> language.) It is inspired by the ideas of literate programming and
> will use <appinfo> etc. This is not complete and this mail is to seek
> guidance.
> * generation of code. This is critically important as all elements
> have to have classes, and all attributes have to have typed accessors
> and mutators. Although we could use Castor, XMLBeans, etc. for Java
> we have to support Python, C++ and FORTRAN so that I have written our
> own code generator to provide this.
If code reuse is one of the priorities, along with extensibility, power for
manipulating symbolic structures, and modularization, why not use a
specialized symbolic language such as Lisp or Scheme?
Defining the "atoms" of CML is no different from defining code in Lisp or
Scheme.
Validation of documents is just one case of generic validation of
S-expressions.
Similar thoughts apply to the other requirements.
What do you think of the sequence

  SXML -> Scheme -> SXML
   |                  |
  XML                XML modified or validated?

The possibilities for matching and accessing SXML datuments via SXPath
(Scheme extension of XPath), for instance, are nice.
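To make the round trip concrete, here is a minimal sketch in Python rather than Scheme (Python is one of the target languages mentioned above). It assumes an SXML-like convention of nested lists, [tag, ["@", [name, value], ...], child, ...]. The function names xml_to_sxml, sxml_to_xml and rename are made up for illustration and are not part of any SXML or SXPath library.

```python
import xml.etree.ElementTree as ET

def xml_to_sxml(elem):
    """Turn an ElementTree element into an SXML-like nested list."""
    node = [elem.tag]
    if elem.attrib:
        # Attributes become one special child headed by "@".
        node.append(["@"] + [[k, v] for k, v in elem.attrib.items()])
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(xml_to_sxml(child))
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node

def sxml_to_xml(node):
    """Turn the nested list back into an ElementTree element."""
    elem = ET.Element(node[0])
    last = None
    for item in node[1:]:
        if isinstance(item, list) and item and item[0] == "@":
            for name, value in item[1:]:
                elem.set(name, value)
        elif isinstance(item, list):
            last = sxml_to_xml(item)
            elem.append(last)
        elif last is None:
            elem.text = (elem.text or "") + item
        else:
            last.tail = (last.tail or "") + item
    return elem

def rename(node, old, new):
    """Example transformation: rename every element `old` to `new`."""
    if not isinstance(node, list) or node[0] == "@":
        return node  # text and attribute lists pass through unchanged
    head = new if node[0] == old else node[0]
    return [head] + [rename(child, old, new) for child in node[1:]]
```

The middle step (rename here) is ordinary list processing, which is the point: once the document is a list, modifying or validating it needs no XML-specific machinery.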
> (XSD is good for
> formal documents such as tax-forms but it is poor for the evolution
> of a scientific language).
Yeah, XSD was mainly designed with business applications in mind.
The main strength of Lisp-like approaches has been precisely their unusual
ease of adaptation to change. The main reason Lisp is so popular in academic
circles and AI research is that the code evolves along with the discipline.
At least that is my opinion.
I find it extremely difficult to see how an XML environment (as it is being
designed today) can offer the kind of thing needed in science. XML comes
from the SGML world of design-once-for-a-fixed-big-business.
> * we find little use (at present) for re-usable complexTypes.
> * XSD content models are effectively useless for validation. They
> rapidly become enormous for some elements and no-one would use them.
> * there are many simple relationships that cannot be expressed in XSD.
Those would hardly be limitations in a full programming language
environment.
Maybe chemistry is once again at the cutting edge of computer science,
somewhat as in the past, when some advances came out of chemical informatics
research (you know this topic better than I do).
> (If CML requires running marked up text we use <xhtml:div> or similar)
A question: why <div> and not <p>?
> Currently the attributes and content models are used to generate
> code. Thus <propertyList> can have (say) a title attribute, and
> children such as <metadataList> and <property>. This generates code
> such as:
>
> PropertyList.setTitle(String title)
> MetadataList PropertyList.getMetadataList()
> PropertyList.add(Property)
>
> This is enormously valuable when programming as it helps to ensure
> strong typing and provides prompts and checking when writing code.
> Therefore we continue to need a specification that describes the
> relationship of one element to another and, where appropriate,
> supports the generation of code.
So this is a disjoint data + code approach. You could write your own
specification in a unified data + code (i.e. data + data) approach.
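As a rough illustration of what "data + data" could look like, here is a small Python sketch in which the specification is itself plain data and one generic class consults it at run time instead of relying on generated classes. The SPEC table and the Element class are hypothetical. The element and attribute names are borrowed from the example above, but the rules are invented for illustration and are not taken from the CML schema.

```python
# Hypothetical specification written as plain data (rules invented for illustration).
SPEC = {
    "propertyList": {"attributes": {"title"}, "children": {"metadataList", "property"}},
    "metadataList": {"attributes": set(), "children": set()},
    "property": {"attributes": {"title"}, "children": set()},
}

class Element:
    """A generic element whose allowed attributes and children come from SPEC."""

    def __init__(self, name, spec=SPEC):
        if name not in spec:
            raise ValueError(f"unknown element: {name}")
        self.name = name
        self.rule = spec[name]
        self.attributes = {}
        self.children = []

    def set(self, attribute, value):
        # Reject attributes that the specification does not list.
        if attribute not in self.rule["attributes"]:
            raise ValueError(f"<{self.name}> has no attribute {attribute!r}")
        self.attributes[attribute] = value

    def add(self, child):
        # Reject children that the specification does not list.
        if child.name not in self.rule["children"]:
            raise ValueError(f"<{child.name}> is not allowed inside <{self.name}>")
        self.children.append(child)
```

Changing the language then means editing SPEC only. The price is that the checks happen at run time, so the compile-time prompts that generated classes give are lost.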
> Here are some examples of relationships which I currently need to
> express and which should, if possible, be enforceable in code.
>
> * element must have a parent from (list...)
> * element may have parent from (list...)
> * element must not have parent from list
>
> * element may have children from (list) (and this will generate code)
> * element must not have children from list.
>
> *element may either have a foo attribute or a <foo> child accessible
> through a single getFoo() method
> *element must have either a foo attribute or a bar attribute.
All of this should be easy in SXML. Access to children and attributes is
unified, since in SXML attributes are just a special child list.
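A minimal sketch of how some of these rules might look, written in Python over SXML-like nested lists of the form [tag, ["@", [name, value], ...], child, ...]. All function names are hypothetical, and I am assuming that "either a foo attribute or a bar attribute" means exactly one of the two.

```python
def attributes(node):
    """Return the attributes of an SXML-like node as a dict."""
    for item in node[1:]:
        if isinstance(item, list) and item and item[0] == "@":
            return {name: value for name, value in item[1:]}
    return {}

def children(node, name=None):
    """Return the element children of a node, optionally filtered by tag."""
    return [c for c in node[1:]
            if isinstance(c, list) and c[0] != "@"
            and (name is None or c[0] == name)]

def get(node, name):
    """Single accessor: the attribute `name` if present, else the first <name> child."""
    attrs = attributes(node)
    if name in attrs:
        return attrs[name]
    found = children(node, name)
    return found[0] if found else None

def has_exactly_one(node, names):
    """Assumed reading of 'either foo or bar': exactly one attribute is present."""
    attrs = attributes(node)
    return sum(n in attrs for n in names) == 1

def parent_violations(node, allowed_parents, parent=None):
    """List (tag, parent) pairs that break a 'must have a parent from (list)' rule."""
    bad = []
    tag = node[0]
    if tag in allowed_parents and parent not in allowed_parents[tag]:
        bad.append((tag, parent))
    for child in children(node):
        bad.extend(parent_violations(child, allowed_parents, tag))
    return bad
```

For example, get(["x", ["@", ["foo", "1"]]], "foo") and get(["x", ["foo", "1"]], "foo") go through the same call, which is the single getFoo() behaviour asked for.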
> * Values may be required to be distinct. Thus in <foo refs="a1 a2 a3
> a4"/> all values in the list must be distinct. (This sort of thing
> takes half a page in schematron)
(foo (@ (refs "a1 a2 a3 a4")))
is a structure that is easily analyzed and modified with powerful
programming methods.
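As a sketch only, here is the distinctness rule in Python, with the same structure written as a nested list ["foo", ["@", ["refs", "a1 a2 a3 a4"]]]. The function name refs_are_distinct is made up for illustration.

```python
def refs_are_distinct(node, attr="refs"):
    """True if the whitespace-separated values of `attr` are all different.

    A node without the attribute passes trivially.
    """
    for item in node[1:]:
        if isinstance(item, list) and item and item[0] == "@":
            for name, value in item[1:]:
                if name == attr:
                    tokens = value.split()
                    return len(tokens) == len(set(tokens))
    return True
```

The whole check is a split and a comparison of lengths, against the half page quoted above.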
Juan R.
Center for CANONICAL |SCIENCE)