> I have questions concerning the use of XML in programming languages.
Excellent! This is an area that I have been working on for more than a
year! Our project is called Wheat, and can be found at
http://www.wheatfarm.org/ -- Wheat is a language designed with several
deep connections to XML. My answers here are about how we are handling
these issues in Wheat. Since Wheat is at an early stage of implementation,
the features described here run the gamut from implemented, to
designed, to just sketched out.
> In order to set the stage, here are three suggestions that should
> connect the XML world to the world of general-purpose programming
> languages like ML, C, C++, Java (using only statically typed ones
> here).
Wheat is a non-statically typed general-purpose programming language in
the tradition of Smalltalk, Self, Perl, PHP, Python, etc... [ I think
non-statically typed languages are a much better match to XML
processing, but this issue is more of a design stance than a technical
argument, so please, no wars over static/non-static. ]
> No matter whether you are more concerned about documents or data,
> writing code in non-XML syntax makes life easier.
Absolutely. Wheat has an abstract syntax that defines what a Wheat
program means. It then has three (!) concrete syntaxes:
1 - A native Wheat syntax that looks like most programming languages.
This is what any programmer wants to code in.
2 - An XML-based syntax that makes program transformation and
manipulation easy.
3 - A C++ "syntax" that allows one to write code with the same
semantics as Wheat, but compiled, and with access to host OS facilities.
The first two forms are isomorphic - you can round trip to and fro
without any loss of information (including formatting).
> No matter whether you are more concerned about documents or data,
> writing XML syntax to describe XML makes life easier.
Not clear to me. I don't find XSLT particularly easy or concise to
write lexically.
> No matter which programming language you prefer, if it cannot deal with
> XML it will soon be forgotten or condemned to accommodate numerical
> computations only.
No comment... this is just tinder!
> out.println( <html>... </html>.serialize() );
>
> because a compiler can check many of the well-formedness constraints.
> It is thus less likely that a typo will break your neck at runtime.
> Plus a very sophisticated type system could even check whether your
> value conforms to some type specified in DTD, Relax NG or Schema - this
> is something like "built-in data binding".
It is unrealistic to expect a program to embed an entire XML document
as a literal. More likely, a program wants to output an XML document
composed of XML literals and data values from the running program.
Since those values are only known at run time, it is highly unlikely
that the compiler could do much in the way of ensuring schema
conformance.
In Wheat, we use a template-based approach: if a program wants to
generate an XML document of type X, then the programmer creates an
actual example XML document of type X. Next, the template is marked up
with elements and attributes from the template namespace. These
define which portions of the document can be manipulated by the running
program. The program then calls for the expansion of the template,
directing the expansion of the marked-up sections. The template
expander can then ensure that a well-formed XML document results.
(Yes, even when the program replaces the content of an element with
arbitrary XML (say an XHTML fragment), the template engine ensures the
final result is well-formed.) The output of the expander could be
schema-checked, though we don't do that at present.
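To make the template idea concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It is not Wheat's implementation: the namespace URI `urn:example:template`, the `t:id` marker attribute and the `expand()` function are all hypothetical. The point it illustrates is that because the expander edits a parsed tree and lets the serializer write the output, the result is well-formed by construction, and an inserted XHTML fragment has to parse before it can be inserted at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical template namespace: the post does not give Wheat's
# actual namespace URI or attribute names.
T_NS = "urn:example:template"
T_ID = "{%s}id" % T_NS  # marks an element whose content the program fills


def expand(template_xml, values):
    """Fill each element marked t:id="name" with values[name].

    A value may be a string (inserted as escaped text) or an ET.Element
    (inserted as a subtree). The tree is serialized by ElementTree, so
    the output is always well-formed XML.
    """
    root = ET.fromstring(template_xml)
    # Collect targets first so the tree is not mutated mid-iteration.
    targets = [el for el in root.iter() if T_ID in el.attrib]
    for el in targets:
        name = el.attrib.pop(T_ID)  # strip template markup from the output
        if name not in values:
            continue  # keep the author's example content
        # Discard the example content the template author supplied.
        for child in list(el):
            el.remove(child)
        el.text = None
        value = values[name]
        if isinstance(value, ET.Element):
            el.append(value)
        else:
            el.text = str(value)
    return ET.tostring(root, encoding="unicode")


template = (
    '<html xmlns:t="urn:example:template">'
    '<head><title t:id="title">Example title</title></head>'
    '<body><div t:id="content"><p>Sample paragraph</p></div></body>'
    "</html>"
)
# The fragment is parsed before insertion, so malformed input raises
# ET.ParseError instead of producing a malformed document.
fragment = ET.fromstring("<p>Hello <em>world</em></p>")
print(expand(template, {"title": "Q&A", "content": fragment}))
```

The template itself stays an ordinary XML file that can be opened in any XML-aware editor; only the `t:id` attributes tie it to the program.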
We like this approach because it allows the programmer to author the
XML template document in a tool that is appropriate to the XML
application involved. For example, if it is XHTML, then Dreamweaver
can be used.
> - Where are XML literals allowed ?
> (There must be a clear entry point, at which a language spec links
> to the W3C recommendation. This can be problematic if the language
> interprets symbols like <, or /> as tokens. In XQuery, this problem
> does not arise, because a query language is by definition declarative
> and result-centric... here XML literals, possibly with embedded blocks,
> are just the result of queries )
We don't have literals that are XML. In Wheat there is a close
isomorphism between a Wheat object and its XML representation. So,
instead, Wheat has object literals. These are always "well formed".
The object can be turned into XML on demand.
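A rough Python analogue of "object literals, turned into XML on demand" follows. It assumes a plain Python dict stands in for a Wheat object literal, and the mapping (keys become child elements, list items become `<item>` elements) is a hypothetical format, not Wheat's canonical XML. Because the XML is produced from a tree rather than by string concatenation, characters such as `&` are escaped automatically and the output is always well-formed.

```python
import xml.etree.ElementTree as ET


def to_xml(value, tag="value"):
    """Turn a nested dict/list/scalar into an element tree.

    Hypothetical generic mapping: dict keys become child element names
    (so they must be valid XML names), list items become <item>
    elements, and scalars become text.
    """
    el = ET.Element(tag)
    if isinstance(value, dict):
        for key, item in value.items():
            el.append(to_xml(item, key))
    elif isinstance(value, list):
        for item in value:
            el.append(to_xml(item, "item"))
    else:
        el.text = str(value)
    return el


page = {"title": "Fish & Fowl", "tags": ["xml", "languages"]}
print(ET.tostring(to_xml(page, "page"), encoding="unicode"))
```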
> - How to specify that XML literals may contain code blocks ?
> (XQuery uses an escape mechanism <b> { msg } </b> with braces, where
> msg is a text variable in the current scope. How can one call such an
> XML literal with embedded blocks, is it an XML document, a
> half-document, an XML template, an XML form, or what ?)
I actually think this is a big mistake. Frankly, there is little
difference between:
print "<html><head><title>";
print $title;
print "</title></head><body>";
for ($i = 1; $i < 10; $i++) {
    print "<p>";
    print $i;
    print "</p>";
}
print "</body></html>";
and
<html>
  <head>
    <title><?script $title ?></title>
  </head>
  <body>
    <?script for ($i = 1; $i < 10; $i++) { ?>
      <p><?script $i ?></p>
    <?script } ?>
  </body>
</html>
In fact, as JSP shows, these are completely isomorphic. Each
representation is neither fish nor fowl. We all dislike the first
representation because it clearly leads to brittle code, and it makes
the XML document extremely hard to see and modify. The second merely
swaps these problems: the XML is easier to work with (assuming you have
an XML editor that can work around the script blocks, especially when
they appear in the middle of attribute values (!)), at the expense of
the code being opaque. Further, in real applications that use the
second form (I've written some mighty big PHP scripts...) the document
can easily end up not even being well-formed XML!
It is this very problem that led Wheat to use templates. With
templates there is clear separation: The XML document looks, feels and
is authored as XML. The code is clear, concise and compact. The trick
is making sure that the code one writes to connect the two is easy and
clear.
> - How to deal with entities ?
> (In programming language syntax this would correspond to constants.)
Well, except for &amp;, &lt;, &gt; and &quot;, I think they are best
banished... :-) I am loath to encourage them by providing language
features to support entities in general. (Yes, yes, I
recognize their utility in DTDs...)
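As a small illustration that arbitrary text can be emitted with nothing beyond the predefined entities, here is a sketch using Python's standard `xml.sax.saxutils.escape`, which always escapes `&`, `<` and `>` and accepts an extra mapping (used here for `"`). The wrapper name `xml_text` is hypothetical.

```python
from xml.sax.saxutils import escape


def xml_text(s):
    """Escape a string for element content or a double-quoted
    attribute value, using only predefined XML entities."""
    return escape(s, {'"': "&quot;"})


print(xml_text('if a < b && c > "d"'))
```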
> - How to deal with namespaces ?
> (A natural correspondence would be java packages, or C++
> namespaces)
I see the same natural relationship. Since Wheat is non-statically
typed, the object model allows a further correspondence with how
namespaces can be used in XML: in Wheat it is possible for one module
to add instance variables to an object that inherits from another.
Sort of like an annotation. The base class will happily ignore the
instance variables that come from another module.
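The XML side of that correspondence can be sketched in Python with `xml.etree.ElementTree`: one module attaches data to an element in its own namespace, and the "base" reader, which only looks at un-namespaced attributes, never notices. The namespace URI `urn:example:notes` and the functions `annotate` and `base_fields` are hypothetical; this shows the XML analogue only, not Wheat's object model.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace owned by an annotating module.
NOTES_NS = "urn:example:notes"


def annotate(el, name, value):
    """Attach another module's data to an element, in that module's
    own namespace (the XML analogue of an extra instance variable)."""
    el.set("{%s}%s" % (NOTES_NS, name), value)


def base_fields(el):
    """What the base module reads: un-namespaced attributes only.
    Attributes from other modules' namespaces are simply ignored."""
    return {k: v for k, v in el.attrib.items() if not k.startswith("{")}


item = ET.fromstring('<item id="7" price="3"/>')
annotate(item, "reviewed", "yes")
print(base_fields(item))
print(ET.tostring(item, encoding="unicode"))
```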
> - Canonicalization or not ? How ?
In Wheat, I think, this isn't an issue. When code is running it is
concerned with the semantic meaning of the XML (i.e. we convert it to
a tree of objects), so we don't care so much about comparing
whitespace.
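Here is a sketch of "compare the trees, not the bytes" in Python, under the simplifying assumption that leading and trailing whitespace in text is never significant. It parses both documents with `xml.etree.ElementTree` and compares tag, attributes (a dict, so order does not matter), trimmed text and children. The helper names are hypothetical. A side effect of trimming is that `<b> hi </b>` and `<b>hi</b>` compare equal, which may or may not suit a given XML application.

```python
import xml.etree.ElementTree as ET


def _text(s):
    # Treat missing text and surrounding whitespace as insignificant.
    return (s or "").strip()


def same_tree(a, b):
    """Compare two elements by tag, attributes (order-insensitive),
    trimmed text, and children, ignoring whitespace-only formatting."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _text(a.text) != _text(b.text) or _text(a.tail) != _text(b.tail):
        return False
    return len(a) == len(b) and all(same_tree(x, y) for x, y in zip(a, b))


def same_xml(x, y):
    return same_tree(ET.fromstring(x), ET.fromstring(y))


print(same_xml('<a x="1" y="2">\n  <b>hi</b>\n</a>', '<a y="2" x="1"><b>hi</b></a>'))
print(same_xml("<a><b>hi</b></a>", "<a><b>bye</b></a>"))
```

Where a standards-based comparison is wanted instead, Python 3.8+ also provides `xml.etree.ElementTree.canonicalize()` (C14N 2.0), which has a `strip_text` option.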
> - How does one write comments in XML literals
> (that is an easy one, just use XML comments)
As we said, there are no XML literals. Of course, the XML templates
can have XML comments in them - and they are (or can be) preserved on
output.
> - Which type does the XML literal have ?
> (This demands a class library that represents XML, which is ideally
> not as bloated as DOM)
Well, in Wheat, the XML/object dualism takes one of three forms:
1- Canonical Wheat XML. This is just an XML format that any tree of
Wheat objects can be written out as. It captures the Wheat semantics
exactly.
2- DOM-like tree. This is a set of Wheat objects and classes that
exactly capture the semantics of any arbitrary XML document. It will
be DOM-like, but we'll probably roll our own to closely match the
object and processing facilities in Wheat.
3- Programmer specified. Here, the programmer supplies the Wheat
objects and classes that are used to represent a specific XML
application. Presumably (if the programmer did the job correctly) the
semantics between the XML and objects are identical. This is similar
to the data binding systems out there, though it is bi-directional.
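Form 3 can be sketched in Python as a class that carries its own mapping in both directions. The `Person` class, its `<person name="..."><email>...</email></person>` vocabulary and the `to_xml`/`from_xml` method names are all hypothetical; the round-trip assertion at the end is what "the semantics between the XML and objects are identical" amounts to for this toy application.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """Hypothetical programmer-supplied class bound to a <person> element."""
    name: str
    emails: List[str] = field(default_factory=list)

    def to_xml(self):
        # Object -> XML: name as an attribute, one <email> child per address.
        el = ET.Element("person", {"name": self.name})
        for addr in self.emails:
            ET.SubElement(el, "email").text = addr
        return el

    @classmethod
    def from_xml(cls, el):
        # XML -> object: the inverse of to_xml.
        return cls(name=el.get("name"),
                   emails=[e.text for e in el.findall("email")])


p = Person("Mark", ["mark@example.org"])
doc = ET.tostring(p.to_xml(), encoding="unicode")
print(doc)
assert Person.from_xml(ET.fromstring(doc)) == p  # round trip
```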
> - How can one navigate in such a XML representation
> (Ideally, an XPath like syntax or something comparable like pattern
> matching is used)
We always navigate one of the object representations above. XPath is
reflected as an object tree pattern matching library.
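For comparison, Python's `xml.etree.ElementTree` takes a similar route: its `findall()` accepts a limited XPath subset and evaluates it over the in-memory element objects rather than over text. This is only an analogy for the kind of library described above, not Wheat's API; the `texts` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body>"
    "<p class='note'>first</p>"
    "<div><p>second</p></div>"
    "</body></html>"
)


def texts(root, path):
    """Text of every element matching an XPath-like path."""
    return [el.text for el in root.findall(path)]


print(texts(doc, ".//p"))                  # every <p>, at any depth
print(texts(doc, ".//p[@class='note']"))   # filtered by attribute
print(texts(doc, "./body/div/p"))          # explicit child steps
```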
- Mark
Mark Lentczner
markl@wheatfarm.org
http://www.wheatfarm.org/