   Re: [xml-dev] how to best integrate XML in a programming language


> I have questions concerning the use of XML in programming languages.
Excellent!  This is an area that I have been working on for more than a 
year!  Our project is called Wheat, and can be found at 
http://www.wheatfarm.org/ -- Wheat is a language designed with several 
deep connections to XML.  My answers here are about how we are handling 
these issues in Wheat.  Since Wheat is in an early implementation stage, 
the features in Wheat described here run the gamut from implemented to 
designed to just sketched out.

> In order to set the stage, here are three suggestions that should 
> connect the XML world to the world of general-purpose programming 
> languages like ML, C, C++, Java (using only statically typed ones 
> here).
Wheat is a non-statically typed general-purpose programming language in 
the tradition of Smalltalk, Self, Perl, PHP, Python, etc...  [ I think 
non-statically typed languages are a much better match to XML 
processing, but this issue is more of a design stance than a technical 
argument, so please, no wars over static/non-static. ]

> No matter whether you are more concerned about documents or data,
> writing code in non-XML syntax makes life easier.
Absolutely.  Wheat has an abstract syntax that defines what processing 
in Wheat is.  Then it has three (!) concrete syntaxes:
1 - A Wheat native language that looks like most programming languages. 
  This is what any programmer wants to code in.
2 - An XML based syntax that makes program transformation and 
manipulation easy
3 - A C++ "syntax" that allows one to write code with the same 
semantics as Wheat, but compiled and with access to host OS facilities.
The first two forms are isomorphic - you can round trip to and fro 
without any loss of information (including formatting).

> No matter whether you are more concerned about documents or data,
> writing XML syntax to describe XML makes life easier.
Not clear to me.  I don't find XSLT particularly easy or concise to 
write lexically.

> No matter which programming language you prefer, if it cannot deal with
> XML it will soon be forgotten or condemned to accomodate numerical
> computations only.
No comment... this is just tinder!

> out.println( <html>... </html>.serialize() );
>
> because a compiler can check many of the well-formedness constraints. It
> is thus less likely that a typo will break your neck at runtime. Plus a
> very sophisticated type system could even check whether your value
> conforms to some type specified in DTD, Relax NG or Schema - this is
> something like "built-in data binding".
It is unlikely that a program would want to embed a literal that is an 
entire XML document.  It is more likely that a program wants to 
output an XML document composed of XML literals and data values from 
the running program.  Therefore, it is highly unlikely that the 
compiler could do much in the way of ensuring schema conformance.

In Wheat, we use a templating-based approach:  If a program wants to 
generate an XML document of type X, then the programmer creates an 
actual example XML document of type X.  Next the template is marked up 
with elements and attributes from the template namespace.  These 
define which portions of the document are manipulatable by the running 
program.  The program then calls for the expansion of the template, 
directing the expansion of the marked up sections.  The template 
expander can then ensure that a well-formed XML document results.  
(Yes, even when the program replaces the content of an element with 
arbitrary XML (say an XHTML fragment), the template engine ensures the 
final result is well-formed.)  The output of the expander could be 
schema checked, though we don't at present.
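
To make the idea concrete, here is a minimal sketch in Python rather than 
Wheat.  The namespace URI, the t:slot attribute and the expand() function 
are all hypothetical names invented for this illustration; they are not 
Wheat's actual template vocabulary.  Because the expander works on a 
parsed tree, replacement fragments must themselves parse, so the result 
is well-formed by construction:

```python
import xml.etree.ElementTree as ET

# Hypothetical template namespace and marker attribute (not Wheat's own).
T_NS = "urn:example:template"
SLOT = "{%s}slot" % T_NS

# An ordinary example document, marked up with t:slot attributes.
TEMPLATE = """<html xmlns:t="urn:example:template">
  <head><title t:slot="title">Sample title</title></head>
  <body><div t:slot="content"><p>Sample content</p></div></body>
</html>"""

def expand(template_text, values):
    """Replace the content of each marked element with a program value.

    A value is either plain text (escaped on output) or an already
    parsed Element, so ill-formed markup can never be spliced in.
    """
    root = ET.fromstring(template_text)
    # Snapshot the elements first, since the tree is modified below.
    for elem in list(root.iter()):
        name = elem.attrib.pop(SLOT, None)   # strip the marker attribute
        if name is None or name not in values:
            continue
        for child in list(elem):             # drop the sample content
            elem.remove(child)
        value = values[name]
        if isinstance(value, str):
            elem.text = value
        else:
            elem.text = None
            elem.append(value)
    return ET.tostring(root, encoding="unicode")

# An arbitrary XHTML fragment; ET.fromstring raises ParseError if it
# is not well-formed, before it ever reaches the template.
fragment = ET.fromstring("<p>Hello <b>world</b></p>")
result = expand(TEMPLATE, {"title": "Tom & Jerry", "content": fragment})
print(result)
```

The template itself stays a plain XML document that any XML tool can 
open, and the program only ever names the slots it fills.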

We like this approach because it allows the programmer to author the 
XML template document in a tool that is appropriate to the XML 
application involved.  For example, if it is XHTML, then Dreamweaver 
can be used.

> - Where are XML literals allowed ?
>     (There must be a clear entry point, at which a language spec links
> to the W3C recommendation. This can be problematic if the language
> interprets symbols like <, or /> as tokens. In XQuery, this problem does
> not arise, because a query language is by definition declarative and
> result-centric... here XML literals, possibly with embedded blocks, are
> just the result of queries )
We don't have literals that are XML.  In Wheat there is a close 
isomorphism between a Wheat object and its XML representation.  So, 
instead, Wheat has object literals.  These are always "well formed".  
The object can be turned into XML on demand.
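
As a rough illustration of "object literal first, XML on demand", here 
is a sketch in Python, not Wheat.  The to_element() function and the 
mapping it uses (dict keys become child elements, list entries become 
<item> elements) are assumptions made for this example only:

```python
import xml.etree.ElementTree as ET

def to_element(name, value):
    """Convert a nested Python object into an XML element tree.

    Assumed mapping, for illustration only: dict keys become child
    element names (so they must be valid XML names), list entries
    become <item> children, and anything else becomes text.
    """
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, (list, tuple)):
        for item in value:
            elem.append(to_element("item", item))
    else:
        elem.text = str(value)
    return elem

# The "literal" is an ordinary object; it cannot be ill-formed.
page = {"title": "Wheat & XML", "tags": ["xml", "objects"]}

# XML is produced only when asked for; escaping is handled on output.
xml_text = ET.tostring(to_element("page", page), encoding="unicode")
print(xml_text)
```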

> - How to specify that XML literals may contain code blocks ?
>     (XQuery uses an escape mechanism <b> { msg } </b> with braces, where
> msg is a text variable in the current scope. How can one call such an
> XML literal with embedded blocks, is it an XML document, a
> half-document, an XML template, an XML form, or what ?)
I actually think this is a big mistake.  Frankly, there is little 
difference between:

	print "<html><head><title>";
	print $title;
	print "</title></head><body>";
	for ($i = 1; $i < 10; $i++) {
		print "<p>";
		print $i;
		print "</p>";
	}
	print "</body></html>";

and

	<html>
	  <head>
	    <title><?script $title ?></title>
	  </head>
	  <body>
	    <?script for ($i = 1; $i < 10; $i++) { ?>
	      <p><?script $i ?></p>
	    <?script } ?>
	  </body>
	</html>


In fact, as JSP shows, these are completely isomorphic.  Either 
representation is neither beast nor fowl.  We all dislike the first 
representation as it clearly leads to code that is brittle, and it is 
extremely hard to see and modify the XML document.  The second merely 
swaps these problems: the XML is easier to work with (assuming you have 
an XML editor that can pick around the script blocks, especially when 
they appear in the middle of attribute values (!)), at the expense of 
the code being opaque.  Further, in real applications that use the 
second form (I've written some mighty big PHP scripts...) the document 
can easily end up not even being XML well-formed!

It is this very problem that led Wheat to use templates.  With 
templates there is clear separation: The XML document looks, feels and 
is authored as XML.  The code is clear, concise and compact.  The trick 
is making sure that the code one writes to connect the two is easy and 
clear.

> - How to deal with entities ?
>    (In programming language syntax this would correspond to constants.)
Well, except for &amp;, &lt;, &gt; and &quot; I think they are best 
banished...  :-)  I am loath to encourage them by providing language 
features to support the general issues of entities.  (Yes, yes, I 
recognize their utility in DTDs...)

> - How to deal with namespaces ?
>     (A natural correspondence would be java packages, or C++ namespaces)
I see the same natural relationship.  Since Wheat is non-statically 
typed, the object model allows a further correspondence in how 
namespaces can be used in XML:  In Wheat it is possible for one module 
to add instance variables to an object that inherits from another.  
Sort of like an annotation.  The base class will happily ignore the 
instance variables that come from another module.

> - Canonicalization or not ? How ?
In Wheat I don't think this is an issue.  When code is running it is 
concerned with the semantic meaning of the XML (i.e. we convert it to 
a tree of objects), so we don't care so much about comparing the 
whitespace.
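
A small Python sketch of what comparing at the tree level looks like. 
The same_tree() helper is hypothetical, and treating leading and 
trailing whitespace in text as insignificant is an assumption of this 
example, not a statement about Wheat:

```python
import xml.etree.ElementTree as ET

def _norm(text):
    # Assumption: leading/trailing whitespace carries no meaning.
    return (text or "").strip()

def same_tree(a, b):
    """Compare two parsed elements by structure, not by serialized bytes."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))

compact = ET.fromstring('<a x="1" y="2"><b>hi</b></a>')
pretty = ET.fromstring('<a y="2" x="1">\n  <b> hi </b>\n</a>')
other = ET.fromstring('<a x="1" y="2"><b>bye</b></a>')

print(same_tree(compact, pretty))  # indentation and attribute order differ
print(same_tree(compact, other))   # text content differs
```

Once both documents are trees of objects, attribute order and 
indentation never enter the comparison, so no canonical serialization 
is needed for this purpose.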

> - How does one write comments in XML literals
>     (that is an easy one, just use XML comments)
As we said, there are no XML literals.  Of course, the XML templates 
can have XML comments in them - and they are (or can be) preserved on 
output.

> - Which type does the XML literal have ?
>     (This demands a class library that represents XML, which is ideally
> not as bloated as DOM)
Well, in Wheat, the XML/object dualism takes one of three forms:
1- Canonical Wheat XML.  This is just an XML format that any tree of 
Wheat objects can be written out as.  It captures the Wheat semantics 
exactly.
2- DOM-like tree.  This is a set of Wheat objects and classes that 
exactly capture the semantics of any arbitrary XML document.  It will 
be DOM-like, but we'll probably roll our own to closely match the 
object and processing facilities in Wheat.
3- Programmer specified.  Here, the programmer supplies the Wheat 
objects and classes that are used to represent a specific XML 
application.  Presumably (if the programmer did the job correctly) the 
semantics of the XML and of the objects are identical.  This is similar 
to the data binding systems out there, though it is bi-directional.
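
For the third form, a bi-directional binding might look roughly like 
this in Python.  The Book class, its fields and the <book> format are 
invented for this example and are not taken from Wheat:

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Book:
    """Programmer-specified object for a hypothetical <book> format."""
    title: str
    year: int

    def to_xml(self):
        # Object -> XML: year as an attribute, title as a child element.
        elem = ET.Element("book", {"year": str(self.year)})
        ET.SubElement(elem, "title").text = self.title
        return elem

    @classmethod
    def from_xml(cls, elem):
        # XML -> object: the exact inverse of to_xml().
        return cls(title=elem.findtext("title"), year=int(elem.get("year")))

original = Book(title="XML in a Nutshell", year=2001)
text = ET.tostring(original.to_xml(), encoding="unicode")
restored = Book.from_xml(ET.fromstring(text))
print(text)
print(restored == original)
```

The round trip is the test of whether the programmer "did the job 
correctly": whatever goes out as XML should come back as an equal 
object.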

> - How can one navigate in such a XML representation
>     (Ideally, an XPath like syntax or something comparable like pattern
> matching is used)
We always navigate one of the object representations above.  XPath is 
reflected as an object tree pattern matching library.
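
As an analogy only (this is Python's standard library, not Wheat's), 
ElementTree exposes a limited XPath subset as pattern matching over an 
object tree.  The match() helper and the sample document are made up 
for this sketch:

```python
import xml.etree.ElementTree as ET

DOC = """<html>
  <body>
    <p class="note">first</p>
    <div><p>second</p><p class="note">third</p></div>
  </body>
</html>"""

root = ET.fromstring(DOC)

def match(tree, pattern):
    """Return the text of every element in the tree matching the pattern."""
    return [elem.text for elem in tree.findall(pattern)]

# The pattern is applied to the parsed objects, never to the XML text.
notes = match(root, ".//p[@class='note']")
all_paragraphs = match(root, ".//p")
print(notes)
print(all_paragraphs)
```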

	- Mark

Mark Lentczner
markl@wheatfarm.org
http://www.wheatfarm.org/





 
