   how to best integrate XML in a programming language

  • To: xml-dev@lists.xml.org
  • Subject: how to best integrate XML in a programming language
  • From: Burak Emir <Burak.Emir@epfl.ch>
  • Date: Wed, 18 Feb 2004 11:11:11 +0100
  • User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Hi,

Related to my previous announcement (which lacked the URL 
http://scala.epfl.ch), I have questions concerning the use of XML in 
programming languages. To set the stage, here are three points that 
should connect the XML world to the world of general-purpose 
programming languages like ML, C, C++ and Java (considering only 
statically typed ones here).

I would be very happy if some experts on this list could comment on the 
vision described below. Comparable work is being done at Microsoft 
Research (see http://www.extremetech.com/article2/0,3973,1441099,00.asp 
and the article by Erik Meijer, Wolfram Schulte and Gavin Bierman, which 
was discussed briefly on this list before) and, to some extent, in 
XQuery. When it comes to types, a project again aimed at .NET is 
described by Benjamin Pierce: 
http://www.cis.upenn.edu/~bcpierce/xtatic/outline.html

<point>
No matter whether you are more concerned about documents or data, 
writing code in non-XML syntax makes life easier.
</point>

<point>
No matter whether you are more concerned about documents or data, 
writing XML syntax to describe XML makes life easier.
</point>

<point>
No matter which programming language you prefer, if it cannot deal with 
XML it will soon be forgotten or condemned to accommodate numerical 
computations only.
</point>

The first point should be clear: You want to minimize the number of 
symbols to express e.g. a function call while maintaining readability. 
Mathematical notation like f(x) is shorter than 
<apply fun="f"><var name="x"/></apply>.

The last point is speculation.

The second point demands more reasoning. As an example, consider how 
servlets used to generate HTML pages (and a lot of scripts still do 
something similar):

out.println("<html>");
out.println("<head>");
/* etc */

I believe it is better to write

out.println( <html>... </html>.serialize() );

because a compiler can check many of the well-formedness constraints. It 
is thus less likely that a typo will break your neck at runtime. In 
addition, a very sophisticated type system could even check whether your 
value conforms to some type specified in a DTD, RELAX NG or XML Schema; 
this is something like "built-in data binding".
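To make the risk concrete, here is a minimal sketch in plain Java (not in a language with XML literals). It assumes a hypothetical helper class WellFormedCheck and uses only the JDK's DOM parser. It shows that markup assembled from strings can only be checked when the program runs, which is exactly the check an XML-aware compiler could move to compile time.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class WellFormedCheck {
    /** Returns true if the text parses as a well-formed XML document. */
    static boolean isWellFormed(String text) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser
            // from printing its own diagnostics.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        // String-based generation, as in the servlet example above.
        StringBuilder out = new StringBuilder();
        out.append("<html>");
        out.append("<body><p>hello</body>"); // typo: <p> is never closed
        out.append("</html>");
        System.out.println(isWellFormed(out.toString()));                   // false
        System.out.println(isWellFormed("<html><body><p>hello</p></body></html>")); // true
    }
}
```

The Java compiler accepts both strings without complaint; the mistake only shows up once the output is parsed.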

In anticipation of several issues that arise when attempting to embed 
XML things in a programming language, here is a non-exhaustive list:

- Where are XML literals allowed ?
    (There must be a clear entry point, at which a language spec links 
to the W3C recommendation. This can be problematic if the language 
interprets symbols like < or /> as tokens. In XQuery, this problem does 
not arise, because a query language is by definition declarative and 
result-centric: there, XML literals, possibly with embedded blocks, are 
just the result of queries.)

- How to specify that XML literals may contain code blocks ?
    (XQuery uses an escape mechanism <b> { msg } </b> with braces, where 
msg is a text variable in the current scope. How can one call such an 
XML literal with embedded blocks, is it an XML document, a 
half-document, an XML template, an XML form, or what ?)

- How to deal with entities ?
   (In programming language syntax this would correspond to constants.)

- How to deal with namespaces ?
    (A natural correspondence would be java packages, or C++ namespaces)

- Canonicalization or not ? How ?
    (It is crucial that values in a programming language are unambiguous 
representations. Insignificant whitespace should not make two values 
unequal. After all, we do not want to impose any formatting constraints 
on program code, and XML literals would be part of program code.)
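As an illustration of the canonicalization point, here is a rough Java sketch using only the JDK's DOM API. The class XmlValueEquality and its methods are hypothetical names. It also assumes that every whitespace-only text node is insignificant, which is a simplification, since without a DTD or schema one cannot really know.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class XmlValueEquality {
    /** Parses the text and returns its root element. */
    static Element parseRoot(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as XML", e);
        }
    }

    /** Recursively removes text nodes that contain only whitespace. */
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before removing
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    /** Compares two XML texts as values, ignoring whitespace-only text. */
    static boolean sameValue(String a, String b) {
        Element rootA = parseRoot(a);
        Element rootB = parseRoot(b);
        stripWhitespace(rootA);
        stripWhitespace(rootB);
        return rootA.isEqualNode(rootB);
    }

    public static void main(String[] args) {
        String pretty = "<list>\n  <item id=\"1\">x</item>\n</list>";
        String compact = "<list><item id=\"1\">x</item></list>";
        System.out.println(sameValue(pretty, compact));                                  // true
        System.out.println(sameValue("<list><item id=\"1\">x </item></list>", compact)); // false
    }
}
```

Whitespace inside mixed content (the "x " case) is still treated as significant here; a language design would have to decide where that line is drawn.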

Also in the other direction, things can get problematic:

- How does one write comments in XML literals ?
    (That is an easy one, just use XML comments.)

- Which type does the XML literal have ?
    (This demands a class library that represents XML, which is ideally 
not as bloated as DOM)

- How can one navigate in such an XML representation ?
    (Ideally, an XPath-like syntax or something comparable, like pattern 
matching, is used.)
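For comparison, this is roughly what XPath navigation looks like today as a library call in Java, using the JDK's javax.xml.xpath package. The class XPathNavigation and the sample document are made up for illustration. Note that the path expression is an opaque string, so the compiler cannot check it, which is part of what built-in navigation syntax would improve.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class XPathNavigation {
    /** Returns the text content of every node selected by the XPath expression. */
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XML or XPath expression", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data.
        String xml = "<library>"
                + "<book year=\"1999\"><title>A</title></book>"
                + "<book year=\"2003\"><title>B</title></book>"
                + "</library>";
        System.out.println(select(xml, "/library/book[@year > 2000]/title")); // [B]
    }
}
```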


If you want to try out how this could feel, download Scala from 
http://scala.epfl.ch. I am quite far from having reached definitive 
answers to these questions; however, if such a programming language is 
to be useful, good answers need to be found.

cheers,
Burak




 
