xml-dev - Re: [xml-dev] how to best integrate XML in a programming language

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] how to best integrate XML in a programming language
From: james anderson <james.anderson@setf.de>
Date: Thu, 19 Feb 2004 14:24:15 +0100
In-reply-to: <40333A3F.3090201@epfl.ch>

On Wednesday, Feb 18, 2004, at 11:11 Europe/Berlin, Burak Emir wrote:

> Hi,
>
> related to my previous announcement ( that lacked the URL 
> http://scala.epfl.ch ), I have questions concerning the use of XML in 
> programming languages. ...
...
>
> In anticipation of several issues that arise when attempting to embed 
> XML things in a programming language, here is a non-exhaustive list:
>
> - Where are XML literals allowed ?

[hmm... do you mean expressions encoded in xml syntax?] at least 
wherever the base language permits value expressions. if the language 
distinguishes statements from expressions, then likely also statements 
which do not effect a control transfer.

>    (There must be a clear entry point, at which a language spec links 
> to the W3C recommendation. This can be problematic if the language 
> interprets symbols like <, or /> as tokens. In XQuery, this problem 
> does not arise, because a query language is by definition declarative 
> and result-centric... here XML literals, possibly with embedded 
> blocks, are just the result of queries )

if the language permits the program to adapt the surface syntax, it is 
possible to embed xml seamlessly as forms which express both control 
and data construction. for example, one of the definitions in the 
xml-rpc example for cl-xml [0] illustrates both literal and abstract 
embedding in a single function.

(defun serve-method-call (uri input-source output-destination
                          &key ((:xml-rpc-package *xml-rpc-package*)
                                (or (find-package uri)
                                    (error "package not found: ~s." uri)))
                               (namespace-bindings
                                `(("" . ,*xml-rpc-package*))))
   (handler-case
     (let* ((document (document-parser input-source :ignore-whitespace t))
            (result nil))
       (handler-bind ((error (lambda (c) (break "error: ~a" c))))
         (let ((*namespace-bindings* namespace-bindings))
           (setf result (perform-method-call (root document)))))
       (with-xml-writer (output-destination)
         (xml* {}methodResponse          ; here as s-expressions
               (xml* {}params
                     (xml* {}param
                           (xml* {}value
                                 (xml-rpc-encode-value result)))))))
     (error (condition)
            (with-xml-writer (output-destination)
              (xml <methodResponse>      ; here as s-expression/xml hybrid
                   <fault><value>
                   (xml-rpc-encode-value
                    `(({}faultCode . ,(get-universal-time))
                      ({}faultString . ,(format nil "~a" condition))))
                   </value></fault></methodResponse> )))))

the effect is analogous to that described in [1], just embedded in an 
existing language.

>
> - How to specify that XML literals may contain code blocks ?

in the case above, the reader recognizes ('<' ( namechar | '/')) in 
lisp code as the start of xml and an initial '(' in character data 
as the start of lisp. the macro-expansion looks at the forms and 
distinguishes between xml and lisp based on the datum in the operator 
position. it works out. if one retrieves the page with the example and 
looks at the source, the mechanism is clear.
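
a minimal sketch of that dispatch rule, in plain common lisp. it is not 
the cl-xml reader; the names name-char-p, xml-start-p, lisp-start-p, and 
classify are hypothetical, and the name-character test is only a rough 
approximation of the xml production.

```lisp
;; hypothetical helpers for illustration; none of these are cl-xml functions.

(defun name-char-p (ch)
  "Rough approximation of an XML name character."
  (or (alphanumericp ch) (find ch "_:.-")))

(defun xml-start-p (text pos)
  "True if TEXT at POS is '<' followed by a name character or '/'."
  (and (< (1+ pos) (length text))
       (char= (char text pos) #\<)
       (let ((next (char text (1+ pos))))
         (or (char= next #\/) (name-char-p next)))
       t))

(defun lisp-start-p (text pos)
  "True if TEXT at POS is '(', which starts lisp inside character data."
  (and (< pos (length text))
       (char= (char text pos) #\()
       t))

(defun classify (text pos context)
  "CONTEXT is :lisp (reading lisp code) or :chardata (inside element content).
Returns the kind of content that begins at POS."
  (ecase context
    (:lisp     (if (xml-start-p text pos) :xml :lisp))
    (:chardata (if (lisp-start-p text pos) :lisp :chardata))))
```

under this rule (< a b) and (<= a b) stay lisp, since neither a space 
nor '=' is a name character, whereas a lisp symbol which begins with '<' 
followed by a letter would be read as markup.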

it is more of an issue to allow a given form to be executed for either 
side-effect (that is serialization) or for the result depending on 
run-time state.

>    (XQuery uses an escape mechanism <b> { msg } </b> with braces, 
> where msg is a text variable in the current scope. How can one call 
> such an XML literal with embedded blocks, is it an XML document, a 
> half-document, an XML template, an XML form, or what ?)

i have observed the distinction between '<' and '(' to suffice in the 
context of lisp syntax. should people start doing fancy things with 
names, it would become necessary to recognize '{' as well.

>
> - How to deal with entities ?
>   (In programming language syntax this would correspond to constants.)

i am not certain what you mean. i don't think you are referring to 
character entities, and i would have thought that parsed entities 
either correspond to variables, if the intent is that they be expanded 
when serialized, or are afforded a data type and values, just as names 
are, so that the serialization layer can emit the correct 
definitions and instance encoding?

>
> - How to deal with namespaces ?
>    (A natural correspondence would be java packages, or C++ namespaces)

should this find use in server environments, it is not clear that there 
is an advantage to exposing language constructs (in that case, 
packages) to clients. i would also wonder whether java's introspection 
facilities are up to it.

>
> - Canonicalization or not ? How ?
>    (It is crucial that values in a programming language are 
> unambiguous representations. Insignificant whitespace should not make 
> two values inequal. After all, we do not want to impose any formatting 
> constraints on program code, and XML literals would be part of program 
> code)

i wonder, as your hint seems to be that canonicalization would 
facilitate equality tests, whether, in general, the comparison protocol 
needs to accommodate variations. while an xml processor does have some 
leeway in reporting document content, it would be treading on thin ice 
if it canonicalized all input on a default basis. given which, unless 
the only use case is output and canonical output is the default 
serialization form, i would think that the cost outweighs the benefit. 
wouldn't it be better to support structural type equivalence and then 
define instance equality based on that?
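
one possible reading of that suggestion, as a sketch only: leave the 
parsed data as it is and let the comparison accommodate the variation. 
the representation is an assumption made for the example, with an 
element as a list (name . children) and text as strings, and the names 
whitespace-only-p, significant-children, and node-equal are 
hypothetical. attributes are left out, and dropping whitespace-only 
text is only appropriate for element-only content.

```lisp
;; assumed representation: element = (name . children), text = string.

(defun whitespace-only-p (x)
  "True if X is a string consisting solely of whitespace characters."
  (and (stringp x)
       (every (lambda (ch)
                (member ch '(#\Space #\Tab #\Newline #\Return)))
              x)))

(defun significant-children (children)
  "CHILDREN without whitespace-only text nodes."
  (remove-if #'whitespace-only-p children))

(defun node-equal (a b)
  "Compare two trees by structure, ignoring whitespace-only text."
  (cond ((and (consp a) (consp b))
         (and (equal (first a) (first b))
              (let ((ca (significant-children (rest a)))
                    (cb (significant-children (rest b))))
                (and (= (length ca) (length cb))
                     (every #'node-equal ca cb)))))
        (t (equal a b))))
```

the two values are never rewritten; only the predicate decides which 
differences count.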

>
> Also in the other direction, things can get problematic:
>
> - How does one write comments in XML literals

see above. it depends on whether the comments pertain to the xml as 
content or to its generation, transformation, or serialization. the 
syntax should support both forms.

>    (that is an easy one, just use XML comments)
>
> - Which type does the XML literal have ?
>    (This demands a class library that represents XML, which is ideally 
> not as bloated as DOM)

in a lisp environment, there are advantages to representing the 
xml-related expressions with s-expressions. as that data form is the 
compiler's initial concrete model for the program, it affords the 
programmer unencumbered access to lisp's program transformation 
mechanisms. while limiting the immediate data type does limit the 
ability to perform static type analysis, it does not preclude it.

>
> - How can one navigate in such a XML representation
>    (Ideally, an XPath like syntax or something comparable like pattern 
> matching is used)

s-expressions are amenable to all sorts of navigation mechanisms - 
xpath, pattern-based, and procedural expressions among them.
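
as a small illustration of the procedural end of that range, here is a 
sketch of a child-axis path walk over s-expressions. the representation 
(an element as a list whose first item is its name) and the names 
children-named and select are assumptions for the example, not the 
cl-xml interface.

```lisp
;; assumed representation: element = (name . children), text = string.

(defun children-named (node name)
  "Child elements of NODE whose name is NAME."
  (remove-if-not (lambda (child)
                   (and (consp child) (eq (first child) name)))
                 (rest node)))

(defun select (node path)
  "Follow PATH, a list of element names, downward from NODE.
Returns a list of all matching elements."
  (if (null path)
      (list node)
      (mapcan (lambda (child) (select child (rest path)))
              (children-named node (first path)))))

(defparameter *doc*
  '(response (params (param (value "1"))
                     (param (value "2")))))

;; roughly what params/param/value would select, relative to the root
(select *doc* '(params param value))
```

the same walk could be driven by a pattern or by a parsed xpath 
expression; the data does not need to change.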

...

[0] http://www.cl-xml.org/documentation/howto/index.html
[1] http://www.cl.cam.ac.uk/~gmb/Papers/vanilla-xml2003.html

References:
- how to best integrate XML in a programming language
  - From: Burak Emir <Burak.Emir@epfl.ch>