xml-dev - Re: [xml-dev] InnerXml is like printf (WAS: Underwhelmed)

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] InnerXml is like printf (WAS: Underwhelmed)
From: Arjun Ray <aray@nyct.net>
Date: Mon, 23 Sep 2002 07:01:57 +0000
In-reply-to: <200209222008.g8MK8v311939@dragon.flightlab.com>
References: <200209212046.49909.miles@milessabin.com> <00a901c2618d$a322b900$0400a8c0@gandalf> <p04330106b9b25d8bd05b@[192.168.254.4]> <200209222008.g8MK8v311939@dragon.flightlab.com>

Joe English <jenglish@flightlab.com> wrote:

| With the right host language, it's often possible to embed a domain-
| specific language using features of host language itself.  Tcl, Haskell, 
| Scheme, Python, and Forth all have good syntactic extensibility.

Python not as much as the others, I would think.  I'd put Python and Perl
on a par here, although Python syntax on the whole may be "neater".  The
others, though (Tcl, Haskell, Scheme and Forth), have a fundamental notion
of "proper list" (i.e. a tree!) evident in their syntax, which suits them
particularly well to the job.  (IMHO)

| When it comes to even slightly more complex output structures, though, 
| I think they're both a clear win over the explicit node-at-a-time DOM 
| interface and the InnerXml/ document.write() approach:

We all agree that printf-ing is quick and dirty. :-)
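To make "dirty" concrete, here is a minimal Python sketch of printf-style splicing.  The title string is made up and `is_well_formed` is a hypothetical helper; `xml.sax.saxutils.escape` and `ElementTree.fromstring` are standard library calls.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Tom & Jerry <Episode 1>"  # made-up data containing markup characters

# printf-style: splice the string in and hope it contains no markup characters.
naive = f"<TITLE>{title}</TITLE>"

# Same template with the text escaped first: still string-building, but well-formed.
escaped = f"<TITLE>{escape(title)}</TITLE>"

def is_well_formed(s):
    """Hypothetical helper: True if the string parses as an XML document."""
    try:
        ET.fromstring(s)
        return True
    except ET.ParseError:
        return False
```

The escaped version works, but only because the programmer remembered to escape; nothing in the approach enforces it.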

The node-at-a-time approach in DOM and its variants, I think, has a lot to
do with doctrinaire notions of correct Java style.  You have to do it that
way when mutators are "supposed" to have void return signatures.  You get
much more mileage if you set things up for method chaining, because method
invocation itself contributes most of the explicit syntactic clutter (dots,
parens, commas, maybe quotes, whatnot), which ultimately gets in the way of
what you actually want to do: describe a tree economically.

Basically, Java doesn't have List Structure, so you have to fake it. ;-)
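A small illustration of the difference, in Python with the standard `xml.etree.ElementTree` module.  `ChainBuilder` is a hypothetical fluent wrapper around `ElementTree.TreeBuilder` whose mutators return `self`; the node-at-a-time version builds plain `Element` objects, one statement per node.  Both produce the same tree.

```python
import xml.etree.ElementTree as ET

class ChainBuilder:
    """Hypothetical fluent wrapper: every mutator returns self so calls chain."""

    def __init__(self):
        self._tb = ET.TreeBuilder()
        self._open = []  # stack of open tag names, so end() needs no argument

    def start(self, tag, **attrs):
        self._tb.start(tag, attrs)
        self._open.append(tag)
        return self

    def text(self, s):
        self._tb.data(s)
        return self

    def end(self):
        self._tb.end(self._open.pop())
        return self

    def build(self):
        return self._tb.close()  # returns the root element

# Node-at-a-time style: temporaries plus an explicit append per node.
def build_node_at_a_time():
    html = ET.Element("HTML")
    body = ET.Element("BODY")
    p = ET.Element("P")
    p.text = "Hello world!"
    body.append(p)
    html.append(body)
    return html

# Chained style: the call sequence (and its indentation) mirrors the tree.
def build_chained():
    return (ChainBuilder()
            .start("HTML")
                .start("BODY")
                    .start("P").text("Hello world!").end()
                .end()
            .end()
            .build())
```

Chaining does not remove the dots and parens, but it drops the temporaries and the explicit append calls, so the layout of the code can follow the shape of the tree.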

|     xml::element HTML {
|         xml::element HEAD {
|             xml::element TITLE {
|                 xml::text $Title
|             }
|             xml::element LINK  rel stylesheet  type text/css  href stylesheet.css
|             xml::element LINK  rel next  href [nextDocument]
|             xml::element LINK  rel prev  href [prevDocument]
|         }
|         xml::element BODY {
|             xml::element P {
|                 xml::text "Hello world!"
|             }
|         }
|     }
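For comparison, a rough Python analogue of the block above, with nested calls standing in for Tcl's braces.  `element` is a hypothetical helper built on `xml.etree.ElementTree`; `next_document`/`prev_document` are placeholders for the `[nextDocument]`/`[prevDocument]` commands, and the title value is made up.

```python
import xml.etree.ElementTree as ET

def element(tag, *content, **attrs):
    """Hypothetical helper: nested calls mirror the tree, much like the Tcl
    xml::element blocks. Strings become text; Elements become children."""
    node = ET.Element(tag, {k: str(v) for k, v in attrs.items()})
    last = None
    for item in content:
        if isinstance(item, ET.Element):
            node.append(item)
            last = item
        elif last is None:
            node.text = (node.text or "") + item  # text before any child
        else:
            last.tail = (last.tail or "") + item  # text after a child
    return node

def next_document():  # placeholder for [nextDocument]
    return "page3.html"

def prev_document():  # placeholder for [prevDocument]
    return "page1.html"

title = "Example"
page = element("HTML",
    element("HEAD",
        element("TITLE", title),
        element("LINK", rel="stylesheet", type="text/css", href="stylesheet.css"),
        element("LINK", rel="next", href=next_document()),
        element("LINK", rel="prev", href=prev_document()),
    ),
    element("BODY",
        element("P", "Hello world!"),
    ),
)
```

Without a block syntax the commas and parens come back, which is roughly the point made above about Python versus Tcl.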

Syntaxes of this type fall into a general class that can be described, at
a high level, by a yacc spec like this:

 %token  CO  CC  CS  ELEMSPEC  TEXTSPEC
 %%
 wff
  :  element
  ;
 element
  :  ELEMSPEC  CO  content  CC
  ;
 content
  :  item
  |  item  CS  content
  ;
 item
  :  element
  |  TEXTSPEC
  ;
 %%
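A hand-written recursive-descent parser makes the LL(1) claim concrete.  The sketch assumes a lexer has already produced `(kind, value)` pairs using the token names from the `%token` line; the `Parser` class and the token values are illustrative, not part of any existing tool.  Each nonterminal becomes one method, and `item` needs only one token of lookahead to choose between an element and text.

```python
from typing import List, Tuple, Union

Token = Tuple[str, str]  # (kind, value); kind is CO, CC, CS, ELEMSPEC or TEXTSPEC
Tree = Tuple[str, list]  # (element spec, list of content items)

class Parser:
    """Recursive-descent parser for the grammar above, one method per nonterminal."""

    def __init__(self, tokens: List[Token]):
        self.toks = tokens + [("EOF", "")]
        self.i = 0

    def peek(self) -> str:
        return self.toks[self.i][0]

    def expect(self, kind: str) -> str:
        k, v = self.toks[self.i]
        if k != kind:
            raise SyntaxError(f"expected {kind}, got {k} at token {self.i}")
        self.i += 1
        return v

    def wff(self) -> Tree:  # wff : element
        tree = self.element()
        self.expect("EOF")
        return tree

    def element(self) -> Tree:  # element : ELEMSPEC CO content CC
        spec = self.expect("ELEMSPEC")
        self.expect("CO")
        items = self.content()
        self.expect("CC")
        return (spec, items)

    def content(self) -> list:  # content : item | item CS content
        items = [self.item()]
        while self.peek() == "CS":  # right recursion written as a loop
            self.expect("CS")
            items.append(self.item())
        return items

    def item(self) -> Union[Tree, str]:  # item : element | TEXTSPEC
        if self.peek() == "ELEMSPEC":  # one token of lookahead decides
            return self.element()
        return self.expect("TEXTSPEC")
```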

The CS (Content Separator) is syntactic sugar only; it can be removed without
losing LALR(1) (or LL(1), if things are set up right).  There are two
states that matter, and both raise the same issue: how to distinguish an
element specification from text content.

 state 4
        element : ELEMSPEC CO . content CC  (2)

        ELEMSPEC  shift 1
        TEXTSPEC  shift 5
        .  error

        element  goto 6
        content  goto 7
        item  goto 8

 state 10
        content : item CS . content  (4)

        ELEMSPEC  shift 1
        TEXTSPEC  shift 5
        .  error

        element  goto 6
        content  goto 11
        item  goto 8


or, without CS:

 state 8
        content : item .  (3)
        content : item . content  (4)

        ELEMSPEC  shift 1
        TEXTSPEC  shift 5
        CC  reduce 3

        element  goto 6
        content  goto 10
        item  goto 8
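Dropping CS changes only how `content` decides when to stop.  In state 8 the parser reduces on CC and otherwise shifts another item; the sketch below, again over hypothetical `(kind, value)` tokens, does the same thing with a `while` loop whose test is exactly that lookahead.

```python
def parse_without_cs(tokens):
    """Sketch of the CS-free grammar:
         element : ELEMSPEC CO content CC
         content : item | item content
    After an item, lookahead CC ends the content (state 8: CC reduce 3);
    ELEMSPEC or TEXTSPEC starts another item."""
    toks = list(tokens) + [("EOF", "")]
    pos = 0

    def expect(kind):
        nonlocal pos
        k, v = toks[pos]
        if k != kind:
            raise SyntaxError(f"expected {kind}, got {k}")
        pos += 1
        return v

    def element():
        spec = expect("ELEMSPEC")
        expect("CO")
        items = [item()]               # content needs at least one item
        while toks[pos][0] != "CC":    # not CC: parse another item
            items.append(item())
        expect("CC")
        return (spec, items)

    def item():
        if toks[pos][0] == "ELEMSPEC":
            return element()
        return expect("TEXTSPEC")

    tree = element()
    expect("EOF")
    return tree
```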

You get LL(1) with any prefix-based approach to telling ELEMSPEC and TEXTSPEC
apart (e.g. xml::element and xml::text can be viewed syntactically as just
markers).  If text always goes inside quotes, and quotes are used for
nothing else, then that's enough too.
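A sketch of the "text is always quoted" option: a lexer for a hypothetical brace-and-quote syntax (not the Tcl notation quoted above) in which the first character of each lexeme fixes its token kind.  Bare words are collected into an element spec, which must be followed by `{`.  The regular expression and the choice of braces are assumptions made for illustration.

```python
import re

# Hypothetical concrete syntax: braces delimit content, text is always
# double-quoted, and any other bare word belongs to an element spec.
TOKEN_RE = re.compile(r'\s*(?:(\{)|(\})|"([^"]*)"|([^\s{}"]+))')

def lex(src):
    """Yield (kind, value) pairs; an ELEMSPEC value is the list of its words."""
    pos, words = 0, []
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        pos = m.end()
        brace_open, brace_close, text, word = m.groups()
        if word is not None:
            words.append(word)  # keep accumulating the element spec
            continue
        if words:
            if brace_open is None:
                raise SyntaxError("element spec must be followed by '{'")
            yield ("ELEMSPEC", words)
            words = []
        if brace_open:
            yield ("CO", "{")
        elif brace_close:
            yield ("CC", "}")
        else:
            yield ("TEXTSPEC", text)
    if words:
        raise SyntaxError("dangling element spec at end of input")
```

For example, `lex('HTML { BODY { P { "Hello world!" } } }')` yields ELEMSPEC/CO pairs for the three elements, one TEXTSPEC, and three CC tokens.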

So the only remaining issue is the details of ELEMSPEC.  Here again the
notion of list structure is enough: an element specification is just an
association list (where the element name can be viewed as the "value" of
an unnamed attribute, the "generic identifier"), i.e. a list of pairs.  Any
list of pairs, in fact!  It could be just an attribute specification list
with the element type implicit (just as all XML can be reduced to the form
<tag element-type="foo" ...>).
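The association-list reading of ELEMSPEC fits in a few lines.  Assuming a spec arrives as a flat word list such as `LINK rel next href page3.html`, the leading word becomes the value of an unnamed attribute (key `None` here) and the rest pair up as name/value.  `spec_to_pairs`, `pairs_to_element` and `pairs_to_uniform` are hypothetical helpers; the last one shows the same pairs in the uniform `<tag element-type="...">` form.

```python
import xml.etree.ElementTree as ET

def spec_to_pairs(words):
    """Read an element spec as an association list: the leading word is the
    generic identifier, stored under the unnamed key None; the remaining
    words pair up as attribute name/value."""
    gi, rest = words[0], words[1:]
    if len(rest) % 2:
        raise ValueError(f"odd attribute list in spec for {gi!r}")
    return [(None, gi)] + list(zip(rest[0::2], rest[1::2]))

def pairs_to_element(pairs):
    """Build an element from a list of pairs; the unnamed pair names the type."""
    attrs = dict(pairs)
    gi = attrs.pop(None)
    return ET.Element(gi, attrs)

def pairs_to_uniform(pairs):
    """Same information with the type carried as an ordinary attribute."""
    attrs = {("element-type" if k is None else k): v for k, v in pairs}
    return ET.Element("tag", attrs)
```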

Not only are these syntaxes dead easy to parse, they're dead easy to write
too.  With languages like Tcl, etc., you also get the benefits of variable
interpolation and function evaluation to fill slots, but let's not get
into that.

I've always wondered why people prefer syntactic clutter instead.

References:
- InnerXml is like printf (WAS: Underwhelmed)
  - From: Miles Sabin <miles@milessabin.com>
- RE: [xml-dev] Underwhelmed (WAS: [xml-dev] XOM micro tutorial)
  - From: "Aaron Skonnard" <aarons@develop.com>
- RE: [xml-dev] Underwhelmed (WAS: [xml-dev] XOM micro tutorial)
  - From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Re: [xml-dev] InnerXml is like printf (WAS: Underwhelmed)
  - From: Joe English <jenglish@flightlab.com>
