Joe English <jenglish@flightlab.com> wrote:
| With the right host language, it's often possible to embed a domain-
| specific language using features of host language itself. Tcl, Haskell,
| Scheme, Python, and Forth all have good syntactic extensibility.
Python not as much as the others, I would think. I'd put Python and Perl
on a par here, although Python syntax on the whole may be "neater". The
others, though (Tcl, Haskell, Scheme and Forth), have a fundamental notion
of "proper list" - i.e. a tree! - evident in their syntax, which suits them
to the job particularly well. (IMHO)
| When it comes to even more slightly complex output structures, though,
| I think they're both a clear win over the explicit node-at-a-time DOM
| interface and the InnerXml/ document.write() approach:
We all agree that printf-ing is quick and dirty. :-)
The node-at-a-time approach in DOM and its variants, I think, has a lot to
do with doctrinaire notions of correct Java style. You have to do it that
way when mutators are "supposed" to have void return signatures. You get
much more mileage if you set things up for method chaining, because method
invocation itself involves the largest contribution of explicit syntactic
clutter - dots, parens, commas, maybe quotes, whatnot - which ultimately
gets in the way of what you want to do, describing a tree economically.
Basically, Java doesn't have List Structure, so you have to fake it. ;-)
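To make the chaining point concrete, here is a minimal sketch in Python
rather than Java. The Node class and its add() and render() methods are
hypothetical names invented for this illustration, not part of DOM or any
real library; the only thing that matters is that the mutator returns the
node instead of nothing.

```python
from xml.sax.saxutils import escape, quoteattr


class Node:
    """Hypothetical tree node whose mutator returns self so calls can chain."""

    def __init__(self, name, **attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []

    def add(self, *children):
        # Returning self (instead of None) is what allows nesting the calls.
        self.children.extend(children)
        return self

    def render(self):
        # Attributes are quoted and text children escaped on output.
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in self.attrs.items())
        body = "".join(
            c.render() if isinstance(c, Node) else escape(c)
            for c in self.children
        )
        return f"<{self.name}{attrs}>{body}</{self.name}>"


# The nesting of the calls mirrors the nesting of the tree.
doc = Node("HTML").add(
    Node("HEAD").add(Node("TITLE").add("Greeting")),
    Node("BODY").add(Node("P").add("Hello world!")),
)
print(doc.render())
```

With void mutators the same tree needs a named temporary and a separate
statement for every node.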
| xml::element HTML {
| xml::element HEAD {
| xml::element TITLE {
| xml::text $Title
| }
| xml::element LINK type text/css href stylesheet.css
| xml::element LINK rel next href [nextDocument]
| xml::element LINK rel prev href [prevDocument]
| }
| xml::element BODY {
| xml::element P {
| xml::text "Hello world!"
| }
| }
| }
Syntaxes of this type fall into a general class that could be described at
a high level with a yacc spec like this:
%token CO CC CS ELEMSPEC TEXTSPEC
%%
wff
: element
;
element
: ELEMSPEC CO content CC
;
content
: item
| item CS content
;
item
: element
| TEXTSPEC
;
%%
The CS (Content Separator) is syntax sugar only; it can be removed without
losing LALR(1) (or LL(1), if things are set up right.) There are two
states that matter, with the same issue: how to distinguish an element
specification from text content.
state 4
element : ELEMSPEC CO . content CC (2)
ELEMSPEC shift 1
TEXTSPEC shift 5
. error
element goto 6
content goto 7
item goto 8
state 10
content : item CS . content (4)
ELEMSPEC shift 1
TEXTSPEC shift 5
. error
element goto 6
content goto 11
item goto 8
or, without CS:
state 8
content : item . (3)
content : item . content (4)
ELEMSPEC shift 1
TEXTSPEC shift 5
CC reduce 3
element goto 6
content goto 10
item goto 8
You get LL(1) with any prefix-based approach to tell ELEMSPEC and TEXTSPEC
apart (e.g. xml::element and xml::text can be viewed syntactically as just
markers). If you put text inside quotes always, and quotes are only used
for this purpose, then that's enough too.
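As a rough illustration of how little machinery the CS-free grammar needs,
here is a recursive-descent sketch in Python. It assumes the quoting
convention just described: a quoted string is TEXTSPEC, a bare name is
ELEMSPEC (reduced here to just the element name), and braces stand for CO
and CC. The function names and the tuple representation of the tree are
my own choices for the example.

```python
import re

# One alternative per token kind; the group number identifies the kind.
TOKEN = re.compile(r'\s*(?:(\{)|(\})|"([^"]*)"|([A-Za-z][\w:-]*))')
KINDS = {1: "CO", 2: "CC", 3: "TEXTSPEC", 4: "ELEMSPEC"}


def tokenize(src):
    """Split src into a list of (kind, value) pairs."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append((KINDS[m.lastindex], m.group(m.lastindex)))
        pos = m.end()
    return tokens


def parse(src):
    """Parse 'element' from the grammar above into (name, [items])."""
    tokens = tokenize(src) + [("EOF", "")]
    pos = 0

    def peek():
        return tokens[pos][0]

    def take(kind):
        nonlocal pos
        found, value = tokens[pos]
        if found != kind:
            raise SyntaxError(f"expected {kind}, got {found}")
        pos += 1
        return value

    def element():  # element : ELEMSPEC CO content CC
        name = take("ELEMSPEC")
        take("CO")
        items = content()
        take("CC")
        return (name, items)

    def content():  # content : item | item content
        items = [item()]
        while peek() != "CC":  # a single token of lookahead decides
            items.append(item())
        return items

    def item():  # item : element | TEXTSPEC
        if peek() == "ELEMSPEC":
            return element()
        return take("TEXTSPEC")

    tree = element()
    take("EOF")
    return tree


print(parse('BODY { P { "Hello world!" } }'))
```

The loop in content() plays the role of state 8 above: shift on ELEMSPEC
or TEXTSPEC, reduce on CC.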
So the only remaining issue is the details of ELEMSPEC. Here again the
notion of list structure is enough: an element specification is just an
association list (where the element name can be viewed as the "value" of
an unnamed attribute, the "generic identifier"), i.e. a list of pairs. Any
list of pairs, in fact! It could be just an attribute specification list
with the element type implicit (just as all XML can be reduced to the form
<tag element-type="foo" ...>).
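A small Python sketch of the association-list view, assuming the unnamed
attribute is represented by a pair whose key is None. The helper names
start_tag and uniform_tag, and the sample href value, are made up for the
illustration.

```python
from xml.sax.saxutils import quoteattr


def start_tag(spec):
    """Render a list of pairs as a start tag.

    The pair whose key is None carries the generic identifier.
    """
    pairs = list(spec)
    name = next(v for k, v in pairs if k is None)
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in pairs if k is not None)
    return f"<{name}{attrs}>"


def uniform_tag(spec):
    """Same pairs, with the element type demoted to an ordinary attribute."""
    attrs = "".join(
        f" {'element-type' if k is None else k}={quoteattr(v)}"
        for k, v in spec
    )
    return f"<tag{attrs}>"


link = [(None, "LINK"), ("rel", "next"), ("href", "next.html")]
print(start_tag(link))    # <LINK rel="next" href="next.html">
print(uniform_tag(link))  # <tag element-type="LINK" rel="next" href="next.html">
```

Both renderings come from the same list; only the treatment of the
unnamed pair differs.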
Not only are these syntaxes dead easy to parse, they're dead easy to write
too. With languages like Tcl, etc, you also get the benefits of variable
interpolation and function evaluation to fill slots, but let's not get
into that.
I've always wondered why people prefer syntactic clutter instead.