XML and ad-hoc syntax (was: Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software)
- From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- To: Norm Tovey-Walsh <ndw@nwalsh.com>
- Date: Mon, 05 Jun 2023 07:30:48 -0600
I second most of what Norm said in his mail, and want to comment further
on one point.
Norm Tovey-Walsh <ndw@nwalsh.com> writes:
> Dimitre Novatchev <dnovatchev@gmail.com> writes:
>> ...
>> I envy GitHub authors who only have to use MD, and can easily produce
>> stunning documents.
>
> ...
>
> If you’re willing to invent arbitrary amounts of ad hoc syntax, and edit
> that syntax in a text editor with no understanding of the syntax (or
> write a customized editor, I suppose), it’s probably possible to design
> a Markdown-style syntax that would capture the structure of, for
> example, the QT specifications, but *BOY* it would not be pretty. (If
> you think I’m mistaken, I invite you to propose a MD style grammar that
> will capture the information necessary to generate them. You get zero
> credit for 80% of the job. The first 80% is easy. It’s a zero-sum
> challenge, succeed or fail, there is no try.)
In connection with the choice between inventing an ad hoc syntax and
just using XML, I offer two data points. (Or three, I guess.) For what
they are worth.
- During the development of XSLT 2.0 and 3.0 and XQuery 1.0 and 3.*,
it was noticeable that every time new functionality was added to the
design, the course of the discussion depended a lot on whether the
new functionality was to be part of XPath, or XQuery, or XSLT.
If the new functionality was going into XSLT, almost all the time on
an issue went to discussing how the functionality should work.
Extending the syntax of the language to express the new
functionality never took any appreciable amount of time, because
adding a new attribute or element never risked introducing ambiguity
or lookahead problems. Of course, deciding what attributes and
elements to add or change required some care and thought, but it
never became the kind of roadblock that it routinely became in the
XQuery WG.
For XPath or XQuery, the discussion of functionality would take the
same amount of time as in XSLT, and then additional time would be
needed to work out the syntax of the new functionality -- maybe the
same amount of time again, maybe as little as half the time spent on
functionality. It was not unusual to have to iterate multiple
times, because the WG members who maintained the grammar would
report back that the current syntax proposal was incurably ambiguous
or otherwise problematic. (And every now and then the WG would
shoe-horn the problematic syntax in anyway, by adding a new ad hoc
rule for the tokenizer. The result is, for connoisseurs of formal
grammars, a bit of a mess.)
- A year or two ago, some of those working on invisible XML discussed
how to build test suites for ixml. We discussed whether to base our
work on some existing test framework with a non-XML syntax (I can't
remember the name) or write the test catalog in XML. There was some
sentiment for starting from the non-XML syntax -- after all, this is
invisible XML, so we can turn it into XML whenever we want, right?
The first problem came when I tried to write my first test catalog
using the non-XML syntax. The existing system had very sparse
metadata and had no hooks for any of the kind of information I think
is helpful for communally maintained test suites (like: a change
history). I tried to figure out a nifty way to extend the existing
non-XML syntax to handle the information needed, and after a few
hours of banging my head against the wall I gave up and wrote the
catalog in XML. Figuring out the XML representation of the test
catalog then involved mostly thinking about the kind of information
needed and its structure, and a little bit of thinking about what to
call things and how things should nest. No one has ever said "Oh,
but wouldn't it be nicer to have a non-XML syntax for the catalog?"
In both of these cases, non-XML syntaxes proved harder to work with than
XML syntaxes.
That's not always the case, surely: many people whose goals for their
documents are limited to getting ink on paper or lighting up pixels on
screens are happy with Markdown. And for some complicated information
structures, I think designing an ixml grammar can be helpful -- indeed,
at Balisage this year I will be reporting on one such case. So let's
add a third data point:
- When I was transcribing Gottlob Frege's 1879 book on 'concept
notation' (Begriffsschrift), I experimented briefly with
representing his two-dimensional logical notation using an XML
syntax. One might, for example, use something like SVG to talk
about the two-dimensional shapes of the formulas. Or one might use
an XML syntax for logic to represent the logical structure expressed
by the formulas. But I found that transcribing even a relatively
simple formula involved an awful lot of machinery. And once I
actually understood Frege's notation reasonably well, I could see
that a simple, easily keyboardable syntax could be devised which
would make it easier to capture his visually and logically complex
structures. So I devised such a keyboardable syntax and wrote an
ixml grammar for it, and the transcription proceeded without
incident.
It would have been better if there had been an easy way to get
syntax support in the editor, to detect syntax problems in the
transcribed formulas. Detecting them by parsing the formulas to XML
and then converting them to SVG took longer than detecting an XML
validity error in a schema-aware editor. But on the whole I think
the use of ixml was a big success here.
I am not completely sure what factors make XML syntax better in one case
and non-XML syntax in another.
My guess is that one factor is that Frege's notation really is quite
specialized. So there are a lot of things that may happen in the
universe of technical writing, logic, or mathematics which my ixml
grammar does not need to handle. All of that is handled by the
surrounding XML: the non-XML syntax is used only within 'formula'
elements. (Of course, reality sometimes bites back: in reality, there
are several cases in the book which require text-critical markup to
record variant readings in different editions of Frege's book. For
that, there is no easy non-XML markup.)
Another factor is the stability of the target: extending an XML
vocabulary is easy, and extending a non-XML syntax is hard, partly
because we have very few good tools for managing changes in a grammar.
So it was the addition of new functionality, or new metadata
requirements, that made non-XML syntaxes so much work in the first two
cases, and the stability of the target that made ixml work well in the
third case: Frege's notation in the 1879 book is not going to change any
more, so the case of having to add new functionality is just not going
to arise.
And, of course, the ixml case works well in part (at least for me)
because what the parser produces is XML, which I can process using a
fairly capable technology stack.
I'm curious about other people's experiences. As always, YMMV.
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com