XML and ad-hoc syntax (was: Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software)

From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: Norm Tovey-Walsh <ndw@nwalsh.com>
Date: Mon, 05 Jun 2023 07:30:48 -0600

I second most of what Norm said in his mail, and want to comment further
on one point.

Norm Tovey-Walsh <ndw@nwalsh.com> writes:

> Dimitre Novatchev <dnovatchev@gmail.com> writes:
>> ...
>> I envy GitHub authors who only have to use MD, and can easily produce
>> stunning documents.
>
> ...
>
> If you’re willing to invent arbitrary amounts of ad hoc syntax, and edit
> that syntax in a text editor with no understanding of the syntax (or
> write a customized editor, I suppose), it’s probably possible to design
> a Markdown-style syntax that would capture the structure of, for
> example, the QT specifications, but *BOY* it would not be pretty. (If
> you think I’m mistaken, I invite you to propose a MD style grammar that
> will capture the information necessary to generate them. You get zero
> credit for 80% of the job. The first 80% is easy. It’s a zero-sum
> challenge, succeed or fail, there is no try.)

In connection with the choice between inventing an ad hoc syntax and
just using XML, I offer two data points.  (Or three, I guess.) For what
they are worth.

  - During the development of XSLT 2.0 and 3.0 and XQuery 1.0 and 3.*,
    it was noticeable that every time new functionality was added to the
    design, the course of the discussion depended a lot on whether the
    new functionality was to be part of XPath, or XQuery, or XSLT.

    If the new functionality was going into XSLT, almost all the time on
    an issue went to discussing how the functionality should work.
    Extending the syntax of the language to express the new
    functionality never took any appreciable amount of time, because
    adding a new attribute or element never risked introducing ambiguity
    or lookahead problems.  Of course, deciding what attributes and
    elements to add or change required some care and thought, but it
    never became the kind of roadblock that it routinely became in the
    XQuery WG.

    For XPath or XQuery, the discussion of functionality would take the
    same amount of time as in XSLT, and then additional time would be
    needed to work out the syntax of the new functionality -- maybe the
    same amount of time again, maybe as little as half the time spent on
    functionality.  It was not unusual to have to iterate multiple
    times, because the WG members who maintained the grammar would
    report back that the current syntax proposal was incurably ambiguous
    or otherwise problematic.  (And every now and then the WG would
    shoe-horn the problematic syntax in anyway, by adding a new ad hoc
    rule for the tokenizer. The result is, for connoisseurs of formal
    grammars, a bit of a mess.)

  - A year or two ago, some of those working on invisible XML discussed
    how to build test suites for ixml.  We discussed whether to base our
    work on some existing test framework with a non-XML syntax (I can't
    remember the name) or write the test catalog in XML.  There was some
    sentiment for starting from the non-XML syntax -- after all, this is
    invisible XML, so we can turn it into XML whenever we want, right?

    The first problem came when I tried to write my first test catalog
    using the non-XML syntax.  The existing system had very sparse
    metadata and had no hooks for any of the kind of information I think
    is helpful for communally maintained test suites (like: a change
    history).  I tried to figure out a nifty way to extend the existing
    non-XML syntax to handle the information needed, and after a few
    hours of banging my head against the wall I gave up and wrote the
    catalog in XML.  Figuring out the XML representation of the test
    catalog then involved mostly thinking about the kind of information
    needed and its structure, and a little bit of thinking about what to
    call things and how things should nest.  No one has ever said "Oh,
    but wouldn't it be nicer to have a non-XML syntax for the catalog?"
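
As an illustration of the kind of ad hoc tokenizer rule mentioned in the first point: XPath 1.0 does not reserve names like `div` and `mod`, so its "Lexical Structure" section (3.7) decides whether `*` or an NCName is an operator by looking at the preceding token, and whether a name is a function or axis name by looking at the following one. The Python sketch below is a simplified, hypothetical rendering of those rules (it ignores string literals, variables, `.`, `..`, and prefixed names), not the code of any WG or processor.

```python
import re

# Simplified token pattern (an assumption of this sketch): no string literals,
# variables, '.', '..', or prefixed QNames.
TOKEN_RE = re.compile(
    r"\s*(::|//|!=|<=|>=|\d+(?:\.\d*)?|[A-Za-z_][\w.-]*|[*()\[\]@,/|+\-=<>])"
)
NAME_RE = re.compile(r"[A-Za-z_][\w.-]*")
OPERATOR_NAMES = {"and", "or", "mod", "div"}
SYMBOL_OPERATORS = {"/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}
# After these tokens, '*' and NCNames are NOT read as operators.
NON_OPERATOR_PRECEDERS = {"@", "::", "(", "[", ","}


def tokenize(expr):
    """Split expr into raw token strings."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def classify(tokens):
    """Label each token, applying the disambiguation rules in order."""
    kinds = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Rule 1: a preceding token that is not @ :: ( [ , or an operator
        # forces '*' and NCNames to be read as operators.
        operator_context = (
            prev_tok is not None
            and prev_tok not in NON_OPERATOR_PRECEDERS
            and kinds[-1] != "Operator"
        )
        if tok == "*":
            kinds.append("Operator" if operator_context else "NameTest")
        elif NAME_RE.fullmatch(tok):
            if operator_context:
                if tok not in OPERATOR_NAMES:
                    raise ValueError(f"{tok!r} must be an operator name here")
                kinds.append("Operator")
            elif nxt == "(":
                kinds.append("FunctionOrNodeType")  # Rule 2: name before '('
            elif nxt == "::":
                kinds.append("AxisName")  # Rule 3: name before '::'
            else:
                kinds.append("NameTest")  # Rule 4: an ordinary name
        elif tok in SYMBOL_OPERATORS:
            kinds.append("Operator")
        elif tok[0].isdigit():
            kinds.append("Number")
        else:
            kinds.append("Punct")  # ( ) [ ] @ , ::
    return kinds


if __name__ == "__main__":
    for expr in ["div div div", "count(div) div 2", "child::*"]:
        print(expr, "->", classify(tokenize(expr)))
```

With these rules, `div div div` comes out as NameTest, Operator, NameTest: the same spelling means different things depending only on its neighbours, which is exactly the sort of special case a grammar-only description cannot express.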

In both of these cases, non-XML syntaxes proved harder to work with than
XML syntaxes.

That's not always the case, surely: many people whose goals for their
documents are limited to getting ink on paper or lighting up pixels on
screens are happy with Markdown.  And for some complicated information
structures, I think designing an ixml grammar can be helpful -- indeed,
at Balisage this year I will be reporting on one such case.  So let's
add a third data point:

  - When I was transcribing Gottlob Frege's 1879 book on 'concept
    notation' (Begriffsschrift), I experimented briefly with
    representing his two-dimensional logical notation using an XML
    syntax.  One might, for example, use something like SVG to talk
    about the two-dimensional shapes of the formulas.  Or one might use
    an XML syntax for logic to represent the logical structure expressed
    by the formulas.  But I found that transcribing even a relatively
    simple formula involved an awful lot of machinery.  And once I
    actually understood Frege's notation reasonably well, I could see
    that a simple, easily keyboardable syntax could be devised which
    would make it easier to capture his visually and logically complex
    structures.  So I devised such a keyboardable syntax and wrote an
    ixml grammar for it, and the transcription proceeded without
    incident.

    It would have been better if there had been an easy way to get
    syntax support in the editor, to detect syntax problems in the
    transcribed formulas.  Detecting them by parsing the formulas to XML
    and then converting them to SVG took longer than detecting an XML
    validity error in a schema-aware editor.  But on the whole I think
    the use of ixml was a big success here.

I am not completely sure what factors make XML syntax better in one case
and non-XML syntax better in another.

My guess is that one factor is that Frege's notation really is quite
specialized.  So there are a lot of things that may happen in the
universe of technical writing, logic, or mathematics which my ixml
grammar does not need to handle.  All of that is handled by the
surrounding XML: the non-XML syntax is used only within 'formula'
elements.  (Of course, reality sometimes bites back: in reality, there
are several cases in the book which require text-critical markup to
record variant readings in different editions of Frege's book.  For
that, there is no easy non-XML markup.)

Another factor is the stability of the target: extending an XML
vocabulary is easy, and extending a non-XML syntax is hard, partly
because we have very few good tools for managing changes in a grammar.
So it was the addition of new functionality, or new metadata
requirements, that made non-XML syntaxes so much work in the first two
cases, and the stability of the target that made ixml work well in the
third case: Frege's notation in the 1879 book is not going to change any
more, so the case of having to add new functionality is just not going
to arise.

And, of course, the ixml case works well in part (at least for me)
because what the parser produces is XML, which I can process using a
fairly capable technology stack.

I'm curious about other people's experiences.  As always, YMMV.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

References:
- Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software
  - From: Dimitre Novatchev <dnovatchev@gmail.com>
- Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software
  - From: Norm Tovey-Walsh <ndw@nwalsh.com>
