Re: [xml-dev] Invisible XML 1.0 spec has been published

From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: Roger L Costello <costello@mitre.org>
Date: Fri, 10 Jun 2022 14:12:37 -0600
Roger L Costello writes:

> Playing devil's advocate here ...

> Invisible XML is all about converting a non-XML data format to
> XML. There are two problems with that:

> 1. There already exists a standard technology, with Apache open source
> tooling, to do that. It's called Data Format Description Language
> (DFDL). As best as I can tell, DFDL is 100 times more powerful than
> Invisible XML. Plus, DFDL builds on top of existing XML technologies
> -- XML Schema and XPath. Why reinvent the wheel, a weaker wheel at
> that?

DFDL is indeed another approach to the conversion of non-XML data into
XML, and those whose needs are met by it should continue to use it in
peace with all my best wishes.  I have never used it myself, partly
because when I looked I could find no implementations I could actually
make use of (perhaps I was looking in the wrong places) and partly
because its motivating use cases seemed very different from the kinds of
non-XML data I am most often concerned with and for which I expect ixml
to provide convenient solutions.

I'm a little puzzled by your claim that it is "100 times more powerful",
since I don't know any good way to measure the power of a transformation
tool quantitatively.  Can you explain what scale you are measuring on,
and what the actual values are?

For that matter, the claim that DFDL is "more powerful" than Invisible
XML would be puzzling even without the bogus quantification; as far as I
know, each has some capabilities not present in the other:

  - DFDL is designed to work with binary as well as textual data; ixml
    is clearly tilted towards textual data.

    That's actually an oversimplification; ixml is not restricted to XML
    characters for input, so one could in principle write an ixml
    grammar to parse binary data, but the focus of most people
    interested in ixml has been and is likely to continue to be textual
    data: things like CSS stylesheets or XPath expressions or program
    source code or wiki-like markup or bibliographic data or ...

    So, DFDL may possibly be capable of describing non-XML notations
    that cannot be described in ixml.  Or maybe not.

  - DFDL specifications are (if I understand correctly) written as
    annotations in XSD schema documents; Invisible XML grammars can be
    written either in the non-XML syntax described in the specification
    or in equivalent XML.

    This is not a question of expressive power, but it does have a good
    deal of relevance for the question "why invent a new way to do
    this?"  Why invent an alternative to writing a grammar as
    annotations in XSD schema documents?  I suspect many users of XML
    may think that that question answers itself.

  - There is presumably a relation of some kind between the concepts of
    DFDL and those of context-free grammars as known to computer
    science, but I think it's fair to say that clarifying that relation
    is not a pressing concern for the authors of the DFDL specification.

    As far as I know, DFDL assumes a recursive-descent parsing model
    with backtracking and unbounded lookahead.  That suggests that
    grammars need not be LL(1), or LL(k) for any finite k, but also
    suggests that left-recursive grammars are going to cause problems;
    if any readers of this mail know DFDL well enough to comment, I
    would be glad to know more.

    Invisible XML processors accept arbitrary context-free grammars
    written in the ixml notation or the equivalent XML, without
    any substantive restrictions.

    The relation of this point to questions of expressive power is a bit
    subtle.  Can all grammars which use left recursion be rewritten
    without left recursion?  Yes, of course they can: just translate the
    grammar into Greibach Normal Form.  Since in GNF no right-hand side
    begins with a non-terminal symbol, no GNF grammar is
    left-recursive. Q.E.D.

    On this point, then, the difference lies in grammars rather than in
    languages: Invisible XML processors are likely to be able to handle,
    as written, a large number of grammars with which a DFDL processor
    will have some difficulties.
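
The left-recursion point can be illustrated with a minimal Python
sketch.  Everything in it is an assumption made for illustration: the
toy grammar, the names GRAMMAR, naive_descent and earley_recognize, and
the choice of Earley's algorithm as one well-known general technique.
It makes no claim about how any DFDL or ixml implementation actually
works.  The same left-recursive grammar sends a naive backtracking
top-down matcher into unbounded recursion, while a general
context-free recognizer accepts it as written.

```python
# Toy left-recursive grammar: expr -> expr "+" term | term ; term -> "a" | "b"
# Keys are nonterminals; any symbol that is not a key is a terminal.
GRAMMAR = {
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("a",), ("b",)],
}


def naive_descent(grammar, sym, tokens, pos):
    """Naive backtracking top-down matcher: list every end position.

    On a left-recursive rule it calls itself with the same arguments
    forever, so Python eventually raises RecursionError.
    """
    if sym not in grammar:
        return [pos + 1] if pos < len(tokens) and tokens[pos] == sym else []
    ends = []
    for alt in grammar[sym]:
        positions = [pos]
        for part in alt:
            positions = [q for p in positions
                         for q in naive_descent(grammar, part, tokens, p)]
        ends.extend(positions)
    return ends


def earley_recognize(grammar, start, tokens):
    """Earley recognizer: True if tokens is a sentence of grammar."""
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin) valid after i tokens.
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        worklist = list(chart[i])

        def add(item):
            if item not in chart[i]:
                chart[i].add(item)
                worklist.append(item)

        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot == len(rhs):
                # Complete: advance every item that was waiting for lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, o2))
            elif rhs[dot] in grammar:
                # Predict: start every alternative of the expected nonterminal.
                sym = rhs[dot]
                for alt in grammar[sym]:
                    add((sym, alt, 0, i))
                # If sym already matched the empty string here, step over it.
                if any((sym, alt, len(alt), i) in chart[i]
                       for alt in grammar[sym]):
                    add((lhs, rhs, dot + 1, origin))
            elif i < n and tokens[i] == rhs[dot]:
                # Scan: the expected terminal is the next token.
                chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


if __name__ == "__main__":
    print(earley_recognize(GRAMMAR, "expr", "a+b+a"))
    try:
        naive_descent(GRAMMAR, "expr", "a+b+a", 0)
    except RecursionError:
        print("naive top-down matcher did not terminate on left recursion")
```

Rewriting expr as term ("+" term)* would let the naive matcher succeed,
which is exactly the kind of rewriting the grammar writer would rather
not be forced to do.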
    
What makes ixml attractive to some people is not power for its own sake
or in the abstract, but convenience.

The main design rationale for requiring ixml processors to handle
arbitrary grammars is not expressive power for its own sake but
convenience for the grammar writer.  (Many people who work with grammars
value the ability to write a grammar in what seems the most 'natural'
way, without then needing to rewrite it to be unambiguous, or LL(1), or
LALR(1), or to be in Greibach Normal Form.)

Convenience is of course in the eye of the beholder.  Those attracted by
the opportunity to describe a non-XML notation by means of annotations
on an XSD schema describing the XML format into which the non-XML data
should be mapped will perhaps be unimpressed by the opportunity to
describe it using instead a simple context-free grammar.  Other people
may (how can I say this delicately but clearly?) prefer not to spend
time reading and editing XSD schema documents. They may think it is
better to describe context-free languages using conventional
context-free grammars, and to leave XML schema languages out of the
equation.


> 2. From my cursory skimming of the Invisible XML specification, it
> appears that a user has to create a parser for the non-XML data
> format. If I have to spend the time to create a parser for a data
> format, I probably will also spend a bit more time writing code to
> process the parsed data directly. No need to introduce an additional
> step to convert to XML.

Then, if I may say so without offense, you may need to improve either
your approach to skimming unfamiliar material, or the precision of your
terminology.

If by "create a parser" you mean write a program, in some programming
language, which can parse sentences of a particular language, then
nothing could be further from the truth.  One of the core ideas of
invisible XML is to enable users to parse sentences of a given language
without writing a parser.  The user of an invisible-XML processor
supplies a grammar and an input string and gets XML as a result; the
parsing is the job of the ixml processor.
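
The division of labor can be sketched in a few lines of Python.  This
is a toy under stated assumptions, not an ixml processor and not ixml
notation: the grammar is a plain Python dict, the names GRAMMAR, HIDDEN,
parses and to_xml are hypothetical, and the generic engine uses naive
backtracking (so, unlike a real ixml processor, it cannot cope with left
recursion).  The point is only that the user writes the grammar and
supplies the input; the generic engine does the parsing and emits XML.

```python
from xml.sax.saxutils import escape

# What the user supplies: a grammar (as data) for dates like 2022-06-10.
# Keys are nonterminals; any symbol that is not a key is a literal string.
GRAMMAR = {
    "date": [("year", "-", "month", "-", "day")],
    "year": [("digit", "digit", "digit", "digit")],
    "month": [("digit", "digit")],
    "day": [("digit", "digit")],
    "digit": [(c,) for c in "0123456789"],
}
# Nonterminals that should not show up as elements in the output.
HIDDEN = {"digit"}


def parses(grammar, sym, text, pos, hidden=frozenset()):
    """Yield (end, xml) for every way sym matches text starting at pos."""
    if sym not in grammar:
        # Terminal: match the literal and copy it to the output, escaped.
        if text.startswith(sym, pos):
            yield pos + len(sym), escape(sym)
        return
    for alt in grammar[sym]:
        partial = [(pos, "")]
        for part in alt:
            partial = [(end, xml + sub)
                       for p, xml in partial
                       for end, sub in parses(grammar, part, text, p, hidden)]
        for end, xml in partial:
            if sym in hidden:
                yield end, xml
            else:
                yield end, f"<{sym}>{xml}</{sym}>"


def to_xml(grammar, start, text, hidden=frozenset()):
    """Generic engine: grammar plus input string in, XML string out."""
    for end, xml in parses(grammar, start, text, 0, hidden):
        if end == len(text):
            return xml
    raise ValueError("input is not a sentence of the grammar")


if __name__ == "__main__":
    print(to_xml(GRAMMAR, "date", "2022-06-10", HIDDEN))
    # prints <date><year>2022</year>-<month>06</month>-<day>10</day></date>
```

Nothing in parses or to_xml knows anything about dates; swapping in a
different grammar dict changes the language accepted without touching
the engine.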

Different people have different expectations, but I suspect that for one
class of xml-dev readers (namely those whose programming tools of
preference are XSLT and XQuery), having the data in XML is the
prerequisite for writing code to process the data in question, not an
alternative to it.

If on the other hand by "create a parser" you mean "write a grammar",
then I think you might be well advised to pay more attention to the
difference between grammars and parsers, since they are not the same
thing at all.


TL;DR Why, you ask, is the Invisible XML group trying to re-invent the
wheel?  Because we think things will work better with the axle in the
center of the wheel.


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Follow-Ups:
- Re: [xml-dev] Invisible XML 1.0 spec has been published
  - From: John Cowan <johnwcowan@gmail.com>
References:
- RE: [xml-dev] Invisible XML 1.0 spec has been published
  - From: Roger L Costello <costello@mitre.org>