xml-dev - Re: [xml-dev] Improving XML desing?

To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Improving XML desing?
From: <juanrgonzaleza@canonicalscience.com>
Date: Tue, 9 May 2006 08:45:26 -0700 (PDT)
Cc: <d09@hush.ai>
Importance: Normal
In-reply-to: <10F7AAAF-C975-43BE-8A15-73A2266CE806@hush.ai>
References: <10F7AAAF-C975-43BE-8A15-73A2266CE806@hush.ai>
Chris Burdess said:

My interest in this list is in discussing whether some of the ideas
developed in the CanonML program could be implemented in a future XML
specification.

I am not interested in debating the weaknesses or strengths of the CanonML
approach here. If anyone is interested, comments on improving CanonML
should be attached directly to the corresponding Canonical Science Today
post and debated there.

I will not reply here to any other message like this one from Chris
Burdess. Since this message was posted to the list before the remarks
above, I will reply to this one.

> <juanrgonzaleza@canonicalscience.com> wrote:
>> In
>>
>> [http://canonicalscience.blogspot.com/2006/04/canonml-markup-language-beyond-tex-xml.html]
>>
>> I presented some basic thoughts on CanonML, a markup language beyond
>> TeX, XML, and alternatives such as liminal, GODDAG, SGML concur, and
>> others. The new approach is clearly inspired by SXML.
>
> Many people have already commented that s-expressions are more
> concise than XML. The standard rebuttal is that with deeply embedded
> tree structures it is virtually impossible to visually locate the end
> tag and therefore know which element ends where.

I addressed this topic with care, and even provided several examples in
Canonical Science Today of why the readability of XML end tags is in most
cases a myth.

> Therefore, for
> trivial examples s-expressions are more readable, whereas for more
> complex examples XML is much more readable.

Already replied.

> CanonML appears to be yet
> another s-expression language, with confusing additional syntax that
> appears to serve no purpose (such as double colons), redundant
> duplication of forms ([::x] versus \x). It introduces not only the
> above problem but also

Incorrect. It is not based on s-expressions. The reasons for introducing
explicit tags, and their advantages over s-expression-based markups, have
already been discussed.

The relationship between [::x] and \x is like <tag></tag> vs <tag/>, but
with additional advantages. For instance, it is trivial to write a parser
that also understands CanonML empty tags, whereas it is not so trivial in
XML. This is the reason for the elimination of empty-tag syntax in some
approaches such as simple or minimal XML.

Still, people worried about empty tags (trying to write an even simpler
parser) could develop a minimal CanonML subset without the empty-tag
notation. I already wrote about that.
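
To make the parser claim concrete, here is a minimal sketch in Python. It
is not a CanonML implementation: the grammar is a hypothetical one,
inferred only from the fragments quoted in this thread ("[::name ...]" for
an element and "\name" as its empty form), and the whitespace handling is
an assumption of the sketch.

```python
import re

# Hypothetical grammar, inferred only from fragments quoted in this thread:
#   element := "[::" NAME (element | empty | text)* "]"
#   empty   := "\" NAME        (treated as shorthand for "[::NAME]")
# Real CanonML may differ, especially in its whitespace rules.
TOKEN = re.compile(r"\[::(\w+)|\\(\w+)|(\])|([^\[\]\\]+)")

def parse(source):
    """Return a list of (name, children) trees found in source."""
    root = ("#root", [])
    stack = [root]
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            # e.g. a "[" that is not followed by "::name"
            raise ValueError(f"unexpected character at offset {pos}")
        open_name, empty_name, close, text = match.groups()
        if open_name:                  # "[::name" opens an element
            node = (open_name, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif empty_name:               # "\name" is an element with no children
            stack[-1][1].append((empty_name, []))
        elif close:                    # "]" closes the innermost open element
            if len(stack) == 1:
                raise ValueError(f"unmatched ']' at offset {pos}")
            stack.pop()
        elif text.strip():             # keep non-blank text, trimmed (assumption)
            stack[-1][1].append(text.strip())
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unclosed element: " + stack[-1][0])
    return root[1]

tree = parse("[::msup [::mi c] [::mi L]]")
```

In this sketch the empty form costs one extra branch, and parse("[::br]")
and parse("\\br") produce the same tree.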

>
> - confusion over whether whitespace is part of the text content or the
> markup (a step backward from SXML)

False. It is the XML WS model that is very ugly. One of the authors of
MSXML recognized that WS is one of the “perpetual nightmares” of XML. The
difficulties he described (extracted from real-life XML code) are avoided
in this approach. Moreover, the difficulties arising in mixed content with
SXML are not present here. The WS model is also improved over TeX-like
approaches. Since WS is an important point, I thought I would talk about
the CanonML model in a separate posting in a future Canonical Science
Today.

> - lack of processing instruction mechanism

False.

> - lack of comment mechanism

False. And, as in liminal, there is no restriction on the use of "--",
which is prohibited in XML comments.

> - lack of character encoding mechanism

False. Moreover, I work in Unicode by default.

> - lack of reusable entity mechanism

False. Moreover, the specific module CanonFormal contains more symbols than
are listed in the predefined MathML entities. There are other improvements
that would not be achieved even if the list of MathML entities were
enlarged.

> - lack of mechanism for modularity of documents (within different
> resources)

I do not understand exactly what you mean.

> - lack of differentiation between unordered and ordered properties
> (XML attributes vs. child elements)

Irrelevant for practical applications.

What is the advantage of XML not differentiating between <T a="1" b="2"/>
and <T b="2" a="1"/>?

However, the lack of ordered metadata is a real problem in real usage. The
attribute model in CanonML was improved beyond that of ConciseXML (which is
already a generalization of XML 1.0-1.1).
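
The ordering point can be checked with any XML toolkit. As a small
illustration, this sketch uses Python's standard xml.etree.ElementTree
(chosen here only for convenience, it is not mentioned elsewhere in this
thread) to show that the two <T> elements above are indistinguishable by
their attributes, whereas the order of child elements is observable.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<T a="1" b="2"/>')
second = ET.fromstring('<T b="2" a="1"/>')

# ElementTree exposes attributes as a dict, and dict equality ignores order,
# which matches the XML rule that attribute order is not significant.
same_attributes = first.attrib == second.attrib

# Child elements, by contrast, form a sequence, so their order is observable.
ab = [child.tag for child in ET.fromstring('<T><a/><b/></T>')]
ba = [child.tag for child in ET.fromstring('<T><b/><a/></T>')]
children_differ = ab != ba
```

So metadata whose order matters has to be carried as child elements in XML,
since an application cannot rely on attribute order.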

>
>> Some of the technical points of CanonML are beyond XML (I mean, if one
>> expects backward compatibility with XML 1.0-1.1), but other points
>> could be considered for discussion of a future XML specification.
>
> Not with your current proposal. The only advantage I can see to your

Interestingly, Paul T, who knows nearly 100 alternatives to XML, described
this approach as

"Strong (the only?) attempt on one markup for several xml processing specs
design"

> language is that it may be slightly more readable for some small class
> of trivial documents. Now let's consider a more reasonable example:
>
> <math xmlns='http://www.w3.org/1998/Math/MathML'>
>    <mrow>
>      <mo fence='true'>&#x2225;</mo>
>      <mrow>
>        <msup>
>          <mi mathvariant='bold'>v</mi>
>          <mi>L</mi>
>        </msup>
>        <mo>-</mo>
>        <msubsup>
>          <mi mathvariant='bold'>w</mi>
>          <msup>
>            <mi>c</mi>
>            <mi>L</mi>
>          </msup>
>          <mi>L</mi>
>        </msubsup>
>      </mrow>
>      <mo fence='true'>&#x2225;</mo>
>      <mo>&le;</mo>
>      <mo fence='true'>&#x2225;</mo>
>      <mrow>
>        <msup>
>          <mi mathvariant='bold'>v</mi>
>          <mi>L</mi>
>        </msup>
>        <mo>-</mo>
>        <msubsup>
>          <mi mathvariant='bold'>w</mi>
>          <mi>i</mi>
>          <mi>L</mi>
>        </msubsup>
>      </mrow>
>      <mo fence='true'>&#x2225;</mo>
>      <mo>&forall;</mo>
>      <mi>i</mi>
>    </mrow>
> </math>
>
> Let's try this in CanonML:
>
> [::math [@ [::xmlns http://www.w3.org/1998/Math/MathML]] [::mrow [::mo
> [@ [::fence true]] ∥] [::mrow [::msup [::mi [@ [::mathvariant bold]]
> v] [::mi L]] [::mo -] [::msubsup [::mi [@ [::mathvariant
> bold]] w] [::msup [::mi c] [::mi L]] [:mi L]]] [::mo [@ [::fence
> true]] ∥] [::mo ≤] [::mo [@ [::fence true]] ∥] [::mrow [::msup
> [::mi [@ [::mathvariant bold]] v] [::mi L]] [:mo -] [:msubsup [::mi [@
> [:mathvariant bold]] w] [::mi i] [::mi L]]] [::mo [@ [::fence true]]
> ∥] [::mo ∀] [::mi i]]]
>
> Shorter, yes. But more readable? I know which one I'd prefer to debug
> the nesting of (and edit using vi over an ssh connection). Where does
> the second mrow element end, for instance? The third?

Apples and oranges?

Let me ignore for now that your example above is not the CanonML way to
encode MathML code (you are copying the SXML attribute list!). Let me
ignore the problems of XML end tags regarding data storage verbosity and
the dynamic execution issues I addressed. Let me also ignore that there
exists a specific module for math that improves on MathML. Let me ignore
the other advantages of CanonML over XML. Let me also ignore the
difficulties with MathML predefined entities (e.g. we discussed certain
difficulties with &DifferentialD; on the w3c math mailing list this year,
and the final advice from several w3c folks was not to use it).

The MathML code without extra WS is

<math xmlns='http://www.w3.org/1998/Math/MathML'><mrow>
<mo fence='true'>&#x2225;</mo><mrow><msup>
<mi mathvariant='bold'>v</mi><mi>L</mi></msup><mo>-</mo>
<msubsup><mi mathvariant='bold'>w</mi><msup><mi>c</mi><mi>L</mi>
</msup><mi>L</mi></msubsup></mrow>
<mo fence='true'>&#x2225;</mo><mo>&le;</mo>
<mo fence='true'>&#x2225;</mo><mrow><msup>
<mi mathvariant='bold'>v</mi><mi>L</mi></msup><mo>-</mo><msubsup>
<mi mathvariant='bold'>w</mi><mi>i<mi/>L</mi></msubsup>
</mrow><mo fence='true'>&#x2225;</mo><mo>&forall;</mo>
<mi>i</mi></mrow></math>

whereas "your CanonML" code re-typed with extra WS (there are different
ways to improve readability) looks like

[::math [@ [::xmlns http://www.w3.org/1998/Math/MathML]]
  [::mrow
    [::mo [@ [::fence true]] &#x2225;]
    [::mrow
      [::msup
        [::mi [@ [::mathvariant bold]] v]
        [::mi L]
      ]
      [::mo -]
      [::msubsup
        [::mi [@ [::mathvariant bold]] w]
        [::msup
          [::mi c]
          [::mi L]
        ]
        [:mi L]
      ]
    ]
    [::mo [@ [::fence true]] &#x2225;]
    [::mo &le;]
    [::mo [@ [::fence true]] &#x2225;]
    [::mrow
      [::msup
        [::mi [@ [::mathvariant bold]] v]
        [::mi L]
      ]
      [:mo -]
      [:msubsup
        [::mi [@ [:mathvariant bold]] w]
        [::mi i]
        [::mi L]
      ]
    ]
    [::mo [@ [::fence true]] &#x2225;]
    [::mo &forall;]
    [::mi i]
  ]
]

Nice? Maybe. More readable than XML? Sure! Where does the second mrow
element end? The third? Easy! (even *without* syntax highlighting or a
good editor). Ask if you do not know the hint. It is not so easy in XML,
because the difference between open and end tags is only one character in
6-7 (this was one of the improvements of liminal over XML in its design of
closed tags).

However, I made an error while copying the above XML fragment without WS.
Can you find it in the MathML code?

P.S.: the error does not break XML well-formedness, so an XML parser will
not help you.
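
As a hedged illustration of why a parser alone does not catch this kind of
slip, the Python sketch below compares two small made-up fragments (they
are hypothetical and deliberately not taken from the MathML above). Both
parse without complaint; only a structural comparison against the
reference reveals that an end tag moved. The helper first_difference is an
illustrative name, and the sketch ignores tail text for brevity.

```python
import xml.etree.ElementTree as ET

def first_difference(a, b, path=""):
    """Describe the first structural mismatch between two elements, or None."""
    here = f"{path}/{a.tag}"
    if a.tag != b.tag:
        return f"{path}: tag {a.tag!r} vs {b.tag!r}"
    if a.attrib != b.attrib:
        return f"{here}: attributes differ"
    if (a.text or "").strip() != (b.text or "").strip():
        return f"{here}: text {a.text!r} vs {b.text!r}"
    if len(a) != len(b):
        return f"{here}: {len(a)} children vs {len(b)}"
    for child_a, child_b in zip(a, b):
        found = first_difference(child_a, child_b, here)
        if found:
            return found
    return None

# Hypothetical fragments: the second has a misplaced end tag,
# yet both are accepted by the parser.
reference = ET.fromstring("<r><a>1</a><b>2</b></r>")
retyped = ET.fromstring("<r><a>1<b>2</b></a></r>")

report = first_difference(reference, retyped)
```

The parser accepts both inputs; the mismatch only shows up once the two
trees are walked side by side.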

> --
> 犬 Chris Burdess
>    "They that can give up essential liberty to obtain a little safety
> deserve neither liberty nor safety." - Benjamin Franklin

Juan R.

Center for CANONICAL |SCIENCE)
Follow-Ups:
- RE: [xml-dev] Improving XML desing?
  - From: "Michael Kay" <mike@saxonica.com>
References:
- Re: [xml-dev] Improving XML desing?
  - From: Chris Burdess <d09@hush.ai>