Re: [xml-dev] Re: LPDs

From: Marcus Reichardt <u123724@gmail.com>
To: xml-dev@lists.xml.org
Date: Thu, 5 Aug 2021 16:43:19 +0200
Thanks for your detailed response.

> The SGML Handbook invents a microsyntax for evaluating attribute
> values, but the LPD mechanism as defined can't do the evaluation for you.
> However, the 'application' is allowed to evaluate added attributes
> before they're added to influence which set of added attributes gets
> added to the result.  Let's just say that's beyond my experience of
> using an SGML parser.

To clarify, an SGML parser merely passes all applicable link rules in
a given context to "the application" which is then expected to select
a matching one in case there's more than one. SGML merely
verifies weak uniqueness of link rule declarations (namely that if
more than a single link rule applies to a given element in a given
link set, all of these link rules must have link attributes).
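The division of labour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular SGML parser works: the `LinkRule` class, the function names, and the `level` link attribute are all made up for the example, and "matching" is simplified to equality of link attribute values against an application-supplied context.

```python
from dataclasses import dataclass, field


@dataclass
class LinkRule:
    source: str                                      # source element name
    result: str                                      # result element name
    link_attrs: dict = field(default_factory=dict)   # link attributes (hypothetical model)


def check_weak_uniqueness(link_set):
    """Parser-side check: if several rules share a source element,
    every one of them must carry link attributes."""
    by_source = {}
    for rule in link_set:
        by_source.setdefault(rule.source, []).append(rule)
    for source, rules in by_source.items():
        if len(rules) > 1 and any(not r.link_attrs for r in rules):
            raise ValueError(f"ambiguous link rules for <{source}>")


def applicable_rules(link_set, element_name):
    """What the parser hands over: all rules for the element in the current link set."""
    return [r for r in link_set if r.source == element_name]


def select_rule(candidates, context):
    """Application-side choice: first rule whose link attributes all match the context."""
    for rule in candidates:
        if all(context.get(k) == v for k, v in rule.link_attrs.items()):
            return rule
    return None


link_set = [
    LinkRule("title", "h1", {"level": "1"}),
    LinkRule("title", "h2", {"level": "2"}),
    LinkRule("para", "p"),
]
check_weak_uniqueness(link_set)
chosen = select_rule(applicable_rules(link_set, "title"), {"level": "2"})
print(chosen.result)
```

The point of the split is that `check_weak_uniqueness` and `applicable_rules` never look at attribute values; only `select_rule`, standing in for "the application", does.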

Along the same lines, SGML under no circumstances can re-parse content
based on feedback of later LPD pipelining stages, nor does it need to.

> I don't want to think about how you'd write LPDs such that <foreign> at
> any level in a table generates a different attribute value (or no
> attribute at all).  You might be able to do it with #USELINK and a web
> of entities to specify the parts that you want to keep, but it wouldn't
> be nearly as succinct as "foreign[empty(ancestor::table)]".

> If you have recursive sections, a la DocBook's <section> [1], then you
> need myriad link rules to have the correct attributes added to each
> level of section title.

SGML has the RANK feature assigning a level suffix like h1, h2, etc.,
allowing you to trivially target h1, h2 as source elements in link
rules. Though it doesn't quite work as expected by most authors
(doesn't assign a rank suffix automatically by nesting level), my
point with respect to this being a vocabulary issue thus stands: if
you deliberately choose not to use a feature, then *of course* you
must make up for it by inventing your own conventions, bringing its
idiosyncrasies along.
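For comparison, the behaviour most authors expect (a rank derived from nesting depth) is easy to state procedurally. The following is a minimal Python sketch that sits entirely outside SGML and is not a model of the RANK feature itself; the element names `section` and `title`, the function `rank_titles`, and the cap at `h6` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def rank_titles(elem, level=1):
    """Rename each section's direct <title> child to h<level>,
    recursing into nested <section> elements with level + 1."""
    for child in elem:
        if child.tag == "title":
            child.tag = f"h{min(level, 6)}"   # cap at h6, an arbitrary choice here
        elif child.tag == "section":
            rank_titles(child, level + 1)


doc = ET.fromstring(
    "<section><title>A</title>"
    "<section><title>B</title>"
    "<section><title>C</title></section></section></section>"
)
rank_titles(doc)
ranked = [(e.tag, e.text) for e in doc.iter() if e.tag.startswith("h")]
print(ranked)
```

Once titles carry an explicit rank in their name, a link rule (or any other per-element mapping) can target them directly without inspecting ancestors.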

In the case of DocBook, dbhierx.mod went with allowing both unranked
<section> elements and unranked <sect1>, <sect2>, ... (declared as
individual elements) at the same time. Now I also have contributed
papers where it sure was helpful to merely include/concatenate
articles to produce conference proceedings irrespective of levels, but
then you have this gem in the DocBook DTD:

    <!ATTLIST sect1
                renderas        (sect2
                                      |sect3
                                      |sect4
                                      |sect5)         #IMPLIED
    ...

pointing to the fact that the problem of adequately representing
hierarchy levels in documents has nuances and issues that aren't quite
solved. This is also supported by HTML5's failed attempt at
introducing section roots with unranked <header> elements (the
withdrawn HTML5 outlining algorithm for screen readers), and rather
idiosyncratic interpretation of h1-h6 levels.

Now of course the theoretical existence of SGML RANK doesn't help if
customers bring DocBook content for printing, but it highlights a
subtle difference: that the XML world is somewhat dominated by
committee-designed vocabularies with a natural tendency to overreach
and eventually represent each and everything under the sun, whereas in
traditional SGML, you'd design a DTD for *your* text document project
and authoring practices at hand.

To come back to namespaces, the original topic of this subthread :)
The deeper issue with namespaces is exactly that: there's no point in
syntactically composing content from different "namespaces" when what
you need is to map concepts from your doc into the concepts of eg an
output vocabulary anyway. What interpretation should <foo:bar> have
next to a <p> element, say, anyway? It's just that in SGML, the
concept of namespace stands out as particularly bureaucratic and
pointless, in the presence of a mechanism to map your elements into
those of a target vocabulary.

>  I do wonder why I react so badly to LPDs after all these years while I
> don't have a problem with an <?xml-stylesheet?> PI in a document.

XSLT is a Turing-complete language. That XSLT can perform arbitrary
transformations doesn't contribute to a discussion about the
adequateness of a document representation language, just as discussing
any other Turing-complete language wouldn't. All the PI does is hide
away the difficult parts in a pseudo-declarative way, with XSLT
considered part of the "XML stack" to make it appear homogeneous,
similar to how databases use special database languages.

I could go further and ask if a special "output" or "foreign" format
*as a markup vocabulary* is even needed, and would XSLT as a language
be helpful in performing characteristic tasks in such a setting if it
were? SGML and XML, after all, are about digital *text*, whereas SVG, FOP,
area trees, etc. are all about representing non-text (component models) in
markup. But, in contrast to XSL's predecessor DSSSL, no sane person
would actually use XSL to render/position flow layouts (HTML + CSS
core into boxes) to another markup language, let alone without having
access to font metrics, and with only extremely painful ways at its
disposal to do the calculations implemented in TeX or troff. Instead, XML has
a magic FOP processor able to do something that can't be expressed in
XSL itself, similar to a mythical "SGML application". Actually, I may
be one of the few persons to have ever attempted something like a
layout process using XSL 2, albeit in a mild setting for finding
optimal shortenings/omissions of names and postal addresses (such as
Manufacturing -> Mfct) to fit output medium and external interface
constraints, considering fixed-width fonts only.
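To give a feel for the kind of fitting problem described above, here is a minimal Python sketch. It is not the XSL 2 implementation mentioned (which isn't shown in this message) but a brute-force stand-in: the abbreviation table is hypothetical apart from "Manufacturing" -> "Mfct", the function name `fit` is made up, and "optimal" is taken to mean the fewest abbreviations, then the longest text that still fits, with a fixed-width font so that width is simply a character count.

```python
from itertools import combinations

# Hypothetical abbreviation table; only "Manufacturing" -> "Mfct" comes from the text.
ABBREVIATIONS = {"Manufacturing": "Mfct", "Company": "Co", "International": "Intl"}


def fit(words, width, abbreviations=ABBREVIATIONS):
    """Return the rendering with the fewest abbreviations that fits in
    `width` columns, or None if even full abbreviation does not fit."""
    candidates = [i for i, w in enumerate(words) if w in abbreviations]
    for n in range(len(candidates) + 1):           # try 0, 1, 2, ... abbreviations
        best = None
        for chosen in combinations(candidates, n):
            text = " ".join(
                abbreviations[w] if i in chosen else w
                for i, w in enumerate(words)
            )
            # among equally small sets, prefer the longest text that still fits
            if len(text) <= width and (best is None or len(text) > len(best)):
                best = text
        if best is not None:
            return best
    return None


print(fit(["Acme", "International", "Manufacturing", "Company"], 30))
```

The exhaustive search is exponential in the number of abbreviable words, which is tolerable for a name or address line but says nothing about how one would do this with proportional fonts or real font metrics.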

I've noticed somewhat of a defensive gut reaction by XML heads
whenever SGML comes up. While I use XML all the time, and plan to
continue doing so, let me bluntly say that XML's only raison d'etre,
by the XML spec's own wording (as in chapter 1, sentence 1), was to
replace SGML as a markup meta-language on the web. Likewise, the only
purpose of xml-stylesheet PIs is to supply an HTML rendering for
non-XHTML/non-HTML XML content in browsers (and I've actually
implemented web sites that way, albeit not in this epoch). Its
existence, in combination with XSLT being Turing-complete, can't be
used as a serious argument in discussing the merits and features of
document languages. As to namespaces, their original purpose was to
avoid name collisions in anticipation of a wealth of new vocabularies
on the web.

Along the lines of what I said about XML vocabularies above, criticism
tends to come from a perspective of an entrenched industry wanting to
define vocabularies/namespaces, sell XML tools, or similar. To be
sure, there's nothing wrong with W3C creating a cottage industry of
XML-centric tools, for XML publishing, WS-*, RDF, and whatnot. Just
saying that none of that stuff has anything to do with the web (the W
in W3C), and has failed miserably on the web. Worse, W3C has kind of
held back HTML evolution while they were busy with XML, such that CSS
had to become absurdly overpowered to cater for HTML's shortcomings,
HTML being a rather simplistic vocabulary based on early common text
markup tagging folklore plus hypertext anchors for casual academic
publishing, with generic divs and spans used for nearly everything
else. There was even the false structure-vs-presentation dichotomy
created after the fact to justify a new *syntax* for rendering
properties (eg CSS). In which universe does it make sense to use

    <p style="color: red">

rather than

    <p color=red>?

And we're already suffering through this proliferation of microsyntax
and idiosyncrasies creating browser monopolies and putting web
authoring out of reach of even graphic professionals, let alone laymen.
(Btw your XSL predicate "@rend eq 'all-small-caps'" looks suspiciously
similar to the ad-hoc selector syntax in the SGML handbook ;) Just so
we're on the same page, if W3C's CSS WG gets its way, we'll
shortly see gems such as

   div:has(p) {
      ...
   }

(plus numerous further pseudo-selectors cf
<https://css4-selectors.com/selectors/>) pointing to a similar (self
inflicted) problem that DocBook has with encouraging generic
"<section>" elements. Thereby only introducing increased complexity
*when the author has complete control over the text document anyway*
and could well be bothered to use specific elements to encode that
which (s)he wants to express, in a markup language that excels in
exactly this kind of application, of all things.
Follow-Ups:
- Re: [xml-dev] Re: LPDs
  - From: Tony Graham <tgraham@antenna.co.jp>
- Re: [xml-dev] Re: LPDs
  - From: Dave Pawson <dave.pawson@gmail.com>