Re: [xml-dev] Re: LPDs


From: Tony Graham <tgraham@antenna.co.jp>
To: xml-dev@lists.xml.org
Date: Fri, 13 Aug 2021 11:31:07 +0100

On 05/08/2021 15:43, Marcus Reichardt wrote:
...

To clarify, an SGML parser merely passes all applicable link rules
in a given context to "the application" which is then expected to
select a matching one in case there's more than one. SGML
merely verifies weak uniqueness of link rule declarations (namely
that if more than a single link rule applies to a given element in a
given link set, all of these link rules must have link attributes).

Along the same lines, SGML under no circumstances can re-parse
content based on feedback of later LPD pipelining stages, nor does it
need to.

You quoted to me:

for example, a formatter might process part of a document to generate
galleys, then process part of the generated galleys, then go back to
producing galleys again.  An application could even use information
gained during the page production process to re-do some of the
earlier galley processing

...

If you have recursive sections, a la DocBook's <section> [1], then
you need myriad link rules to have the correct attributes added to
each level of section title.

SGML has the RANK feature assigning a level suffix like h1, h2,
etc.,

We tried RANK.  It was.

allowing you to trivially target h1, h2 as source elements in link rules. Though it doesn't quite work as expected by most authors (doesn't assign a rank suffix automatically by nesting level), my

I don't recall what our issue was, but we only tried it in one project.

point with respect to this being a vocabulary issue thus stands: if you deliberately choose not to use a feature, then *of course* you must make up for it by inventing your own conventions, bringing its idiosyncrasies along.

If we tried to use a feature and it made things worse, then of course we
stopped using it.

In the case of DocBook, dbhierx.mod went with allowing both unranked <section> elements and unranked <sect1>, <sect2>, ... (declared as individual elements) at the same time. Now I have also contributed papers where it sure was helpful to merely include/concatenate articles to produce conference proceedings irrespective of levels,
but then you have this gem in the DocBook DTD:

<!ATTLIST sect1 renderas (sect2 |sect3 |sect4 |sect5)
#IMPLIED ...

pointing to the fact that the problem of adequately representing hierarchy levels in documents has nuances and issues that aren't
quite solved. This is also supported by HTML5's failed attempt at introducing section roots with unranked <header> elements (the withdrawn HTML5 outlining algorithm for screen readers), and rather idiosyncratic interpretation of h1-h6 levels.

I'm sorry to have touched a nerve.  You brought up <sect1>, etc., but we
only ever used 'unranked' sections, and we would not have had a
'renderas' attribute at all.

Now of course the theoretical existence of SGML RANK doesn't help if customers bring DocBook content for printing, but it highlights a

We did end up formatting DocBook manuals in multiple languages for a
client, but the failed experiments with LPDs and RANK were with
home-grown, document-specific DTDs.

subtle difference: that the XML world is somewhat dominated by committee-designed vocabularies with a natural tendency to overreach and eventually represent each and everything under the sun, whereas
in traditional SGML, you'd design a DTD for *your* text document
project and authoring practices at hand.

We did, which relates to why RANK didn't suit us.

To come back to namespaces, the original topic of this subthread :) The deeper issue with namespaces is exactly that: there's no point
in syntactically composing content from different "namespaces" when
what you need is to map concepts from your doc into the concepts of
eg an output vocabulary anyway. What interpretation should have
<foo:bar> next to a <p> element, say, anyway? It's just that in SGML,
the concept of namespace stands out as particularly bureaucratic and pointless, in the presence of a mechanism to map your elements into those of a target vocabulary.

Yet somehow we all ended up using CALS tables, so much so that SGML Open
(as it was then) defined an interoperable subset.  DocBook ended up
supporting both CALS <table> elements and HTML <table> elements.

I do wonder why I react so badly to LPDs after all these years
while I don't have a problem with an <?xml-stylesheet?> PI in a
document.
XSLT is a Turing-complete language. That XSLT can perform arbitrary transformations doesn't contribute to a discussion about the adequateness of a document representation language, just as
discussing

I think it shows that LPDs are inadequate in comparison because LPDs
can't generate different links for all the different contexts that you
put in a 'match' attribute in XSLT.  If all that XSLT could do was add
attributes, it would still be able to express far more than LPDs.

If you had a RELAX NG processor that could regenerate its source
document with added attributes to indicate the pattern that validated an
element (or attribute), that would be more expressive than LPDs.
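
As a rough illustration of what "just adding attributes by context" can
look like, here is a minimal sketch in Python rather than XSLT. It is
not an LPD and not anyone's production code: the sample document and
the 'level' attribute name are assumptions made up for the example. One
recursive rule covers recursive <section> elements at any depth:

```python
import xml.etree.ElementTree as ET

def annotate_titles(elem, depth=0):
    """Add a 'level' attribute to each section title from its nesting depth."""
    if elem.tag == "section":
        depth += 1
        title = elem.find("title")  # direct child <title> only
        if title is not None:
            # 'level' is a hypothetical attribute name used for illustration.
            title.set("level", str(depth))
    for child in elem:
        annotate_titles(child, depth)

# Made-up sample with three levels of recursive sections.
doc = ET.fromstring(
    "<article>"
    "<section><title>A</title>"
    "<section><title>B</title>"
    "<section><title>C</title></section>"
    "</section></section>"
    "</article>"
)
annotate_titles(doc)
levels = [(t.text, t.get("level")) for t in doc.iter("title")]
print(levels)
```

In XSLT the same context would typically be expressed in a 'match'
pattern or by counting ancestor sections.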

any other Turing-complete language wouldn't. All the PI does is hide away the difficult parts in a pseudo-declarative way, with XSLT considered part of the "XML stack" to make it appear homogeneous, similar to how databases use special database languages.

I could go further and ask if a special "output" or "foreign" format *as a markup vocabulary* is even needed, and would XSLT as a
language be helpful in performing characteristic tasks in such a
setting if it were? SGML and XML, after all, are about digital *text*
when SVG, FOP,

I am not sure what point you are trying to make.

SGML had multiple transformation tools, both free and commercial.

The conceptual model for XSL, from XSL 1.0 [4], includes:

   An XSL stylesheet processor accepts a document or data in XML and
   an XSL stylesheet and produces the presentation of that XML source
   content that was intended by the designer of that stylesheet.

and:

   Tree transformation allows the structure of the result tree to
   be significantly different from the structure of the source tree.

and:

   Formatting is enabled by including formatting semantics in the
   result tree... In XSL, the classes of formatting objects and
   formatting properties provide the vocabulary for expressing
   presentation intent.

No mention of an XML representation.  You can blame Microsoft for
starting the XSLT processor trend.

Under 'Transform to Another Vocabulary', there's:

   In some implementations of XSL/XSLT, the result of tree
   construction can be output as an XML document. This would allow
   an XML document which contains formatting objects and formatting
   properties to be output. This capability is neither necessary
   for an XSL processor nor is it encouraged.

Still no mention of XSL-FO as input.

When XSL 1.0 needed a testsuite to demonstrate interoperability so it
could go from Candidate Recommendation to Proposed Recommendation, IIRC
there was a long wait while people wrestled with the idea of allowing
XSL-FO XML files in the test suite because that wasn't how XSL was
supposed to work.

area trees, etc. all about representing non-text (component models)
in markup. But, in contrast to XSL's predecessor DSSSL, no sane
person would actually use XSL to render/position flow layouts (HTML +
CSS core into boxes) to another markup language, let alone without
having

DSSSL defined a transformation stage.

Jade's non-standard SGML backend was turning into the main use of Jade.
Back in that day, I wrote Scheme functions with function names
corresponding to HTML element names that had parameters corresponding to
their attributes so that you had something like XSLT but with
parentheses and Scheme syntax.

access to font metrics, and with extremely painful ways to do the calculations implemented in TeX or troff at disposal. Instead, XML
has a magic FOP processor able to do something that can't be
expressed in

XSL-FO, not FOP.  FOP is an XSL formatter, but it is not the only one.

XSL itself, similar to a mythical "SGML application". Actually, I
may be one of the few persons to have ever attempted something like
a layout process using XSL 2, albeit in a mild setting for finding optimal shortenings/omissions of names and postal addresses (such as Manufacturing -> Mfct) to fit output medium and external interface constraints, considering fixed-width fonts only.

Last week, at Balisage, David Birnbaum and Charlie Taylor from
University of Pittsburgh gave their talk 'How long is my SVG <text>
element?' [1].  They showed using XSLT to generate/modify SVG when using
an XML representation of their font metrics.
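
To make the general idea concrete (this is not their implementation,
and not XSLT), here is a minimal Python sketch of estimating the width
of a text run from per-character advance widths. The metrics table, the
default advance, and the 1000-units-per-em scale are invented for the
example, and kerning and ligatures are ignored:

```python
# Hypothetical advance widths in font units; not taken from any real font.
ADVANCE = {"H": 722, "e": 444, "l": 278, "o": 500, " ": 250}
DEFAULT_ADVANCE = 500  # assumed fallback for characters missing from the table
UNITS_PER_EM = 1000    # assumed design grid

def text_width(text, font_size):
    """Sum the advance widths of the characters and scale to the font size."""
    units = sum(ADVANCE.get(ch, DEFAULT_ADVANCE) for ch in text)
    return units * font_size / UNITS_PER_EM

width = text_width("Hello", 12)
print(width)
```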

I've noticed somewhat of a defensive gut reaction by XML heads whenever SGML comes up. While I use XML all the time, and plan to continue doing so, let me bluntly say that XML's only raison d'etre, by the XML spec's own wording (as in chapter 1, sentence 1), was to replace SGML as a markup meta-language on the web. Likewise, the
only

Well, no.  Its goal was for generic SGML to work as well as HTML.  The
second sentence of the Abstract is:

   Its goal is to enable generic SGML to be served, received,
   and processed on the Web in the way that is now possible
   with HTML.

HTML 4.01 described itself as "an SGML application conforming to
International Standard ISO 8879 -- Standard Generalized Markup
Language". [2]  SGML was used to provide a formal definition of HTML
[3], but there's no mention of other DTDs being possible (on or off the
Web), and the section on validation includes:

   Beware that such validation, although useful and highly
   recommended, does not guarantee that a document fully
   conforms to the HTML 4 specification.

The idea of 'SGML as a markup meta-language on the web' just didn't
register with non-SGML people at that time.  XML couldn't replace SGML
as a meta-markup language on the Web because SGML wasn't being used as a
meta-markup language on the Web in any significant volume at that point
anyway.

I'm not here to discuss whether or not XML was meant to replace SGML as
a markup meta-language on the web.  My only purpose in joining this
conversation was to counter the notion that LPDs were a good idea.

purpose of xml-stylesheet PIs is to supply an HTML rendering for non-XHTML/non-HTML XML content in browsers (and I've actually

Not just in browsers.  With AH Formatter, the PI can specify an XSLT
stylesheet that generates XSL-FO.
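
For what it's worth, finding the stylesheet named by the PI is not tied
to a browser. The following is a minimal Python sketch of how any
processor might read the PI's pseudo-attributes; it is not how AH
Formatter or any browser does it, the file name 'to-fo.xsl' is
hypothetical, and the regular expression only handles double-quoted
values:

```python
import re
from xml.dom import minidom

def stylesheet_pseudo_attrs(xml_text):
    """Return the pseudo-attributes of the first xml-stylesheet PI, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The PI data is plain text, so the pseudo-attributes are
            # picked out with a simple pattern (double-quoted values only).
            return dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
    return None

# 'to-fo.xsl' is a made-up stylesheet name for illustration.
sample = '<?xml-stylesheet type="text/xsl" href="to-fo.xsl"?><doc/>'
attrs = stylesheet_pseudo_attrs(sample)
print(attrs)
```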

implemented web sites that way, albeit not in this epoch). Its existence, in combination with XSLT being Turing-complete, can't be used as serious argument in discussing the merits and features of document languages. As to namespaces, their original purpose was to avoid name collisions in anticipation of a wealth of new
vocabularies on the web.

Along the lines of what I said about XML vocabularies above,
criticism tends to come from a perspective of an entrenched industry
wanting to define vocabularies/namespaces, sell XML tools, or
similar. To be sure, there's nothing wrong with W3C creating a
cottage industry of XML-centric tools, for XML publishing, WS-*, RDF,
and whatnot. Just saying that none of that stuff has anything to do
with the web (the W in W3C), and has failed miserably on the web.
Worse, W3C has kind of

RDF has nothing to do with the Web?

held back HTML evolution while they were busy with XML, such that
CSS had to become absurdly overpowered to cater for HTML's
shortcomings, HTML being a rather simplistic vocabulary based on
early common text markup tagging folklore plus hypertext anchors for
casual academic publishing, with generic divs and spans used for
nearly everything else. There was even the false
structure-vs-presentation dichotomy created after the fact to justify
a new *syntax* for rendering properties (eg CSS). In which universe
does it make sense to use

<p style="color: red">

rather than

<p color=red>?

In what world does it make sense to put the final formatting in the
source document to begin with?

And we're already suffering through this proliferation of
microsyntax and idiosyncrasies creating browser monopolies and
putting web authoring out of reach of even graphic professionals let
alone laymen. (Btw your XSL predicate "@rend eq 'all-small-caps'"
looks suspiciously similar to the ad-hoc selector syntax in the SGML
handbook ;) Just so

Yes.  You can find proto-versions of half a dozen things in the examples
in the SGML Handbook if you are inclined to look for them, but the
standard LPD mechanism didn't do enough, and all of the other syntaxes
were non-standard and unimplemented.

we're on the same page, if W3C's CSS WG is having their way, we'll shortly see gems such as

div:has(p) { ... }

(plus numerous further pseudo-selectors, cf. <https://css4-selectors.com/selectors/>) pointing to a similar (self-inflicted) problem that DocBook has with encouraging generic "<section>" elements. Thereby only introducing increased complexity

I think that our perspectives about "<section>" differ.

*when the author has complete control over the text document anyway* and could well be bothered to use specific elements to encode that which (s)he wants to express, in a markup language that excels in exactly this kind of application, of all things.

Regards,

Tony Graham.
--
Senior Architect
XML Division
Antenna House, Inc.
----
Skerries, Ireland
tgraham@antenna.co.jp

[1] https://www.balisage.net/2021/Program.html#F14
[2] https://www.w3.org/TR/html401/
[3] https://www.w3.org/TR/html401/sgml/intro.html#h-19.1
[4] https://www.w3.org/TR/2001/REC-xsl-20011015/slice1.html#section-N639-Processing-a-Stylesheet

References:
- Re: [xml-dev] Re: LPDs
  - From: Marcus Reichardt <u123724@gmail.com>
