xml-dev - Re: [xml-dev] XML Hangover

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] XML Hangover
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
Date: Mon, 11 Jul 2005 16:00:49 -0400
In-reply-to: <514a17f50507111142789929a3@mail.gmail.com>
References: <514a17f50507111142789929a3@mail.gmail.com>
At 2005-07-11 21:42 +0300, Joe Schaffner wrote:
>XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.
>
>The T part deals with templates and translation. Since HTML is valid XML,

(it isn't)

>I guess I can parse my HTML using XSL-T to produce XML and vice versa.

No, XSLT will only take in XML, not HTML.  Since XHTML is XML, you can 
use XHTML as an input, but XSLT 1.0 does not accept SGML (and HTML 4 is 
defined as an SGML application) as input.
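A minimal sketch of why HTML fails at the front door. It assumes Python's ElementTree parser as a stand-in for the XML parser that sits in front of any XSLT 1.0 processor. The snippets and the helper name `is_well_formed_xml` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Typical HTML 4 markup: <br> and <p> are never closed, which the HTML DTD
# permits under SGML rules but which is not well-formed XML.
html_snippet = "<p>First line<br>Second line"

# The same content as XHTML: every element is explicitly closed.
xhtml_snippet = '<p xmlns="http://www.w3.org/1999/xhtml">First line<br/>Second line</p>'

def is_well_formed_xml(text: str) -> bool:
    """Return True if an XML parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed_xml(html_snippet))   # False
print(is_well_formed_xml(xhtml_snippet))  # True
```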

>I don't understand why XSL-T refers to "nodes in an output tree".

Because not all needs for transformation require a concrete representation 
of the result in angle brackets.

>This suggests some kind of internal representation, but XML is perfectly 
>good representation language.

A perfectly good concrete representation, yes, but if the downstream 
process can act on the abstract representation of a node tree, why bother 
with a concrete representation of angle brackets?

>Don't <templates> merely write XML text to stdout?

Not at all ... templates express the nodes that are added to the abstract 
result tree.  If you want to serialize the abstract result tree into 
concrete angle brackets, you can get XML, or you can request SGML angle brackets 
assuming the HTML vocabulary (if it is supported by the processor), or you 
can request simple text output without any markup (again if it is supported 
by the processor).
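To make the "one abstract tree, several concrete serializations" point tangible, here is a small sketch. It uses Python's ElementTree, whose `method="xml" | "html" | "text"` options loosely parallel XSLT's `<xsl:output method="..."/>`. It is an analogy, not an XSLT processor.

```python
import xml.etree.ElementTree as ET

# An abstract result tree: nodes in memory, no angle brackets yet.
result = ET.Element("p")
result.text = "Hello"
br = ET.SubElement(result, "br")
br.tail = "world"

# Three concrete representations of the very same tree.
as_xml = ET.tostring(result, encoding="unicode", method="xml")
as_html = ET.tostring(result, encoding="unicode", method="html")
as_text = ET.tostring(result, encoding="unicode", method="text")

print(as_xml)   # <p>Hello<br />world</p>
print(as_html)  # <p>Hello<br>world</p>
print(as_text)  # Helloworld
```

The tree is built once. Only the final serialization step decides whether the result looks like XML, HTML, or plain text.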

If you don't need a concrete representation, the downstream process 
(say, for example, an XSL-FO formatter) can work with the output abstract 
node tree as its input abstract node tree (quelle surprise!) and act on the 
transformed information with nary an angle bracket in sight.
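A rough sketch of handing a result tree straight to the next stage without serializing it. Both `transform` and `downstream` are hypothetical stand-ins: the first for an XSLT transformation, the second for a consumer such as a formatter. Only the in-memory handoff is the point here.

```python
import xml.etree.ElementTree as ET

def transform(source: ET.Element) -> ET.Element:
    """Hypothetical transformation step: turn each <para> into a <block>."""
    root = ET.Element("doc")
    for para in source.iter("para"):
        block = ET.SubElement(root, "block")
        block.text = para.text
    return root

def downstream(tree: ET.Element) -> list[str]:
    """Hypothetical downstream consumer that walks the node tree directly."""
    return [block.text for block in tree.iter("block")]

source = ET.fromstring("<article><para>One</para><para>Two</para></article>")
# The result tree goes straight to the next stage; it is never written out as text.
print(downstream(transform(source)))  # ['One', 'Two']
```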

>Roughly, the process seems to work like this: the T processor does a 
>recursive descent of the source XML.

Not necessarily ... it is up to the stylesheet to decide how to descend 
into the tree.  If the stylesheet doesn't say anything about a given 
element, however, the built-in processing is in fact a depth-first 
breadth-next descent.
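A toy illustration of the two cases above. It is a sketch that only records which elements are reached. `built_in_descent` mimics the default behaviour: reach a node, then each child in document order, which is the depth-first order. `selective_descent` plays the part of a stylesheet that decides not to descend into `<b>`. Both function names are made up for this example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")

def built_in_descent(node, visited):
    """Record the node, then process each child in turn (depth-first, document order)."""
    visited.append(node.tag)
    for child in node:
        built_in_descent(child, visited)
    return visited

def selective_descent(node, visited, skip=("b",)):
    """A 'stylesheet' that chooses not to descend into elements named in skip."""
    visited.append(node.tag)
    if node.tag in skip:
        return visited
    for child in node:
        selective_descent(child, visited, skip)
    return visited

print(built_in_descent(doc, []))   # ['a', 'b', 'c', 'd', 'e', 'f']
print(selective_descent(doc, []))  # ['a', 'b', 'e', 'f']
```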

>At each node it evaluates the set of templates. Those templates which 
>match the name of the "current" tag are processed, in some order.

False ... exactly one template rule is applied to a given input node, and 
matching is based on the XPath data model.  The data model models the root, 
elements, attributes, processing instructions, comments, text and 
namespaces ... it does not model tags.

If the stylesheet should happen to present two templates that both match a 
given input node of any kind, the processor has to choose between 
them.  Facilities are available in the language to distinguish two such 
templates (import precedence and the priority= attribute), but if, after all 
of that, two templates are still tied for one node, it is a "template 
conflict error" (which a processor may recover from by choosing the one that 
occurs last in the stylesheet).
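A simplified sketch of the conflict resolution described above, written in Python for illustration. It assumes a single import precedence and considers only element nodes. Each hypothetical `Rule` carries a priority; the XSLT 1.0 defaults are 0 for a name test such as `match="para"` and -0.5 for `match="*"`. The highest priority wins, and a tie is a conflict. The optional recovery picks the last tied rule, as XSLT 1.0 permits.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

class TemplateConflictError(Exception):
    pass

@dataclass
class Rule:
    name: str
    matches: Callable[[ET.Element], bool]
    priority: float  # e.g. 0 for match="para", -0.5 for match="*"

def select_rule(node, rules, recover=False):
    """Pick the single rule to apply to node (one import precedence assumed)."""
    candidates = [r for r in rules if r.matches(node)]
    if not candidates:
        return None  # a built-in template rule would apply instead
    best = max(r.priority for r in candidates)
    tied = [r for r in candidates if r.priority == best]
    if len(tied) > 1 and not recover:
        raise TemplateConflictError(", ".join(r.name for r in tied))
    return tied[-1]  # with recovery: the last matching rule in stylesheet order

para = ET.Element("para")
rules = [
    Rule("any-element", lambda n: True, -0.5),     # stands in for match="*"
    Rule("para", lambda n: n.tag == "para", 0.0),  # stands in for match="para"
]
print(select_rule(para, rules).name)  # para

rules.append(Rule("para-again", lambda n: n.tag == "para", 0.0))  # same priority
try:
    select_rule(para, rules)
except TemplateConflictError as e:
    conflict = str(e)
print("conflict:", conflict)  # conflict: para, para-again
```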

>The template writes text, that's why it's called a "template".

False.  The template expresses those nodes that are to be added to the 
result tree at the given point of result tree construction.  It is a 
template of the result tree fragment.  It doesn't "write text".
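A loose Python analogy of a template as a producer of result-tree nodes rather than text. The `title_template` function is hypothetical. It returns an element, and no characters are written anywhere until someone later chooses to serialize the tree.

```python
import xml.etree.ElementTree as ET

def title_template(source_title: ET.Element) -> ET.Element:
    """Hypothetical template for <title>: returns result-tree nodes, not text."""
    h1 = ET.Element("h1", {"class": "title"})
    h1.text = source_title.text
    return h1

result_root = ET.Element("body")
result_root.append(title_template(ET.fromstring("<title>XML Hangover</title>")))

# Nothing has been written out; the result nodes can still be inspected as nodes.
print(result_root[0].tag, result_root[0].get("class"), result_root[0].text)
# h1 title XML Hangover
```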

>The recursive descent is continued with an <apply-templates> tag inside 
>the template. This allows you to balance output.

Not sure what you mean by the verb "balance" here ... the stylesheet writer 
is obliged to express the transform such that the result tree be 
constructed in result parse order in a single pass.  Note the "single pass" 
business:  once you have constructed part of the result tree there is no 
"going back" to massage your result tree ... you only have one chance to 
write out each output node (with the exception of the current set of 
attributes for an element whose content has not yet begun).
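A sketch of the single-pass constraint as an append-only builder. The `ResultTreeBuilder` class is hypothetical and only models the rule stated above. Output goes out in result document order. Attributes may still be added to the current element until its content begins, and after that there is no going back.

```python
class ResultTreeBuilder:
    """Hypothetical append-only builder mimicking single-pass result construction."""

    def __init__(self):
        self.events = []              # the result, recorded in document order
        self.open_tags = []
        self.content_started = False  # has the current element received content?

    def start(self, tag):
        self.events.append(("start", tag))
        self.open_tags.append(tag)
        self.content_started = False

    def attribute(self, name, value):
        # Attributes are accepted only while the current element has no content yet.
        if not self.open_tags or self.content_started:
            raise ValueError(f"too late to add attribute {name!r}")
        self.events.append(("attr", name, value))

    def text(self, data):
        self.events.append(("text", data))
        self.content_started = True

    def end(self):
        self.events.append(("end", self.open_tags.pop()))
        self.content_started = True  # the parent now has content: this child

b = ResultTreeBuilder()
b.start("p")
b.attribute("id", "intro")        # allowed: <p> has no content yet
b.text("Hello")
try:
    b.attribute("class", "late")  # too late: content has begun
except ValueError as err:
    late_error = str(err)
b.end()
print(late_error)  # too late to add attribute 'class'
```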

>If no matches are found, the T processor continues the descent.

If there are no stylesheet templates to match a given node, there are 
built-in templates to match any given node, and for elements the descent is 
continued.
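A sketch that emulates the XSLT 1.0 built-in template rules with Python's ElementTree, for a stylesheet with no templates of its own. Elements keep the descent going, and text is copied through. Comments and processing instructions produce nothing. Attributes are never reached, because a plain `<xsl:apply-templates/>` selects only child nodes. The `apply_builtin_rules` helper is an illustrative name, not a library function. The `insert_comments=True` option needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def apply_builtin_rules(elem, out):
    """Emulate XSLT 1.0 built-in template rules when no stylesheet template matches."""
    if elem.tag is ET.Comment or elem.tag is ET.ProcessingInstruction:
        return out  # built-in rule for comments and PIs: output nothing
    if elem.text:
        out.append(elem.text)  # built-in rule for text: copy it through
    for child in elem:
        apply_builtin_rules(child, out)  # built-in rule for elements: keep descending
        if child.tail:
            out.append(child.tail)  # text after a child is a text node of this element
    return out  # attributes are not visited by a plain apply-templates

parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
doc = ET.fromstring('<doc id="d1">Hello <b>bold</b><!-- note --> world</doc>', parser=parser)
print("".join(apply_builtin_rules(doc, [])))  # Hello bold world
```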

>There is a <template> tag (I forget what) which will select arbitrary 
>paths in the source tree, and there are tags which iterate through the 
>result. This will allow me to build up a result "tree" which is not a 
>mirror image of the source, something I need to do if I'm rearranging 
>sections of the input document. Rather than buffering intermediate 
>structures, the T processor does multiple passes based on these tags, and 
>creates the output on-the-fly. Cool.

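Templates don't select arbitrary paths; the stylesheet pushes nodes (with 
<xsl:apply-templates select="..."/>) and pulls nodes (with <xsl:for-each 
select="..."> and <xsl:value-of select="..."/>), addressing them with 
arbitrary XPath expressions into the XPath data model used for XML documents.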
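A small pull-style sketch in Python of building a result that does not mirror the source. It uses ElementTree's `find()`, which supports only a limited subset of XPath, so treat it as an analogy for `select="..."` expressions rather than real XPath 1.0. The document and element names are invented for the example.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<doc>"
    "<section id='intro'><title>Intro</title></section>"
    "<section id='body'><title>Body</title></section>"
    "<appendix><title>Notes</title></appendix>"
    "</doc>"
)

result = ET.Element("toc")
# Pull: address nodes directly, in whatever order the result needs,
# loosely like <xsl:for-each select="..."> and <xsl:value-of select="..."/>.
for path in ["appendix/title", "section[@id='body']/title", "section[@id='intro']/title"]:
    entry = ET.SubElement(result, "entry")
    entry.text = source.find(path).text

print([e.text for e in result])  # ['Notes', 'Body', 'Intro']
```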

>Then there is the DOM - Document Object Model...

Which is a different data model for XML documents than the XPath data 
model.  For example, the DOM models CDATA sections while XPath does not.
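A quick sketch of the CDATA difference using two Python standard-library models. `xml.dom.minidom` is a DOM implementation and keeps a CDATA section as its own node. ElementTree, like the XPath data model, has no CDATA node type and folds the section into ordinary text. ElementTree is only a stand-in here; it is not an implementation of the XPath data model.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

markup = "<code>a <![CDATA[<b>]]> c</code>"

# DOM view: the CDATA section survives as a node of its own.
dom_root = minidom.parseString(markup).documentElement
dom_kinds = [child.nodeType for child in dom_root.childNodes]
print(Node.CDATA_SECTION_NODE in dom_kinds)  # True

# ElementTree view: only the character data remains.
et_root = ET.fromstring(markup)
print(repr(et_root.text))  # 'a <b> c'
```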

>but my XML already defines the document's object model. It's whatever I 
>decide it to be. Why the confusion? Are you referring to parsing an XML 
>document using some kind of programming environment to produce a tree in 
>some kind of intermediate representation? Are there C libs/structs or Java 
>classes or Perl modules which tear apart an XML document?

Maybe in somebody's program, but not in XSLT.  I'm getting confused by your 
reference to an "object model" ... XML defines a syntax for structured 
documents, XPath defines a data model for the information expressed in XML 
syntax, DOM defines a different data model for the information expressed in 
XML syntax.

>This would allow someone to build an XML parser into any configuration 
>program.

Not sure what you mean by that.

>The FO of XSL parts looks a lot like Knuth's TeX, a graphical document 
>description language, set in XML, incredibly complex.

If you find it complex, then perhaps you haven't looked at it closely 
enough.  It is a set of pagination semantics for both flow and formatting, 
expressed in an XML vocabulary of elements and attributes.  It is well 
designed and easy to teach and easy to use (I admit the W3C spec is not 
that easy to read ... it was written more for XSL-FO engine implementers 
than for stylesheet writers).
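To show what "an XML vocabulary of elements and attributes" looks like in practice, here is a minimal XSL-FO 1.0 document built with Python's ElementTree. The page dimensions and margins are arbitrary illustrative values, and the `fo()` helper is a made-up convenience function. A formatter (not shown) would paginate this and render it to PDF or another output.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)

def fo(tag, parent=None, **attrs):
    """Hypothetical helper: make an XSL-FO element ('_' in keyword names becomes '-')."""
    name = f"{{{FO}}}{tag}"
    attrib = {k.replace("_", "-"): v for k, v in attrs.items()}
    return ET.Element(name, attrib) if parent is None else ET.SubElement(parent, name, attrib)

root = fo("root")
masters = fo("layout-master-set", root)
page = fo("simple-page-master", masters, master_name="page",
          page_height="11in", page_width="8.5in", margin="1in")
fo("region-body", page)
sequence = fo("page-sequence", root, master_reference="page")
flow = fo("flow", sequence, flow_name="xsl-region-body")
block = fo("block", flow, font_size="14pt")
block.text = "Hello, XSL-FO"

fo_text = ET.tostring(root, encoding="unicode")
print(fo_text)
```

The layout-master-set describes page geometry once. The page-sequence then flows content into the body region of that page master.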

>I guess there are FO display tools out there (called "formatters") that 
>interpret FO.

There are many, both commercial and free, both expensive and affordable.

>[Postscript was a procedural language for graphical page description. PDF 
>is a data definition language which takes the procedures out of 
>Postscript. FO is a data definition language which describes the appearance 
>of an XML document as richly as these other languages.]

It is more abstract ... XSL-FO does not say anything about rendering, only 
about formatting ... deciding what goes where on a page, not how what goes 
where is expressed in a page description language.  Many tools offer PDF 
and some tools offer PostScript and other tools offer other output expressions.

>[Does anybody remember what a dvi representation did? I think it stood for 
>"device independent" but TeX was already device independent.]
>
>I had a ghostscript viewer which seemed to understand dvi files, and 
>postscript too. It was quite clumsy.
>
>I assume there is nothing stopping me from using XSL-T to transform my 
>HTML to PDF,

First, as mentioned above, HTML isn't an input to XSLT ... but even if you 
input XHTML, putting out PDF by XSLT would seem to me to be quite awkward.

>but it seems best to output XSL-FO then create a PDF using some kind of tool.

Absolutely.
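A rough sketch of the "XHTML in, XSL-FO out" step that would precede the formatter. Python stands in for the XSLT stylesheet. The `BLOCK_PROPERTIES` table and `xhtml_to_fo_blocks` function are hypothetical, and inline markup such as `<em>` is simply flattened to text here. A real stylesheet would typically map it to `fo:inline`.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
FO = "http://www.w3.org/1999/XSL/Format"

# Hypothetical mapping from XHTML element names to XSL-FO block properties.
BLOCK_PROPERTIES = {
    "h1": {"font-size": "18pt", "font-weight": "bold", "space-after": "6pt"},
    "p": {"font-size": "11pt", "space-after": "4pt"},
}

def xhtml_to_fo_blocks(xhtml_body, flow):
    """Turn each <h1>/<p> child of an XHTML body into an fo:block inside flow."""
    for elem in xhtml_body:
        local = elem.tag.split("}")[-1]  # strip the XHTML namespace
        if local in BLOCK_PROPERTIES:
            block = ET.SubElement(flow, f"{{{FO}}}block", BLOCK_PROPERTIES[local])
            block.text = "".join(elem.itertext())  # inline markup flattened (simplification)
    return flow

body = ET.fromstring(
    f'<body xmlns="{XHTML}"><h1>XML Hangover</h1><p>It <em>does</em> work.</p></body>'
)
flow = xhtml_to_fo_blocks(body, ET.Element(f"{{{FO}}}flow", {"flow-name": "xsl-region-body"}))
print([(b.get("font-size"), b.text) for b in flow])
# [('18pt', 'XML Hangover'), ('11pt', 'It does work.')]
```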

>  What is that tool?

See http://www.xmlsoftware.com/xslfo.html for a starting point.

>Are there FO plug-ins available for my browsers?

Doubtful ... since browsers output in arbitrary and changeable window sizes 
and XSL-FO pagination is geared to fixed-sized-folios (pages) of 
information, I haven't yet seen a browser-based implementation of 
XSL-FO.  Note that there is, in fact, a way in the XSL-FO specification to 
turn off pagination and be supported in a dynamically-sized window ... but 
I haven't seen that implemented.

>Does this technology work?

Absolutely!  I have customers producing millions of pages of customized 
printed output based on XML inputs exported from databases of 
information.  It is very professional and it gives my customers a technical 
edge ahead of their competition.  I author all my training and book 
materials in XML and use XSLT and XSL-FO to go to camera-ready output.  We 
claim that my "Definitive XSLT and XPath" book published by Prentice-Hall 
is the first bookstore-shelf-published-book that was produced end-to-end in 
XML because we wrote XSL-FO stylesheets to lay out the pages and supplied 
PDF files to PH for publication with the final camera-ready output.  We 
wrote an annex in the book describing our process.  My "Definitive XSL-FO" 
book then just re-used the stylesheets and that production process was much 
quicker.  I use different stylesheets to publish the same content as PDF 
books for sale on my web site.  I use those also to publish my student 
handouts for my hands-on courses.  All this from the same set of XML inputs.

I hope this helps.

. . . . . . . Ken

--
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
Follow-Ups:
- Re: [xml-dev] XML Hangover
  - From: Joe Schaffner <schaffner.joe@gmail.com>
References:
- XML Hangover
  - From: Joe Schaffner <schaffner.joe@gmail.com>