I've been reading the XML literature. It's great. Just a few comments:
Welcome aboard. It's refreshing to get thoughtful comments from someone
who's new to the game.
XSL, the Extensible Stylesheet Language, is divided into two parts, XSL-T
and XSL-FO. The T part deals with templates and transformation. Since HTML
is valid XML, I guess I can parse my HTML using XSL-T to produce XML and
vice versa. I don't understand why XSL-T refers to "nodes in an output
tree". This suggests some kind of internal representation, but XML is a
perfectly good representation language. Don't <templates> merely write XML
text to stdout?
No, the result tree is completely abstract; there is no suggestion of an
internal representation. In fact, for many XSLT processors, the "result
tree" is represented internally as a stream of events, not as a linked
collection of objects in memory. This concept of writing a tree rather than
writing text is, however, extremely important. Firstly, it defines a
separation between the information content of an XML document and the
accidental aspects of its lexical representation, something that is sadly
missing from the XML spec itself. In turn, this gives you a basis for
defining a concise set of operators that are in some sense complete,
composable and exhibit closure. In practical terms, it gives you the
ability to write a series of transformations (a pipeline) in which the
expensive steps of serializing and parsing intermediate results can be
eliminated.
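To make the pipeline point concrete, here is a minimal Python sketch using
the standard-library xml.etree.ElementTree module (an assumption: it is a
tree API, not an XSLT processor). Each step takes a tree and returns a new
tree, so the intermediate result is never serialized and reparsed; only the
final tree is written out as text. The step functions rename_paras and
drop_notes are hypothetical.

```python
import xml.etree.ElementTree as ET


def copy_shallow(node, tag=None):
    """Copy one element (tag, attributes, text, tail) without its children."""
    out = ET.Element(tag or node.tag, node.attrib)
    out.text, out.tail = node.text, node.tail
    return out


def rename_paras(node):
    """Hypothetical step 1: rename every <para> to <p>, returning a new tree."""
    out = copy_shallow(node, "p" if node.tag == "para" else node.tag)
    for child in node:
        out.append(rename_paras(child))
    return out


def drop_notes(node):
    """Hypothetical step 2: drop every <note> element (and, in this toy, its tail text)."""
    out = copy_shallow(node)
    for child in node:
        if child.tag != "note":
            out.append(drop_notes(child))
    return out


def pipeline(tree, *steps):
    """Each step hands a tree, not text, to the next one."""
    for step in steps:
        tree = step(tree)
    return tree


source = ET.fromstring("<doc><para>One</para><note>skip</note><para>Two</para></doc>")
result = pipeline(source, rename_paras, drop_notes)
# Serialization happens exactly once, at the end of the pipeline.
print(ET.tostring(result, encoding="unicode"))
```

The printed result is `<doc><p>One</p><p>Two</p></doc>`; neither step ever
sees angle brackets, only nodes.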
Roughly, the process seems to work like this: the T processor does a
recursive descent of the source XML. At each node it evaluates the set of
templates. Those templates which match the name of the "current" tag are
processed, in some order. The template writes text; that's why it's called
a "template". The recursive descent is continued with an <apply-templates>
tag inside the template. This allows you to balance output.
It doesn't have to do a recursive descent of the source XML: that's up to
the application, though a recursive descent is the most common design
pattern. And it definitely doesn't write text: people who create a mental
model of writing text eventually get a rude awakening, usually when they
first try to tackle grouping problems.
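A toy Python model of this processing model may help. It is a sketch under
heavy simplifying assumptions, not how any real XSLT processor works:
templates are looked up by element name only (real XSLT matches patterns
and resolves conflicts by priority), the helpers process, apply_templates
and make are hypothetical, and templates return result nodes rather than
text. An element with no matching template falls through to a built-in
rule that simply continues the descent, mirroring XSLT's built-in template
rule for elements.

```python
import xml.etree.ElementTree as ET


def process(node, templates):
    """Apply the template matching node's name, or the built-in rule if none matches."""
    rule = templates.get(node.tag)
    if rule is not None:
        return rule(node, templates)
    # Built-in rule for elements: produce nothing itself, just continue the descent.
    return apply_templates(node, templates)


def apply_templates(node, templates):
    """Toy <xsl:apply-templates/>: process text and element children in document order."""
    out = []
    if node.text and node.text.strip():
        out.append(node.text)  # built-in rule for text: copy it to the result
    for child in node:
        out.extend(process(child, templates))
        if child.tail and child.tail.strip():
            out.append(child.tail)
    return out


def make(tag, children):
    """Build a result element from a list of result nodes (strings or Elements)."""
    el, last = ET.Element(tag), None
    for c in children:
        if isinstance(c, str):
            if last is None:
                el.text = (el.text or "") + c
            else:
                last.tail = (last.tail or "") + c
        else:
            el.append(c)
            last = c
    return el


# Each template returns result nodes; none of them writes text to stdout.
templates = {
    "book": lambda n, t: [make("html", apply_templates(n, t))],
    "title": lambda n, t: [make("h1", apply_templates(n, t))],
    "para": lambda n, t: [make("p", apply_templates(n, t))],
}

source = ET.fromstring(
    "<book><title>XSLT</title><chapter><para>Trees, not text.</para></chapter></book>")
result = process(source, templates)[0]
print(ET.tostring(result, encoding="unicode"))
```

There is no template for `<chapter>`, so the built-in rule descends into it
and the `<para>` inside is still processed, giving
`<html><h1>XSLT</h1><p>Trees, not text.</p></html>`.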
If no matches are found, the T processor continues the descent. There is a
<template> tag (I forget what) which will select arbitrary paths in the
source tree, and there are tags which iterate through the result.
Again, it's best to think of the stylesheet as containing nodes
(representing instructions) rather than tags. Consider

    <xsl:element name="x"><xsl:value-of select="."/></xsl:element>

There are three tags there, but four nodes, and only two instructions. The
semantics of the language are described in terms of the two instructions,
not the three tags.
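The count can be checked mechanically. This sketch assumes the four nodes
are the two element nodes plus their two attribute nodes; the xmlns:xsl
declaration is added only so the fragment parses on its own, and
ElementTree does not report it as an attribute. It uses Python's standard
re and xml.etree.ElementTree modules; the variable names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
fragment = ('<xsl:element xmlns:xsl="%s" name="x">'
            '<xsl:value-of select="."/></xsl:element>' % XSL_NS)

# Tags are a lexical notion: a start tag, an empty-element tag, an end tag.
tags = re.findall(r"<[^>]+>", fragment)

root = ET.fromstring(fragment)
elements = list(root.iter())  # xsl:element and xsl:value-of
attributes = [(el.tag, name) for el in elements for name in el.attrib]  # name, select

# In this fragment every element in the XSLT namespace is an instruction
# (in a full stylesheet, declarations such as xsl:template would not be).
instructions = [el for el in elements if el.tag.startswith("{%s}" % XSL_NS)]

print(len(tags), len(elements) + len(attributes), len(instructions))
```

Running it prints `3 4 2`: three tags, four nodes, two instructions.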
This will allow me to build up a result "tree" that is not a mirror image
of the source, something I need to do if I'm rearranging sections of the
input document. Rather than buffering intermediate structures, the T
processor does multiple passes based on these tags and creates the output
on the fly. Cool.
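Building a result tree that is not a mirror image of the source can be
sketched as follows. It is a Python analogue (an assumption, using
ElementTree's findall paths in place of XSLT select expressions) of
`<xsl:apply-templates select="section/title"/>` followed by
`<xsl:apply-templates select="section"/>`: the same source nodes are
selected twice, in an order chosen by the stylesheet rather than by the
document. It makes no claim about how a real processor schedules the work.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<doc>"
    "<section><title>Intro</title><body>a</body></section>"
    "<section><title>Usage</title><body>b</body></section>"
    "</doc>")

result = ET.Element("out")

# First selection: every section title, gathered into a table of contents.
toc = ET.SubElement(result, "toc")
for title in source.findall("section/title"):
    ET.SubElement(toc, "entry").text = title.text

# Second selection: the sections themselves, restructured as heading + paragraph.
for section in source.findall("section"):
    ET.SubElement(result, "h2").text = section.findtext("title")
    ET.SubElement(result, "p").text = section.findtext("body")

print(ET.tostring(result, encoding="unicode"))
```

The titles appear twice in the output, once in the table of contents and
once as headings, even though each occurs once in the source.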
... .
I assume there is nothing stopping me from using XSL-T to transform my HTML
to PDF, but it seems best to output XSL-FO and then create a PDF using some
kind of tool. What is that tool?
It's an XSL-FO processor. Examples are FOP, RenderX and Antenna House.
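For a sense of what such a processor consumes, here is a sketch of the
shape of a minimal XSL-FO result tree. In practice an XSLT stylesheet would
produce it; Python's ElementTree is used here only to show the structure,
and the helper names fo and paragraphs_to_fo, the page size and the
spacing values are hypothetical choices.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)  # serialize with the conventional fo: prefix


def fo(tag, parent=None, attrs=None):
    """Hypothetical helper: create an element in the XSL-FO namespace."""
    qname = "{%s}%s" % (FO, tag)
    if parent is None:
        return ET.Element(qname, attrs or {})
    return ET.SubElement(parent, qname, attrs or {})


def paragraphs_to_fo(paragraphs):
    """Wrap each paragraph in an fo:block inside a single-page-master layout."""
    root = fo("root")
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters,
              {"master-name": "A4", "page-width": "210mm", "page-height": "297mm"})
    fo("region-body", page, {"margin": "20mm"})
    seq = fo("page-sequence", root, {"master-reference": "A4"})
    flow = fo("flow", seq, {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        fo("block", flow, {"space-after": "6pt"}).text = text
    return root


root = paragraphs_to_fo(["Hello, FO."])
print(ET.tostring(root, encoding="unicode"))
```

Writing this tree to a .fo file (for example with ElementTree's write
method) gives input that an XSL-FO processor such as FOP can render to PDF.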
Are there FO plug-ins available for my browsers?
No, people are by and large using (X)HTML/CSS for the browser and
XSL-FO/PDF for the printed page.
Does this technology work?
Absolutely yes.
Michael Kay