I've been reading the XML literature. It's great. Just a few
comments:
Welcome on board. It's refreshing to get thoughtful comments from
someone who's new to the game.
XSL - XML Stylesheets is divided into two parts, XSL-T and
XSL-FO.
The T part deals with templates and translation. Since HTML is valid
XML, I guess I can parse my HTML using XSL-T to produce XML and vice
versa. I don't understand why XSL-T refers to "nodes in an output
tree". This suggests some kind of internal representation, but XML is
a perfectly good representation language. Don't <templates>
merely write XML text to stdout?
No, the result tree is completely abstract; there is no
suggestion of an internal representation. In fact, for many XSLT
processors, the "result tree" is represented internally as a stream
of events, not as a linked collection of objects in memory. However, this
concept of writing a tree, rather than writing text, is extremely
important. Firstly, it defines a separation of the information
content of an XML document from the accidental aspects of its lexical
representation - something that is sadly missing from the XML spec
itself. In turn, this gives you a basis for defining a concise set of
operators that are in some sense complete and composable, and that exhibit closure.
In practical terms, it gives you the ability to write a series of
transformations - a pipeline - in which the expensive steps of serializing
and parsing intermediate results can be eliminated.
Roughly, the process seems to work like this: the T
processor does a recursive descent of the source XML. At each node it
evaluates the set of templates. Those templates which match the name of
the "current" tag are processed, in some order. The template writes text;
that's why it's called a "template". The recursive descent is continued
with an <apply-templates> tag inside the template. This allows you
to balance output.
It doesn't have to do a recursive descent of the source XML: that's
up to the application, though a recursive descent is the most
common design pattern. And it definitely doesn't write text: people who
create a mental model of writing text eventually get a rude awakening,
usually when they first try to tackle grouping problems.
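
The recursive-descent pattern can be sketched as follows. This is a minimal illustration, not anyone's production stylesheet, and the source element names (section, title, para) are hypothetical. Each template constructs nodes in the result tree, and xsl:apply-templates passes control to whichever rules match the children.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical source vocabulary: section, title, para -->
  <xsl:template match="section">
    <!-- The div is a single element node in the result tree. It cannot
         be "opened" in one template and "closed" in another, which is
         where the writing-text mental model breaks down. -->
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes inside title and para are copied by the built-in template rule, so no explicit rule for text is needed here.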
If no matches are found, the T processor continues the descent.
There is a <template> tag (I forget which) that will
select arbitrary paths in the source tree, and there are tags
that iterate through the result.
Again, it's best to think of the stylesheet as containing nodes
(representing instructions) rather than tags. Consider
<xsl:element name="x"><xsl:value-of select="."/></xsl:element>
There are three tags there, but four nodes, and only two
instructions. The semantics of the language are described in terms of the
two instructions, not the three tags.
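
One reading of that count, laid out as a sketch: the fragment below is the same instruction pair with the tags, nodes, and instructions labelled in a comment. It is followed by a literal result element which, for a fixed element name in no namespace, should construct the same result node.

```xml
<!-- Tags: (1) the xsl:element start tag, (2) the xsl:value-of empty
     tag, (3) the xsl:element end tag.
     Nodes: two element nodes plus two attribute nodes (name, select).
     Instructions: xsl:element and xsl:value-of. -->
<xsl:element name="x">
  <xsl:value-of select="."/>
</xsl:element>

<!-- The same construction written as a literal result element -->
<x><xsl:value-of select="."/></x>
```

Both fragments belong inside a template body. The indentation in the first adds no nodes, because whitespace-only text nodes in a stylesheet are stripped.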
This will allow me to build up a result "tree" which is not a mirror
image of the source, something I need to do if I'm rearranging sections
of the input document.
Rather than buffering intermediate structures, the T processor does
multiple passes based on these tags, and creates the output on-the-fly.
Cool.
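
A result tree that is not a mirror image of the source might be built along these lines. This is a sketch under assumed names: the doc, section, title and para elements are hypothetical. The select attribute lets a template visit source nodes in whatever order the output needs, here producing a list of titles ahead of the sections themselves.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical source: doc > section > (title, para*) -->
  <xsl:template match="/doc">
    <html>
      <body>
        <!-- First visit: every section title, at any depth -->
        <ul>
          <xsl:for-each select=".//section/title">
            <li><xsl:value-of select="."/></li>
          </xsl:for-each>
        </ul>
        <!-- Second visit: the top-level sections, processed in full -->
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="section">
    <div>
      <h2><xsl:value-of select="title"/></h2>
      <xsl:apply-templates select="para"/>
    </div>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The same source nodes are selected twice, so nothing has to be buffered by the stylesheet author; how the processor schedules the work internally is its own business.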
... .
I assume there is nothing stopping me from using XSL-T to transform
my HTML to PDF, but it seems best to output XSL-FO then create a
PDF using some kind of tool. What is that tool?
It's an XSL-FO processor. Examples are FOP, RenderX, Antenna House.
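
To make the two-step route concrete, here is a minimal sketch of a stylesheet that emits XSL-FO. It assumes well-formed input containing p elements in no namespace (an assumption for illustration; real XHTML input would need the XHTML namespace declared and used in the match patterns). The page dimensions and spacing are arbitrary choices.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/">
    <fo:root>
      <!-- One page layout, here A4 with uniform margins -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-height="297mm" page-width="210mm" margin="20mm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- All content flows into the body region of that layout -->
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="//p"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Assumed input element: p, in no namespace -->
  <xsl:template match="p">
    <fo:block space-after="6pt"><xsl:value-of select="."/></fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The FO processor then turns that result into PDF. With Apache FOP, for instance, an invocation along the lines of `fop -xml input.xml -xsl to-fo.xsl -pdf out.pdf` runs the transformation and the formatting in one go; the file names here are placeholders. An input with no p elements would leave fo:flow empty, which an FO processor is likely to reject.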
Are there FO plug-ins available for my browsers?
No, people are by and large using (X)HTML/CSS for the browser,
XSL-FO/PDF for the printed page.
Does this technology work?
Absolutely yes.
Michael Kay