At 2005-07-11 21:42 +0300, Joe Schaffner wrote:
>XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.
>
>The T part deals with templates and translation. Since HTML is valid XML,
(it isn't)
>I guess I can parse my HTML using XSL-T to produce XML and vice versa.
No, XSLT will only take in XML, not HTML. Since XHTML is XML then you can
use XHTML as an input, but XSLT 1.0 does not accept SGML as input.
>I don't understand why XSL-T refers to "nodes in an output tree".
Because not all needs for transformation require a concrete representation
of the result in angle brackets.
>This suggests some kind of internal representation, but XML is perfectly
>good representation language.
A perfectly good concrete representation, yes, but if the downstream
process can act on the abstract representation of a node tree, why bother
with a concrete representation of angle brackets?
>Don't <templates> merely write XML text to stdout?
Not at all ... templates express the nodes that are added to the abstract
result tree. If you want to serialize the abstract result tree into
concrete angle brackets you get XML, or you can request SGML angle brackets
assuming the HTML vocabulary (if it is supported by the processor), or you
can request simple text output without any markup (again if it is supported
by the processor).
If you don't need a concrete representation, the downstream process (say,
for example, an XSL-FO formatter) can work with the output abstract node
tree as an input abstract node tree (quelle surprise!) and act on the
transformed information with nary an angle bracket in sight.
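As a minimal sketch of requesting a serialization (the source element names
"doc" and "title" are made up for illustration), the xsl:output instruction
is where a stylesheet asks for one of the three methods:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask for HTML serialization of the result tree. The other two
       XSLT 1.0 choices are method="xml" and method="text". A processor
       is not obliged to serialize the result tree at all. -->
  <xsl:output method="html" indent="yes"/>

  <!-- "doc" and "title" are hypothetical source element names -->
  <xsl:template match="/doc">
    <html>
      <body>
        <h1><xsl:value-of select="title"/></h1>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The template is the same whichever method is requested; only the way the
finished result tree is written out changes.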
>Roughly, the process seems to work like this: the T processor does a
>recursive descent of the source XML.
Not necessarily ... it is up to the stylesheet to decide how to descend
into the tree. If the stylesheet doesn't say anything about a given
element, however, the built-in processing is in fact a depth-first
breadth-next descent.
>At each node it evaluates the set of templates. Those templates which
>match the name of the "current" tag are processed, in some order.
False ... exactly one template matches a given input node based on the
XPath data model. The data model models elements, attributes, processing
instructions, comments, text and namespaces ... it does not model tags.
If the stylesheet presents two templates that both match a given input
node of any kind equally well, that is an error condition. Facilities and
techniques are available in the language to specialize two templates as
being distinct (priorities and import precedence, for example), but if,
despite those, two templates are still left matching one node, it is a
"template conflict error".
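Here is a small sketch of one such technique, an explicit priority. The
element and attribute names ("para", "role", "id", "note", "anchored") are
invented for the example:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A pattern with a predicate has a default priority of 0.5 -->
  <xsl:template match="para[@role='note']">
    <note><xsl:apply-templates/></note>
  </xsl:template>

  <!-- This pattern would also default to 0.5, so a node such as
       <para role="note" id="p1"> would match both templates equally.
       The explicit priority makes this template the one chosen. -->
  <xsl:template match="para[@id]" priority="1">
    <anchored id="{@id}"><xsl:apply-templates/></anchored>
  </xsl:template>

</xsl:stylesheet>
```

Remove the priority attribute and a "para" carrying both attributes puts the
stylesheet in the conflict situation.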
>The template writes text, that's why it's called a "template".
False. The template expresses those nodes that are to be added to the
result tree at the given point of result tree construction. It is a
template of the result tree fragment. It doesn't "write text".
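A short sketch of the difference, assuming a hypothetical source element
named "title":

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The h1 below is an element node added to the result tree, not
       the strings "<h1>" and "</h1>". The nodes produced by
       apply-templates become children of that element node. -->
  <xsl:template match="title">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

</xsl:stylesheet>
```

Because a stylesheet is itself well-formed XML, there is no way to put a
start tag in one template and its end tag in another, which is what a
"writing text" model would lead you to expect.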
>The recursive descent is continued with an <apply-templates> tag inside
>the template. This allows you to balance output.
Not sure what you mean by the verb "balance" here ... the stylesheet writer
is obliged to express the transform such that the result tree be
constructed in result parse order in a single pass. Note the "single pass"
business: once you have constructed part of the result tree there is no
"going back" to massage your result tree ... you only have one chance to
write out each output node (with the exception of the current set of
attributes for an element whose content has not yet begun).
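The attribute exception can be sketched like this (the names "item",
"final", "entry" and "status" are illustrative only):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="item">
    <entry>
      <!-- First attempt at the attribute -->
      <xsl:attribute name="status">draft</xsl:attribute>
      <!-- Still allowed: no child content has been added to "entry"
           yet, so a second attribute of the same name replaces the
           first one -->
      <xsl:if test="@final = 'yes'">
        <xsl:attribute name="status">final</xsl:attribute>
      </xsl:if>
      <!-- Once children are added, the attribute set is closed -->
      <xsl:apply-templates/>
    </entry>
  </xsl:template>

</xsl:stylesheet>
```

Moving either xsl:attribute below the xsl:apply-templates would be an error
in XSLT 1.0 whenever the element has already been given child nodes.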
>If no matches are found, the T processor continues the descent.
If there are no stylesheet templates to match a given node, there are
built-in templates to match any given node, and for elements the descent is
continued.
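For reference, the built-in template rules of XSLT 1.0 behave as if the
following were present in every stylesheet (the equivalent rules for each
mode are left out of this sketch):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root node and elements: add nothing, keep descending -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the string value through -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: do nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

This is why an empty stylesheet produces the text of the source document
with all of the markup gone.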
>There is a <template> tag (I forget what) which will select arbitrary
>paths in the source tree, and there are tags which iterate through the
>result. This will allow me to build up a result "tree" which is not a
>mirror image of the source, something I need to do if I'm rearranging
>sections of the input document. Rather than buffering intermediate
>structures, the T processor does multiple passes based on these tags, and
>creates the output on-the-fly. Cool.
Templates don't select arbitrary paths; the stylesheet's requests to push
and pull nodes address those nodes using arbitrary address expressions into
the XPath data model used for XML documents.
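As a sketch of rearranging input with both styles, assume a hypothetical
"report" document holding "chapter" children followed by a "summary":

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/report">
    <report>
      <!-- Push: hand the summary to whichever template matches it,
           ahead of its position in the source document -->
      <xsl:apply-templates select="summary"/>
      <!-- Pull: address the chapter titles and iterate over them -->
      <toc>
        <xsl:for-each select="chapter/title">
          <entry><xsl:value-of select="."/></entry>
        </xsl:for-each>
      </toc>
      <!-- Push the chapters themselves last -->
      <xsl:apply-templates select="chapter"/>
    </report>
  </xsl:template>

  <!-- Copy the pushed nodes unchanged -->
  <xsl:template match="summary|chapter">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The result tree is still built in result order; the select expressions
simply reach into the source tree wherever they need to.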
>Then there is the DOM - Document Object Model...
Which is a different document model for XML documents than the XPath data
model is for XML documents. For example, the DOM models CDATA sections
while XPath does not.
>but my XML already defines the document's object model. It's whatever I
>decide it to be. Why the confusion? Are you referring to parsing an XML
>document using some kind of programming environment to produce a tree in
>some kind of intermediate representation? Are there C libs/structs or Java
>classes or Perl modules which tear apart an XML document?
Maybe in somebody's program, but not in XSLT. I'm getting confused by your
reference to an "object model" ... XML defines a syntax for structured
documents, XPath defines a data model for the information expressed in XML
syntax, DOM defines a different data model for the information expressed in
XML syntax.
>This would allow someone to build an XML parser into any configuration
>program.
Not sure what you mean by that.
>The FO part of XSL looks a lot like Knuth's TeX, a graphical document
>description language, set in XML, incredibly complex.
If you find it complex, then perhaps you haven't looked at it closely
enough. It is a set of pagination semantics for both flow and formatting,
expressed in an XML vocabulary of elements and attributes. It is well
designed and easy to teach and easy to use (I admit the W3C spec is not
that easy to read ... it was written more for XSL-FO engine implementers
than for stylesheet writers).
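To give a feel for the vocabulary, here is a minimal sketch of a complete
XSL-FO instance. The page dimensions, the master name "page" and the text
are arbitrary choices for the example:

```xml
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Page geometry: one A4 page master with a body region -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-height="297mm" page-width="210mm"
                           margin-top="25mm" margin-bottom="25mm"
                           margin-left="25mm" margin-right="25mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Content: flowed into the body region, page after page -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="18pt" space-after="12pt">A heading</fo:block>
      <fo:block>A paragraph of body text.</fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

Everything else in the vocabulary refines one of those two halves: how the
pages are laid out, and how the flowed content is formatted.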
>I guess there are FO display tools out there (called "formatters") that
>interpret FO.
There are many, both commercial and free, both expensive and affordable.
>[Postscript was a procedural language for graphical page description. PDF
>is a data definition language which takes the procedures out of
>Postscript. FO is a data definition language which describes the appearance
>of an XML document as richly as these other languages.]
It is more abstract ... XSL-FO does not say anything about rendering, only
about formatting ... deciding what goes where on a page, not how what goes
where is expressed in a page description language. Many tools offer PDF
and some tools offer PostScript and other tools offer other output expressions.
>[Does anybody remember what a dvi representation did? I think it stood for
>"device independent" but TeX was already device independent.]
>
>I had a ghostscript viewer which seemed to understand dvi files, and
>postscript too. It was quite clumsy.
>
>I assume there is nothing stopping me from using XSL-T to transform my
>HTML to PDF,
First, as mentioned above, HTML isn't an input to XSLT ... but even if you
input XHTML, putting out PDF by XSLT would seem to me to be quite awkward.
>but it seems best to output XSL-FO then create a PDF using some kind of tool.
Absolutely.
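A rough sketch of that first step, assuming XHTML input that uses only "h1"
and "p" in its body (the page geometry and font sizes are arbitrary):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                xmlns:h="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="h">

  <!-- Drop the whitespace-only text nodes between body children -->
  <xsl:strip-space elements="h:body"/>

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
                               page-height="297mm" page-width="210mm"
                               margin-top="25mm" margin-bottom="25mm"
                               margin-left="25mm" margin-right="25mm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <!-- Only the body is pushed; the head is never visited -->
          <xsl:apply-templates select="h:html/h:body"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- XHTML elements are in a namespace, hence the "h:" prefix -->
  <xsl:template match="h:h1">
    <fo:block font-size="18pt" font-weight="bold" space-after="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="h:p">
    <fo:block space-after="6pt"><xsl:apply-templates/></fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The result is handed to an XSL-FO formatter, which does the pagination and
produces the PDF.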
> What is that tool?
See http://www.xmlsoftware.com/xslfo.html for a starting point.
>Are there FO plug-ins available for my browsers?
Doubtful ... since browsers output in arbitrary and changeable window sizes
and XSL-FO pagination is geared to fixed-sized-folios (pages) of
information, I haven't yet seen a browser-based implementation of
XSL-FO. Note that there is, in fact, a way in the XSL-FO specification to
turn off pagination and be supported in a dynamically-sized window ... but
I haven't seen that implemented.
>Does this technology work?
Absolutely! I have customers producing millions of pages of customized
printed output based on XML inputs exported from databases of
information. It is very professional and it gives my customers a technical
edge ahead of their competition. I author all my training and book
materials in XML and use XSLT and XSL-FO to go to camera-ready output. We
claim that my "Definitive XSLT and XPath" book published by Prentice-Hall
is the first bookstore-shelf-published-book that was produced end-to-end in
XML because we wrote XSL-FO stylesheets to lay out the pages and supplied
PDF files to PH for publication with the final camera-ready output. We
wrote an annex in the book describing our process. My "Definitive XSL-FO"
book then just re-used the stylesheets and that production process was much
quicker. I use different stylesheets to publish the same content as PDF
books for sale on my web site. I use those also to publish my student
handouts for my hands-on courses. All this from the same set of XML inputs.
I hope this helps.
. . . . . . . Ken
--
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Box 266, Kars, Ontario CANADA K0A-2E0 +1(613)489-0999 (F:-0995)
Male Breast Cancer Awareness http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal