xml-dev - Re: [xml-dev] XML Hangover


To: Joe Schaffner <schaffner.joe@gmail.com>, xml-dev@lists.xml.org
Subject: Re: [xml-dev] XML Hangover
From: tpassin@comcast.net
Date: Mon, 11 Jul 2005 20:00:44 +0000

> I've been reading the XML literature. It's great.

Umm, maybe not enough xml literature yet ... :-).

> Since HTML is valid XML

No, it's not, although it is possible to write most html so that it is *well-formed* xml.  However, most html is not well-formed xml either.  *XHTML* is supposed to be well-formed xml, but in practice many (maybe most?) sites that put out xhtml manage to have something wrong with it.
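
A minimal sketch of the well-formedness point, using Python's standard xml.etree.ElementTree parser. The two markup strings and the helper name is_well_formed are made up for illustration. The only difference between the strings is whether the br element is closed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Typical HTML habit: <br> is never closed, so the tags do not nest properly.
html_style = "<p>line one<br>line two</p>"

# Same content with the empty element closed the XML way.
xhtml_style = "<p>line one<br/>line two</p>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

An HTML browser accepts both strings; an XML parser has to reject the first.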

> I don't understand why XSL-T refers to "nodes in an output tree". ... Don't <templates> merely write XML text to stdout?

No, the xslt processor creates a tree of nodes.  After that, it *may* create output by serializing that tree to stdout (most command line xslt processors do that), or it may return a tree in memory, or something else.  It most emphatically DOES NOT just write text fragments to stdout or anything else.  If you think like that, you will have a heck of a time getting xslt to do what you want.
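
To make the distinction concrete, here is a rough sketch in Python with the standard xml.etree.ElementTree module. It is not an xslt processor; it only mimics the order of events: a result tree is built as nodes, and turning it into text is a separate, optional step. The element names and sample data are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
source = ET.fromstring("<verbs><verb>grafo</verb><verb>diavazo</verb></verbs>")

# Build the result as a tree of nodes, not as text fragments.
result = ET.Element("ul")
for verb in source.findall("verb"):
    item = ET.SubElement(result, "li")
    item.text = verb.text

# The tree can be used in memory without ever being written out...
item_count = len(result)

# ...and serializing it is a separate, optional last step.
serialized = ET.tostring(result, encoding="unicode")
print(item_count)   # 2
print(serialized)   # <ul><li>grafo</li><li>diavazo</li></ul>
```

Because the result is a tree until the last line, nothing unbalanced can be produced: there is no way to emit an opening tag in one place and forget the closing tag in another.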

> Rather than buffering intermediate structures, the T processor does multiple passes based on these tags, and creates the output on-the-fly.

No, not at all, see above.  The processor builds the entire output tree, then may serialize it to a text format.  That is the stage at which the desired character encoding gets applied, BTW.
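
A small sketch of the encoding point, again with Python's xml.etree.ElementTree standing in for a real processor. The tree holds characters; bytes and an encoding only appear when the tree is serialized. The element name and text are made up.

```python
import xml.etree.ElementTree as ET

# The in-memory tree stores characters, with no encoding attached.
root = ET.Element("word")
root.text = "caf\u00e9"

# The encoding is chosen only at serialization time, so one tree
# can be written out in several different encodings.
utf8_bytes = ET.tostring(root, encoding="utf-8")
latin1_bytes = ET.tostring(root, encoding="iso-8859-1")
ascii_bytes = ET.tostring(root, encoding="us-ascii")

print(utf8_bytes)
print(latin1_bytes)
print(ascii_bytes)
```

The accented character comes out as two bytes in UTF-8, one byte in ISO-8859-1, and a numeric character reference in US-ASCII, all from the same unchanged tree.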

> Then there is the DOM - Document Object Model... but my XML already defines the document's object model. It's whatever I decide it to be.

Here's another place you need to do more reading ...  The DOM is a standardized way to describe a general xml document in terms of nodes and such-like, so that it can be represented in memory and worked on by a processor.
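
As one illustration, Python's standard library ships a small DOM implementation, xml.dom.minidom. The sketch below walks a document using only generic DOM node properties (nodeType, tagName, childNodes, data), so the same function works whatever vocabulary the document uses. The function name describe and the sample document are invented.

```python
from xml.dom import minidom

def describe(node, depth=0):
    """Return an indented outline of a DOM subtree, one line per node."""
    indent = "  " * depth
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append(f"{indent}element {node.tagName}")
        for child in node.childNodes:
            lines.extend(describe(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append(f"{indent}text {node.data.strip()!r}")
    return lines

# Hypothetical document; any well-formed xml would do.
doc = minidom.parseString("<a><b>hi</b></a>")
outline = describe(doc.documentElement)
print("\n".join(outline))
```

The element names come from the document; the node types and the way of navigating between them come from the DOM.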

Cheers,

Tom P

--- Begin Message ---

To: xml-dev@lists.xml.org

Subject: [xml-dev] XML Hangover

From: Joe Schaffner <schaffner.joe@gmail.com>

Date: Mon, 11 Jul 2005 18:45:48 +0000

Hello Everyone.

I've been reading the XML literature. It's great. Just a few comments:

XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.

The T part deals with templates and translation. Since HTML is valid XML, I guess I can parse my HTML using XSL-T to produce XML and vice versa. I don't understand why XSL-T refers to "nodes in an output tree". This suggests some kind of internal representation, but XML is a perfectly good representation language. Don't <templates> merely write XML text to stdout?

Roughly, the process seems to work like this: the T processor does a recursive descent of the source XML. At each node it evaluates the set of templates. Those templates which match the name of the "current" tag are processed, in some order. The template writes text, that's why it's called a "template". The recursive descent is continued with an <apply-templates> tag inside the template. This allows you to balance output.

If no matches are found, the T processor continues the descent.
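
The descent described above can be modelled with a toy sketch in Python. This is not how any real XSLT processor is implemented: it assumes rules are matched by element name only (real XSLT matches patterns), and every function and element name here is invented. As the reply points out, the rules append nodes to a result tree rather than writing text.

```python
import xml.etree.ElementTree as ET

def append_text(parent, text):
    """Add character data at the current end of `parent` in the result tree."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def apply_templates(node, templates, parent):
    """Process the children of `node`, adding results under `parent`."""
    if node.text:
        append_text(parent, node.text)
    for child in node:
        rule = templates.get(child.tag, builtin_rule)
        rule(child, templates, parent)
        if child.tail:
            append_text(parent, child.tail)

def builtin_rule(node, templates, parent):
    # No rule matched: create nothing, just continue the descent.
    apply_templates(node, templates, parent)

def section_rule(node, templates, parent):
    div = ET.SubElement(parent, "div")
    apply_templates(node, templates, div)  # plays the role of apply-templates

def title_rule(node, templates, parent):
    heading = ET.SubElement(parent, "h1")
    apply_templates(node, templates, heading)

templates = {"section": section_rule, "title": title_rule}
source = ET.fromstring(
    "<doc><section><title>Verbs</title><para>grafo</para></section></doc>"
)

result = ET.Element("html")
templates.get(source.tag, builtin_rule)(source, templates, result)
output = ET.tostring(result, encoding="unicode")
print(output)  # <html><div><h1>Verbs</h1>grafo</div></html>
```

The doc and para elements have no rule of their own, so the fallback rule passes through them and only their text reaches the result.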

There is a <template> tag (I forget what) which will select arbitrary paths in the source tree, and there are tags which iterate through the result. This will allow me to build up a result "tree" which is not a mirror image of the source, something I need to do if I'm rearranging sections of the input document. Rather than buffering intermediate structures, the T processor does multiple passes based on these tags, and creates the output on-the-fly. Cool.

Then there is the DOM - Document Object Model... but my XML already defines the document's object model. It's whatever I decide it to be. Why the confusion? Are you referring to parsing an XML document using some kind of programming environment to produce a tree in some kind of intermediate representation? Are there C libs/structs or Java classes or Perl modules which tear apart an XML document?

This would allow someone to build an XML parser into any configuration program.

The FO part of XSL looks a lot like Knuth's TeX, a graphical document description language, set in XML, incredibly complex. I guess there are FO display tools out there (called "formatters") that interpret FO. [Postscript was a procedural language for graphical page description. PDF is a data definition language which takes the procedures out of Postscript. FO is a data definition language which describes the appearance of an XML document as richly as these other languages.]

[Does anybody remember what a dvi representation did? I think it stood for "device independent" but TeX was already device independent.]

I had a ghostscript viewer which seemed to understand dvi files, and postscript too. It was quite clumsy.

I assume there is nothing stopping me from using XSL-T to transform my HTML to PDF, but it seems best to output XSL-FO then create a PDF using some kind of tool. What is that tool?

Are there FO plug-ins available for my browsers?

Does this technology work?

Joe

http://modern-greek-verbs.tripod.com/

--- End Message ---
