xml-dev - Re: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)


To: Michael Kay <mike@saxonica.com>
Subject: Re: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)
From: Bob Foster <bob@objfac.com>
Date: Wed, 13 Jul 2005 14:18:03 -0700
Cc: xml-dev@lists.xml.org
In-reply-to: <20050713202928.F31912FFA4@loot.dreamhost.com>
References: <20050713202928.F31912FFA4@loot.dreamhost.com>
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

I don't know the internals (maybe someone can comment) but I believe 
Markup Technology has a protocol for passing PSVI around. It seems 
pretty darn fast.

Bob Foster

Michael Kay wrote:
 > A protocol implies sending and receiving messages, typically across a
 > process boundary or even a machine boundary. This would raise the cost
 > of XML parsing by a couple of orders of magnitude.
 >
 > Michael Kay
 >
 >
 >>-----Original Message-----
 >>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
 >>Sent: 13 July 2005 20:01
 >>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
 >>Subject: [xml-dev] Another XML parsing idea? Was: Re:
 >>[xml-dev] XML Hangover)
 >>
 >>Today, we have a paradigm in XML parsing of using APIs
 >>like SAX or DOM. I was thinking of another approach to
 >>parse XML documents.
 >>
 >>Can we have a protocol (instead of an API) that talks
 >>between an application and the XML parser? This would
 >>make an XML parser interoperable with any calling
 >>application. For example, we could have a Microsoft
 >>XML parser serving a Java program's XML parsing
 >>requests.
 >>
 >>Right now we have APIs like SAX and DOM, and proprietary
 >>Microsoft APIs. If we had some protocol similar to
 >>HTTP that talked between an application and a parser, it
 >>might help interoperability.
 >>
 >>Is this sensible thinking? Is this idea conceptually
 >>similar to the StAX or .NET XmlReader parsing approach?
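
On the StAX / XmlReader comparison: those are pull-style APIs that run in
the same process as the application, not wire protocols. As a rough
illustration only, the sketch below contrasts push (SAX) and pull parsing
using Python's standard library, with xml.dom.pulldom standing in for a
StAX-like reader. The sample document and the names DOC, ItemCollector,
items_push and items_pull are assumptions made up for this example.

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical sample document, not taken from the thread.
DOC = "<order><item>tea</item><item>milk</item></order>"


class ItemCollector(xml.sax.ContentHandler):
    """Push style (SAX): the parser drives and calls these methods."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self.items.append("")

    def characters(self, content):
        if self._in_item:
            self.items[-1] += content

    def endElement(self, name):
        if name == "item":
            self._in_item = False


def items_push(doc):
    handler = ItemCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.items


def items_pull(doc):
    """Pull style: the application drives and asks for the next event."""
    items = []
    events = pulldom.parseString(doc)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            events.expandNode(node)  # read the rest of this element
            items.append("".join(child.data for child in node.childNodes))
    return items
```

Both functions are ordinary in-process calls; neither sends a message
across a process or machine boundary.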
 >>
 >>Regards,
 >>Mukul
 >>
 >>--- Michael Kay <mike@saxonica.com> wrote:
 >>
 >>
 >>>The URL got truncated:
 >>>
 >>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
 >>>
 >>>with ".html" at the end.
 >>>
 >>>Michael Kay
 >>>
 >>>
 >>>>-----Original Message-----
 >>>>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
 >>>>Sent: 13 July 2005 10:02
 >>>>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
 >>>>Subject: RE: [xml-dev] XSL for non-XML input (Was: Re:
 >>>>[xml-dev] XML Hangover)
 >>>>
 >>>>Hi Mike,
 >>>>  I get error
 >>>>HTTP 404 - File not found
 >>>>
 >>>>--- Michael Kay <mike@saxonica.com> wrote:
 >>>>
 >>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
 >>
 >>>>Regards,
 >>>>Mukul
 >>>>
 >>>>
 >>>>>
 >>>>>Michael Kay
 >>>>>
 >>>>>
 >>>>>Going further, observing the idea of using out of band
 >>>>>data (e.g. schema) to provide extra information to
 >>>>>complete 'binary XML', could XSL (with suitable front
 >>>>>ends) work on say an ASN.1 encoded X.509 certificate
 >>>>>(and ASN.1 message definition) and produce, say, a PDF
 >>>>>output?
 >>>>>
 >>>>>Not that I have a need to do that right now! I'm just
 >>>>>interested to know whether XSL can be used as a kind of
 >>>>>universal data translator.
 >>>>>
 >>>>>Thanks,
 >>>>>
 >>>>>Pete.
 >>>>>--
 >>>>>=============================================
 >>>>>Pete Cordell
 >>>>>Tech-Know-Ware Ltd
 >>>>>                         for XML to C++ data binding visit
 >>>>>                         http://www.tech-know-ware.com/lmx
 >>>>>                         (or http://www.xml2cpp.com)
 >>>>>=============================================
 >>>>>
 >>>>>
 >>>>>----- Original Message -----
 >>>>>From: Michael Kay <mailto:mike@saxonica.com>
 >>>>>To: 'Joe Schaffner' <mailto:schaffner.joe@gmail.com> ;
 >>>>>xml-dev@lists.xml.org
 >>>>>Sent: Monday, July 11, 2005 9:00 PM
 >>>>>Subject: RE: [xml-dev] XML Hangover
 >>>>>
 >>>>>
 >>>>>
 >>>>>I've been reading the XML literature. It's great.
 >>>>>Just a few comments:
 >>>>>
 >>>>>Welcome on board. It's refreshing to get thoughtful
 >>>>>comments from someone who's new to the game.
 >>>>>
 >>>>>XSL - XML Stylesheets is divided into two parts,
 >>>>>XSL-T and XSL-FO.
 >>>>>
 >>>>>The T part deals with templates and translation.
 >>>>>Since HTML is valid XML, I guess I can parse my HTML
 >>>>>using XSL-T to produce XML and vice versa. I don't
 >>>>>understand why XSL-T refers to "nodes in an output
 >>>>>tree". This suggests some kind of internal
 >>>>>representation, but XML is a perfectly good
 >>>>>representation language. Don't <templates> merely
 >>>>>write XML text to stdout?
 >>>>>
 >>>>>No, the result tree is completely abstract, there is
 >>>>>no suggestion of an internal representation. In fact,
 >>>>>for many XSLT processors, the "result tree" is
 >>>>>represented internally as a stream of events, not as a
 >>>>>linked collection of objects in memory. This concept
 >>>>>of writing a tree, rather than writing text, however
 >>>>>is extremely important. Firstly, it defines a
 >>>>>separation of the information content of an XML
 >>>>>document from the accidental aspects of its lexical
 >>>>>representation - something that is sadly missing from
 >>>>>the XML spec itself. In turn, this gives you a basis
 >>>>>for defining a concise set of operators that are in
 >>>>>some sense complete, composable and exhibit closure.
 >>>>>In practical terms, it gives you the ability to write
 >>>>>a series of transformations - a pipeline - in which
 >>>>>the expensive steps of serializing and parsing
 >>>>>intermediate results can be eliminated.
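
The pipeline point above can be illustrated, loosely, outside XSLT. The
sketch below is not how Saxon or any particular XSLT processor is built;
it only assumes that each stage passes SAX events to the next, so the
document is parsed once and serialized once, with nothing re-parsed in
between. The stage classes RenameElement and UppercaseText and the
function run_pipeline are hypothetical names invented for this example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameElement(XMLFilterBase):
    """Stage 1: rename one element, forwarding events rather than text."""

    def __init__(self, parent, old, new):
        super().__init__(parent)
        self.old, self.new = old, new

    def startElement(self, name, attrs):
        super().startElement(self.new if name == self.old else name, attrs)

    def endElement(self, name):
        super().endElement(self.new if name == self.old else name)


class UppercaseText(XMLFilterBase):
    """Stage 2: upper-case character data, again as events."""

    def characters(self, content):
        super().characters(content.upper())


def run_pipeline(xml_text):
    out = io.StringIO()
    parser = xml.sax.make_parser()            # the only parse step
    stage1 = RenameElement(parser, "item", "entry")
    stage2 = UppercaseText(stage1)
    # The only serialization step, at the very end of the chain.
    stage2.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    stage2.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Calling run_pipeline("<list><item>a</item></list>") pushes each event
through both stages before any markup is written.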
 >>>
 >>>>>
 >>>>>Roughly, the process seems to work like this: the T
 >>>>>processor does a recursive descent of the source XML.
 >>>>>At each node it evaluates the set of templates. Those
 >>>>>templates which match the name of the "current" tag
 >>>>>are processed, in some order. The template writes
 >>>>>text, that's why it's called a "template". The
 >>>>>recursive descent is continued with an
 >>>>><apply-templates> tag inside the template. This
 >>>>>allows you to balance output.
 >>>>>
 >>>>>It doesn't have to do a recursive descent of the
 >>>>>source XML: that's up to the application, though a
 >>>>>recursive descent is the most common design pattern.
 >>>>>And it definitely doesn't write text: people who
 >>>>>create a mental model of writing text eventually get
 >>>>>a rude awakening, usually when they first try to
 >>>>>tackle grouping problems.
 >>>>>
 >>>>>If no matches are found, the T processor continues
 >>>>>the descent.
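
The match-and-descend model discussed above can be mimicked in a few
lines. This is a toy, not XSLT: it assumes templates keyed by element
name only, a fallback rule that simply keeps descending and copies text,
and an invented <html> wrapper for the result. TEMPLATES, template,
apply_templates, builtin_rule, append_text and transform are all
hypothetical names. What it does show is that each rule appends nodes to
a result tree rather than writing text.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # element name -> rule function


def template(match):
    """Register a rule for elements named `match`."""
    def register(fn):
        TEMPLATES[match] = fn
        return fn
    return register


def append_text(out, text):
    """Add character data to the result node `out`."""
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text


def apply_templates(node, out):
    """Process the children of `node`, adding result nodes under `out`."""
    if node.text:
        append_text(out, node.text)
    for child in node:
        TEMPLATES.get(child.tag, builtin_rule)(child, out)
        if child.tail:
            append_text(out, child.tail)


def builtin_rule(node, out):
    """Fallback when nothing matches: just continue the descent."""
    apply_templates(node, out)


@template("chapter")
def chapter(node, out):
    apply_templates(node, ET.SubElement(out, "div"))


@template("title")
def title(node, out):
    ET.SubElement(out, "h1").text = node.text


def transform(source_xml):
    root = ET.fromstring(source_xml)
    result = ET.Element("html")
    TEMPLATES.get(root.tag, builtin_rule)(root, result)
    return result
```

Serialization is a separate, final step, for example
ET.tostring(transform(doc), encoding="unicode").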
 >>>>>
 >>>>>There is a <template> tag (I forget what) which will
 >>>>>select arbitrary paths in the source tree, and there
 >>>>>are tags which iterate through the result.
 >>>>>
 >>>>>Again, it's best to think of the stylesheet as
 >>>>>containing nodes (representing instructions) rather
 >>>>>than tags. Consider
 >>>>>
 >>>>><xsl:element name="x"><xsl:value-of select="."/></xsl:element>
 >>>>>
 >>>>>There are three tags there, but four nodes, and only
 >>>>>two instructions. The semantics of the language are
 >>>>>described in terms of the two instructions, not the
 >>>>>three tags.
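
The counts above can be checked mechanically. The sketch below assumes
that the four nodes are the two element nodes plus their two attribute
nodes, and that each element in this snippet corresponds to one
instruction; that reading is an interpretation, not something spelled
out in the message. It parses without namespace processing so the
undeclared xsl: prefix is accepted. NodeCounter and count are
hypothetical names.

```python
import re
import xml.sax

SNIPPET = '<xsl:element name="x"><xsl:value-of select="."/></xsl:element>'


class NodeCounter(xml.sax.ContentHandler):
    """Count element nodes and attribute nodes seen while parsing."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.attributes = 0

    def startElement(self, name, attrs):
        self.elements += 1
        self.attributes += len(attrs)


def count(snippet):
    """Return (tags, nodes, instructions) for an attribute-only snippet."""
    # Tags are a lexical notion: start tag, empty-element tag, end tag.
    tags = len(re.findall(r"<[^>]+>", snippet))
    counter = NodeCounter()
    xml.sax.parseString(snippet.encode("utf-8"), counter)
    nodes = counter.elements + counter.attributes
    instructions = counter.elements  # one instruction per element here
    return tags, nodes, instructions
```

For SNIPPET this returns (3, 4, 2), matching the three tags, four nodes
and two instructions described above.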
 >>>>>
 >>>>>This will allow me to build up a result "tree"
 >>>>>which is not a mirror image of the source, something
 >>>>>I need to do if I'm rearranging sections of the input
 >>>>>document. Rather than buffering intermediate
 >>>>>structures, the T processor does multiple passes
 >>>>>based on these tags, and creates the output
 >>>>>on-the-fly. Cool.
 >>>>>
 >>>>> ... .
 >>>>>
 >>>>>I assume there is nothing stopping me from using
 >>>>>XSL-T to transform my HTML to PDF, but it seems best
 >>>>>to output XSL-FO then create a PDF using some kind of
 >>>>>tool. What is that tool?
 >>>>>
 >>>>>It's an XSL-FO processor. Examples are FOP, RenderX,
 >>>>>Antenna House.
 >>>>>
 >>>>>Are there FO plug-ins available for my browsers?
 >>>>>
 >>>>>No, people are by-and-large using (X)HTML/CSS for
 >>>>>the browser, XSL-FO/PDF for the printed page.
 >>>>>
 >>>>>Does this technology work?
 >>>>>
 >>>>>Absolutely yes.
 >>>>>
 >>>>>Michael Kay
 >>>>>http://www.saxonica.com/

Follow-Ups:
- RE: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)
  - From: "Michael Kay" <mike@saxonica.com>
