OASIS Mailing List Archives

   RE: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)


I think HT said that there's always an overhead if you have to cross a
thread boundary, and that they try to avoid it whenever possible. Crossing a
process or machine boundary would be far worse.

Michael Kay 

> -----Original Message-----
> From: Bob Foster [mailto:bob@objfac.com] 
> Sent: 13 July 2005 22:18
> To: Michael Kay
> Cc: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Another XML parsing idea? Was: Re: 
> [xml-dev] XML Hangover)
> 
> I don't know the internals (maybe someone can comment) but I believe 
> Markup Technology has a protocol for passing PSVI around. It seems 
> pretty darn fast.
> 
> Bob Foster
> 
> Michael Kay wrote:
>  > A protocol implies sending and receiving messages typically across a
>  > process boundary or even a machine boundary. This would raise the cost
>  > of XML parsing by a couple of orders of magnitude.
>  >
>  > Michael Kay
>  >
>  >
>  >>-----Original Message-----
>  >>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
>  >>Sent: 13 July 2005 20:01
>  >>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
>  >>Subject: [xml-dev] Another XML parsing idea? Was: Re:
>  >>[xml-dev] XML Hangover)
>  >>
>  >>Today, we have a paradigm in XML parsing of using APIs
>  >>like SAX or DOM. I was thinking of another approach to
>  >>parse XML documents.
>  >>
>  >>Can we have a protocol (instead of an API) that will talk
>  >>between an application and the XML parser? This would
>  >>make the XML parser interoperable with the calling
>  >>application. For example, we could have a
>  >>Microsoft XML parser serving a Java program's XML
>  >>parsing requests.
>  >>
>  >>At present we have APIs like SAX and DOM, and proprietary
>  >>Microsoft APIs. If we had some protocol similar to
>  >>HTTP that talked between an application and a parser, it
>  >>might help interoperability.
>  >>
>  >>Is this sensible thinking? Is this idea conceptually
>  >>similar to StAX or .NET XmlReader parsing approach?
>  >>
>  >>Regards,
>  >>Mukul
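For comparison with the question above: SAX pushes each parsing event into callbacks the application registers, while pull APIs such as StAX and .NET's XmlReader let the application ask for the next event when it is ready. A minimal sketch of both styles using Python's standard library (Python stands in for Java and .NET here; the sample document and function names are made up for illustration):

```python
import xml.sax
from xml.dom import pulldom

DOC = "<order><item sku='a1'/><item sku='b2'/></order>"

class SkuCollector(xml.sax.ContentHandler):
    """Push style (SAX): the parser is in control and calls back on each event."""

    def __init__(self):
        super().__init__()
        self.skus = []

    def startElement(self, name, attrs):
        if name == "item":
            self.skus.append(attrs.getValue("sku"))

def push_parse(text):
    handler = SkuCollector()
    xml.sax.parseString(text, handler)  # parser drives; handler reacts
    return handler.skus

def pull_parse(text):
    """Pull style (similar in spirit to StAX or XmlReader): the loop asks for events."""
    skus = []
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            skus.append(node.getAttribute("sku"))
    return skus

if __name__ == "__main__":
    print(push_parse(DOC))
    print(pull_parse(DOC))
```

Both versions are ordinary in-process calls. A wire protocol between application and parser would add encoding, transport and decoding around every event, which is the overhead Michael Kay points to.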
>  >>
>  >>--- Michael Kay <mike@saxonica.com> wrote:
>  >>
>  >>>The URL got truncated
>  >>>
>  >>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
>  >>>
>  >>>with ".html" at the end.
>  >>>
>  >>>Michael Kay
>  >>>
>  >>>
>  >>>>-----Original Message-----
>  >>>>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
>  >>>>Sent: 13 July 2005 10:02
>  >>>>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
>  >>>>Subject: RE: [xml-dev] XSL for non-XML input (Was: Re:
>  >>>>[xml-dev] XML Hangover)
>  >>>>
>  >>>>Hi Mike,
>  >>>>  I get an error:
>  >>>>HTTP 404 - File not found
>  >>>>
>  >>>>--- Michael Kay <mike@saxonica.com> wrote:
>  >>>>
>  >>>>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
>  >>>>
>  >>>>Regards,
>  >>>>Mukul
>  >>>>
>  >>>>
>  >>>>>
>  >>>>>Michael Kay
>  >>>>>
>  >>>>>
>  >>>>>Going further, observing the idea of using out of band data (e.g. a
>  >>>>>schema) to provide extra information to complete 'binary XML', could
>  >>>>>XSL (with suitable front ends) work on, say, an ASN.1 encoded X.509
>  >>>>>certificate (and ASN.1 message definition) and produce, say, a PDF
>  >>>>>output?
>  >>>>>
>  >>>>>Not that I have a need to do that right now! I'm just interested to
>  >>>>>know whether XSL can be used as a kind of universal data translator.
>  >>>>>
>  >>>>>Thanks,
>  >>>>>
>  >>>>>Pete.
>  >>>>>--
>  >>>>>=============================================
>  >>>>>Pete Cordell
>  >>>>>Tech-Know-Ware Ltd
>  >>>>>---------------------------------------------
>  >>>>>for XML to C++ data binding visit
>  >>>>>http://www.tech-know-ware.com/lmx
>  >>>>>(or http://www.xml2cpp.com)
>  >>>>>=============================================
>  >>>>>
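Pete's "suitable front end" can be pictured as a decoder that turns the non-XML input into an XML tree that an XSLT processor could then consume. The toy Python sketch below is not the approach in Michael Kay's paper and is far from a full ASN.1 implementation: it walks DER tag-length-value triples, names elements after a few ASN.1 universal types, and supports only low tag numbers and definite lengths. Without the ASN.1 module definition it cannot recover field names such as those in an X.509 certificate, which is exactly the out of band information Pete mentions.

```python
import xml.etree.ElementTree as ET

# A few ASN.1 universal tag numbers (from X.690); other tags get generic names.
UNIVERSAL = {1: "BOOLEAN", 2: "INTEGER", 4: "OCTET-STRING", 5: "NULL",
             6: "OBJECT-IDENTIFIER", 12: "UTF8String", 16: "SEQUENCE", 17: "SET"}
CLASSES = {0x00: "UNIVERSAL", 0x40: "APPLICATION", 0x80: "CONTEXT", 0xC0: "PRIVATE"}

def read_tlv(data, pos):
    """Read one tag-length-value at pos; return (tag byte, value bytes, next position)."""
    tag = data[pos]
    if tag & 0x1F == 0x1F:
        raise ValueError("high tag numbers are not handled in this sketch")
    length = data[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length octets
        count = length & 0x7F
        if count == 0:
            raise ValueError("indefinite length is not allowed in DER")
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    value = data[pos:pos + length]
    if len(value) != length:
        raise ValueError("input is truncated")
    return tag, value, pos + length

def decode_into(data, parent):
    """Append one XML element per TLV in data, recursing into constructed values."""
    pos = 0
    while pos < len(data):
        tag, value, pos = read_tlv(data, pos)
        cls, number = tag & 0xC0, tag & 0x1F
        universal = cls == 0x00
        if universal and number in UNIVERSAL:
            name = UNIVERSAL[number]
        else:
            name = f"{CLASSES[cls]}-{number}"
        elem = ET.SubElement(parent, name)
        if tag & 0x20:  # constructed: the value is itself a series of TLVs
            decode_into(value, elem)
        elif universal and number == 1:
            elem.text = "false" if value == b"\x00" else "true"
        elif universal and number == 2:
            elem.text = str(int.from_bytes(value, "big", signed=True))
        else:
            elem.text = value.hex()  # opaque bytes; a schema would say more

def der_to_xml(data):
    root = ET.Element("asn1")
    decode_into(data, root)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # SEQUENCE { INTEGER 5, BOOLEAN TRUE, OCTET STRING "hi" }
    print(der_to_xml(bytes.fromhex("300a0201050101ff04026869")))
```

The XML this produces could then be fed to an ordinary stylesheet; everything format-specific stays in the front end.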
>  >>>>>
>  >>>>>----- Original Message -----
>  >>>>>From: Michael Kay <mailto:mike@saxonica.com>
>  >>>>>To: 'Joe Schaffner' <mailto:schaffner.joe@gmail.com>;
>  >>>>>xml-dev@lists.xml.org
>  >>>>>Sent: Monday, July 11, 2005 9:00 PM
>  >>>>>Subject: RE: [xml-dev] XML Hangover
>  >>>>>
>  >>>>>
>  >>>>>
>  >>>>>>I've been reading the XML literature. It's great. Just a few comments:
>  >>>>>
>  >>>>>Welcome on board. It's refreshing to get thoughtful comments from
>  >>>>>someone who's new to the game.
>  >>>>>
>  >>>>>>XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.
>  >>>>>
>  >>>>>>The T part deals with templates and translation. Since HTML is valid
>  >>>>>>XML, I guess I can parse my HTML using XSL-T to produce XML and vice
>  >>>>>>versa. I don't understand why XSL-T refers to "nodes in an output
>  >>>>>>tree". This suggests some kind of internal representation, but XML is
>  >>>>>>a perfectly good representation language. Don't <templates> merely
>  >>>>>>write XML text to stdout?
>  >>>>>
>  >>>>>No, the result tree is completely abstract; there is no suggestion of
>  >>>>>an internal representation. In fact, for many XSLT processors, the
>  >>>>>"result tree" is represented internally as a stream of events, not as
>  >>>>>a linked collection of objects in memory. This concept of writing a
>  >>>>>tree, rather than writing text, however is extremely important.
>  >>>>>Firstly, it defines a separation of the information content of an XML
>  >>>>>document from the accidental aspects of its lexical representation -
>  >>>>>something that is sadly missing from the XML spec itself. In turn,
>  >>>>>this gives you a basis for defining a concise set of operators that
>  >>>>>are in some sense complete, composable and exhibit closure. In
>  >>>>>practical terms, it gives you the ability to write a series of
>  >>>>>transformations - a pipeline - in which the expensive steps of
>  >>>>>serializing and parsing intermediate results can be eliminated.
>  >>>>>
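The pipeline point can be illustrated with a toy Python sketch: every stage consumes and yields a stream of start/text/end events, so the document is parsed once at the source and serialized once at the end, with no intermediate markup between stages. The event tuples and stage functions (`rename`, `drop`) are hypothetical and much simpler than any real XSLT processor's internal event stream.

```python
from xml.dom import pulldom
from xml.sax.saxutils import escape, quoteattr

def events(xml_text):
    """Source stage: parse once, yield ('start', name, attrs), ('text', data), ('end', name)."""
    for kind, node in pulldom.parseString(xml_text):
        if kind == pulldom.START_ELEMENT:
            yield ("start", node.tagName, dict(node.attributes.items()))
        elif kind == pulldom.CHARACTERS:
            yield ("text", node.data)
        elif kind == pulldom.END_ELEMENT:
            yield ("end", node.tagName)

def rename(stream, old, new):
    """Transformation stage: rename every element called old to new."""
    for ev in stream:
        if ev[0] in ("start", "end") and ev[1] == old:
            ev = (ev[0], new) + ev[2:]
        yield ev

def drop(stream, name):
    """Transformation stage: remove elements called name, with their content."""
    depth = 0  # > 0 while inside a dropped element
    for ev in stream:
        if ev[0] == "start" and (depth or ev[1] == name):
            depth += 1
        elif ev[0] == "end" and depth:
            depth -= 1
        elif not depth:
            yield ev

def serialize(stream):
    """Sink stage: the only place where markup is written."""
    out = []
    for ev in stream:
        if ev[0] == "start":
            attrs = "".join(f" {k}={quoteattr(v)}" for k, v in ev[2].items())
            out.append(f"<{ev[1]}{attrs}>")
        elif ev[0] == "end":
            out.append(f"</{ev[1]}>")
        else:
            out.append(escape(ev[1]))
    return "".join(out)

if __name__ == "__main__":
    doc = "<doc><para>Hello</para><note>draft</note><para>World</para></doc>"
    print(serialize(drop(rename(events(doc), "para", "p"), "note")))
```

Adding a stage means wrapping one more generator around the stream; none of the middle stages ever sees angle brackets.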
>  >>>>>>Roughly, the process seems to work like this: the T processor does a
>  >>>>>>recursive descent of the source XML. At each node it evaluates the set
>  >>>>>>of templates. Those templates which match the name of the "current"
>  >>>>>>tag are processed, in some order. The template writes text; that's why
>  >>>>>>it's called a "template". The recursive descent is continued with an
>  >>>>>><apply-templates> tag inside the template. This allows you to balance
>  >>>>>>output.
>  >>>>>
>  >>>>>It doesn't have to do a recursive descent of the source XML: that's up
>  >>>>>to the application, though a recursive descent is the most common
>  >>>>>design pattern. And it definitely doesn't write text: people who create
>  >>>>>a mental model of writing text eventually get a rude awakening, usually
>  >>>>>when they first try to tackle grouping problems.
>  >>>>>
>  >>>>>>If no matches are found, the T processor continues the descent.
>  >>>>>
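A minimal Python sketch of the processing model being discussed, with hypothetical rules keyed by element name only (real XSLT patterns, priorities and modes are ignored). Each rule decides for itself whether to continue the descent by calling `apply_templates`, which reflects Michael Kay's point that recursive descent is a design choice; an element with no matching rule falls through to a built-in behaviour that keeps descending, and text is copied, much as XSLT's built-in template rules do. The output is built as a tree of nodes, not written as text.

```python
import xml.etree.ElementTree as ET

def append_text(out, text):
    """Add text to the result tree: after the last child, or inside out if it has none."""
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text

def apply_templates(node, rules, out):
    """Process the children of node into result element out.
    Text is copied; each child element goes to the rule matching its name,
    or, when none matches, to the built-in behaviour of descending further."""
    if node.text:
        append_text(out, node.text)
    for child in node:
        rule = rules.get(child.tag, apply_templates)  # built-in rule as fallback
        rule(child, rules, out)
        if child.tail:
            append_text(out, child.tail)

def wrap_in(result_name):
    """Make a rule that creates a result_name element and continues the descent inside it."""
    def rule(node, rules, out):
        apply_templates(node, rules, ET.SubElement(out, result_name))
    return rule

RULES = {"title": wrap_in("h1"), "para": wrap_in("p")}  # hypothetical rule set

def transform(xml_text):
    source = ET.fromstring(xml_text)
    result = ET.Element("body")
    RULES.get(source.tag, apply_templates)(source, RULES, result)
    return ET.tostring(result, encoding="unicode")

if __name__ == "__main__":
    print(transform("<doc><title>Hi</title><section><para>One</para>"
                    "<para>Two</para></section></doc>"))
```

Here `doc` and `section` have no rules, so their contents are processed by the fallback, which is the "continues the descent" behaviour described above.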
>  >>>>>>There is a <template> tag (I forget what) which will select arbitrary
>  >>>>>>paths in the source tree, and there are tags which iterate through the
>  >>>>>>result.
>  >>>>>
>  >>>>>Again, it's best to think of the stylesheet as containing nodes
>  >>>>>(representing instructions) rather than tags. Consider
>  >>>>>
>  >>>>><xsl:element name="x"><xsl:value-of select="."/></xsl:element>
>  >>>>>
>  >>>>>There are three tags there, but four nodes, and only two
>  >>>>>instructions. The semantics of the language are described in terms of
>  >>>>>the two instructions, not the three tags.
>  >>>>>
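The tag/node/instruction distinction can be checked mechanically. This Python sketch assumes the four nodes are the two element nodes plus the two attribute nodes (`name` and `select`), which is how the XPath data model sees this fragment, and counts as instructions the elements in the XSLT namespace; the regular-expression tag count is adequate only for a fragment this small.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
SNIPPET = '<xsl:element name="x"><xsl:value-of select="."/></xsl:element>'

def count_parts(snippet):
    """Return (lexical tags, element + attribute nodes, XSLT instructions)."""
    # Crude tag count: fine for this tiny fragment, not a general XML tokenizer.
    tags = len(re.findall(r"<[^>]+>", snippet))
    # Bind the xsl prefix on a wrapper so the fragment parses; ElementTree
    # does not report namespace declarations as attributes.
    wrapper = ET.fromstring(f'<wrap xmlns:xsl="{XSL_NS}">{snippet}</wrap>')
    elements = list(wrapper[0].iter())
    nodes = len(elements) + sum(len(e.attrib) for e in elements)
    instructions = sum(1 for e in elements if e.tag.startswith(f"{{{XSL_NS}}}"))
    return tags, nodes, instructions

if __name__ == "__main__":
    print(count_parts(SNIPPET))  # (3, 4, 2)
```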
>  >>>>>>This will allow me to build up a result "tree" which is not a mirror
>  >>>>>>image of the source, something I need to do if I'm rearranging
>  >>>>>>sections of the input document. Rather than buffering intermediate
>  >>>>>>structures, the T processor does multiple passes based on these tags,
>  >>>>>>and creates the output on-the-fly. Cool.
>  >>>>>
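Rearranging, and grouping in particular, is where the tree model pays off. In this small Python sketch (the element and attribute names are made up), items are gathered under one result element per distinct key, in order of first appearance. Lyon lands in the FR group even though Berlin sits between it and Paris in the source, because new children can be appended to any node already in the result tree; emitting tags in source order could not do that without buffering. This only illustrates the idea and is not XSLT's own grouping mechanism.

```python
import xml.etree.ElementTree as ET

def group_by(xml_text, item_tag, key_attr):
    """Return <groups> with one <group key="..."> per distinct key value,
    in order of first appearance, each holding copies of its items."""
    source = ET.fromstring(xml_text)
    result = ET.Element("groups")
    groups = {}  # key value -> group element already placed in the result tree
    for item in source.iter(item_tag):
        key = item.get(key_attr, "")
        if key not in groups:
            groups[key] = ET.SubElement(result, "group", {"key": key})
        attrs = {k: v for k, v in item.attrib.items() if k != key_attr}
        copy = ET.SubElement(groups[key], item.tag, attrs)
        copy.text = item.text
    return ET.tostring(result, encoding="unicode")

if __name__ == "__main__":
    cities = ('<cities><city country="FR">Paris</city>'
              '<city country="DE">Berlin</city>'
              '<city country="FR">Lyon</city></cities>')
    print(group_by(cities, "city", "country"))
```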
>  >>>>> ... .
>  >>>>>
>  >>>>>>I assume there is nothing stopping me from using XSL-T to transform my
>  >>>>>>HTML to PDF, but it seems best to output XSL-FO then create a PDF using
>  >>>>>>some kind of tool. What is that tool?
>  >>>>>
>  >>>>>It's an XSL-FO processor. Examples are FOP, RenderX, Antenna House.
>  >>>>>
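For reference, the XSL-FO that such a stylesheet produces is itself just an XML vocabulary that the FO processor lays out. A minimal Python sketch that builds a single-page-master XSL-FO document from a list of paragraphs (the A4 page size, margins and spacing are arbitrary choices for illustration, and `fo_document` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

def fo(name):
    """Qualified name in the XSL-FO namespace, in ElementTree's {uri}local form."""
    return f"{{{FO_NS}}}{name}"

def fo_document(paragraphs):
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm"})
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        block.text = text
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(fo_document(["Hello", "World"]))
```

Saved to a file, this would then be handed to an FO processor; with Apache FOP, for example, a command along the lines of `fop -fo out.fo -pdf out.pdf`.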
>  >>>>>>Are there FO plug-ins available for my browsers?
>  >>>>>
>  >>>>>No, people are by-and-large using (X)HTML/CSS for the browser,
>  >>>>>XSL-FO/PDF for the printed page.
>  >>>>>
>  >>>>>>Does this technology work?
>  >>>>>
>  >>>>>Absolutely yes.
>  >>>>>
>  >>>>>Michael Kay
>  >>>>>http://www.saxonica.com/
> 
> 





 


Copyright 2001 XML.org. This site is hosted by OASIS