OASIS Mailing List Archives

 



 


 

   RE: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)


A protocol implies sending and receiving messages typically across a process
boundary or even a machine boundary. This would raise the cost of XML
parsing by a couple of orders of magnitude.

Michael Kay
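
With an in-process API such as SAX, each parsing event reaches the application as an ordinary method call. The sketch below is an illustration only, using Python's standard xml.sax module (not something mentioned in the thread); EventCounter and count_events are hypothetical names. It counts the element events for a small document, which is roughly the number of messages a per-event protocol would have to carry across a process or machine boundary.

```python
import xml.sax


class EventCounter(xml.sax.ContentHandler):
    """Counts element events; each callback is a plain in-process method call."""

    def __init__(self):
        super().__init__()
        self.counts = {"start": 0, "end": 0}

    def startElement(self, name, attrs):
        self.counts["start"] += 1

    def endElement(self, name):
        self.counts["end"] += 1


def count_events(xml_text):
    """Parse xml_text with SAX and return the number of start/end element events."""
    handler = EventCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


if __name__ == "__main__":
    print(count_events("<a><b>x</b><b>y</b></a>"))
```

Text events are left out of the count because a parser may deliver one run of text in several chunks, so that number is not fixed.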

> -----Original Message-----
> From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com] 
> Sent: 13 July 2005 20:01
> To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
> Subject: [xml-dev] Another XML parsing idea? Was: Re: 
> [xml-dev] XML Hangover)
> 
> Today, we have a paradigm in XML parsing of using APIs
> like SAX or DOM. I was thinking of another approach to
> parse XML documents. 
> 
> Can we have a protocol (instead of an API) that talks
> between an application and the XML parser? This would
> make an XML parser interoperable with the calling
> application. For example, we could have a
> Microsoft XML parser serving a Java program's XML
> parsing requests.
>  
> At present we have APIs like SAX and DOM, and proprietary
> Microsoft APIs. If we had a protocol similar to
> HTTP that talked between an application and a parser, it
> might help interoperability.
> 
> Is this sensible thinking? Is this idea conceptually
> similar to the StAX or .NET XmlReader parsing approach?
> 
> Regards,
> Mukul
> 
> --- Michael Kay <mike@saxonica.com> wrote:
> 
> > The URL got truncated
> > 
> > http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
> > 
> > with ".html" at the end.
> > 
> > Michael Kay 
> > 
> > > -----Original Message-----
> > > From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
> > > Sent: 13 July 2005 10:02
> > > To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
> > > Subject: RE: [xml-dev] XSL for non-XML input (Was: Re: [xml-dev] XML Hangover)
> > > 
> > > Hi Mike,
> > >   I get error
> > > HTTP 404 - File not found
> > > 
> > > --- Michael Kay <mike@saxonica.com> wrote:
> > > >
> > > > http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
> > > 
> > > Regards,
> > > Mukul
> > > 
> > > >  
> > > > Michael Kay 
> > > > 
> > > >  
> > > > Going further, observing the idea of using out of band data (e.g. schema) to
> > > > provide extra information to complete 'binary XML', could XSL (with suitable
> > > > front ends) work on say an ASN.1 encoded X.509 certificate (and ASN.1
> > > > message definition) and produce, say, a PDF output?
> > > >  
> > > > Not that I have a need to do that right now!  I'm just interested to know
> > > > whether XSL can be used as a kind of universal data translator.
> > > >  
> > > > Thanks,
> > > >  
> > > > Pete.
> > > > --
> > > > =============================================
> > > > Pete Cordell
> > > > Tech-Know-Ware Ltd
> > > > -----------------------------------------------------------------
> > > >                          for XML to C++ data binding visit
> > > >                          http://www.tech-know-ware.com/lmx
> > > >                          (or http://www.xml2cpp.com)
> > > > =============================================
> > > > 
> > > > 
> > > > ----- Original Message ----- 
> > > > From: Michael Kay <mailto:mike@saxonica.com>  
> > > > To: 'Joe Schaffner' <mailto:schaffner.joe@gmail.com> ; xml-dev@lists.xml.org
> > > > Sent: Monday, July 11, 2005 9:00 PM
> > > > Subject: RE: [xml-dev] XML Hangover
> > > > 
> > > >  
> > > > 
> > > > I've been reading the XML literature. It's great. Just a few comments:
> > > >  
> > > > Welcome on board. It's refreshing to get thoughtful comments from someone
> > > > who's new to the game.
> > > >  
> > > > XSL - XML Stylesheets is divided into two parts,
> > > > XSL-T and XSL-FO.
> > > >  
> > > > The T part deals with templates and translation. Since HTML is valid XML, I
> > > > guess I can parse my HTML using XSL-T to produce XML and vice versa. I don't
> > > > understand why XSL-T refers to "nodes in an output tree". This suggests some
> > > > kind of internal representation, but XML is a perfectly good representation
> > > > language. Don't <templates> merely write XML text to stdout?
> > > >  
> > > > No, the result tree is completely abstract, there is no suggestion of an
> > > > internal representation. In fact, for many XSLT processors, the "result
> > > > tree" is represented internally as a stream of events, not as a linked
> > > > collection of objects in memory. This concept of writing a tree, rather than
> > > > writing text, however is extremely important. Firstly, it defines a
> > > > separation of the information content of an XML document from the accidental
> > > > aspects of its lexical representation - something that is sadly missing from
> > > > the XML spec itself. In turn, this gives you a basis for defining a concise
> > > > set of operators that are in some sense complete, composable and exhibit
> > > > closure. In practical terms, it gives you the ability to write a series of
> > > > transformations - a pipeline - in which the expensive steps of serializing
> > > > and parsing intermediate results can be eliminated.
> > > >  
> > > > Roughly, the process seems to work like this: the T processor does a
> > > > recursive descent of the source XML. At each node it evaluates the set of
> > > > templates. Those templates which match the name of the "current" tag are
> > > > processed, in some order. The template writes text, that's why it's called a
> > > > "template". The recursive descent is continued with an <apply-templates> tag
> > > > inside the template. This allows you to balance output.
> > > >  
> > > > It doesn't have to do a recursive descent of the source XML: that's up to
> > > > the application, though a recursive descent is the most common design
> > > > pattern. And it definitely doesn't write text: people who create a mental
> > > > model of writing text eventually get a rude awakening, usually when they
> > > > first try to tackle grouping problems.
> > > >  
> > > > If no matches are found, the T processor continues the descent.
> > > >  
> > > > There is a <template> tag (I forget what) which will select arbitrary paths
> > > > in the source tree, and there are tags which iterate through the result.
> > > >  
> > > > Again, it's best to think of the stylesheet as containing nodes
> > > > (representing instructions) rather than tags. Consider
> > > >  
> > > > <xsl:element name="x"><xsl:value-of select="."/></xsl:element>
> > > >  
> > > > There are three tags there, but four nodes, and only two instructions. The
> > > > semantics of the language are described in terms of the two instructions,
> > > > not the three tags.
> > > >  
> > > >  This will allow me to build up a result "tree" which is not a mirror image
> > > > of the source, something I need to do if I'm rearranging sections of the
> > > > input document. Rather than buffering intermediate structures, the T
> > > > processor does multiple passes based on these tags, and creates the output
> > > > on-the-fly. Cool.
> > > >  
> > > >  ... .
> > > >  
> > > > I assume there is nothing stopping me from using
> > > > XSL-T to transform my HTML
> > > > to PDF, but it seems best to output XSL-FO then
> > > > create a PDF using some kind
> > > > of tool. What is that tool? 
> > > >  
> > > > It's an XSL-FO processor. Examples are FOP, RenderX, Antenna House.
> > > >  
> > > > Are there FO plug-ins available for my browsers?
> > > >  
> > > > No, people are by-and-large using (X)HTML/CSS for the browser, XSL-FO/PDF
> > > > for the printed page.
> > > >  
> > > > Does this technology work? 
> > > >  
> > > > Absolutely yes. 
> > > >  
> > > > Michael Kay
> > > > http://www.saxonica.com/
> > > > 
> > > > 
> > > 
> > > 
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam?  Yahoo! Mail has the best spam protection around
> > > http://mail.yahoo.com 
> > > 
> > 
> > 
> 
> 
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> 
> The list archives are at http://lists.xml.org/archives/xml-dev/
> 
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
> 
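
The point quoted above about writing a tree rather than text, and about pipelines that skip intermediate serializing and parsing, can be sketched roughly as follows. This is an illustration under assumptions: it uses Python's standard ElementTree in place of an XSLT processor, and rename_items, add_count and pipeline are hypothetical names. Each step takes a tree and returns a tree; text is parsed once at the start and serialized once at the end.

```python
import xml.etree.ElementTree as ET


def rename_items(tree):
    """Step 1: build a new tree in which every <item> becomes an <entry>."""
    out = ET.Element(tree.tag)
    for child in tree:
        tag = "entry" if child.tag == "item" else child.tag
        entry = ET.SubElement(out, tag)
        entry.text = child.text
    return out


def add_count(tree):
    """Step 2: build a new root carrying a count of its children."""
    out = ET.Element(tree.tag, {"count": str(len(tree))})
    out.extend(list(tree))
    return out


def pipeline(xml_text):
    source = ET.fromstring(xml_text)           # parse once
    result = add_count(rename_items(source))   # steps exchange trees, not text
    return ET.tostring(result, encoding="unicode")  # serialize once


if __name__ == "__main__":
    print(pipeline("<list><item>a &amp; b</item><item>c</item></list>"))
    # <list count="2"><entry>a &amp; b</entry><entry>c</entry></list>
```

Neither step deals with escaping or angle brackets: the "a & b" text is held as plain information in the tree, and only the final serializer decides how to write it out.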

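
The xsl:element example quoted above ("three tags there, but four nodes, and only two instructions") can be checked with a short sketch. Two assumptions are made here: the "four nodes" are read as two element nodes plus two attribute nodes, and a hypothetical wrapper element is added only to bind the xsl prefix so that the snippet parses. The function names are also hypothetical.

```python
import re
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
SNIPPET = '<xsl:element name="x"><xsl:value-of select="."/></xsl:element>'


def count_tags(text):
    """Count markup tags lexically (a regex is enough for this simple snippet)."""
    return len(re.findall(r"<[^>]+>", text))


def count_nodes(text):
    """Return (element nodes, attribute nodes, XSLT instruction elements)."""
    # Hypothetical wrapper: it only supplies the xsl namespace binding.
    doc = minidom.parseString(f'<wrapper xmlns:xsl="{XSL_NS}">{text}</wrapper>')
    elements = attributes = instructions = 0
    pending = list(doc.documentElement.childNodes)
    while pending:
        node = pending.pop()
        if node.nodeType == node.ELEMENT_NODE:
            elements += 1
            attributes += len(node.attributes)
            if node.namespaceURI == XSL_NS:
                instructions += 1
            pending.extend(node.childNodes)
    return elements, attributes, instructions


if __name__ == "__main__":
    print(count_tags(SNIPPET))   # 3
    print(count_nodes(SNIPPET))  # (2, 2, 2)
```

The tag count comes from the text alone, while the node and instruction counts come from the parsed tree, which is the level at which the quoted reply says the language is defined.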




 


Copyright 2001 XML.org. This site is hosted by OASIS