OASIS Mailing List Archives

Re: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)

  • To: Michael Kay <mike@saxonica.com>
  • Subject: Re: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)
  • From: Bob Foster <bob@objfac.com>
  • Date: Wed, 13 Jul 2005 15:47:09 -0700
  • Cc: xml-dev@lists.xml.org
  • In-reply-to: <20050713220649.96EBE2FD40@loot.dreamhost.com>
  • References: <20050713220649.96EBE2FD40@loot.dreamhost.com>
  • User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

True. I was just reacting to the idea that a protocol necessarily 
implied high overhead. I wonder what HT does to avoid crossing thread 
boundaries? Re-invent co-routines?

Bob Foster
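Bob's question about co-routines can be made concrete. The sketch below is not how Markup Technology's pipeline works; that is not described in this thread. It only illustrates the general idea of passing parse events to a consumer on the same thread by resuming a Python generator, with no thread boundary involved. The names `element_counter` and `CoroutineFeeder` are hypothetical.

```python
import xml.sax


def element_counter():
    """Consumer coroutine: receives events via send(), yields a running count."""
    count = 0
    while True:
        event = yield count
        if event == "start":
            count += 1


class CoroutineFeeder(xml.sax.ContentHandler):
    """Producer side: pushes each SAX event into the coroutine on the same thread."""

    def __init__(self, consumer):
        super().__init__()
        self.consumer = consumer
        self.latest = next(consumer)  # prime the coroutine up to its first yield

    def startElement(self, name, attrs):
        self.latest = self.consumer.send("start")

    def endElement(self, name):
        self.latest = self.consumer.send("end")


feeder = CoroutineFeeder(element_counter())
xml.sax.parseString(b"<a><b/><c><d/></c></a>", feeder)
print(feeder.latest)  # 4 elements seen
```

Each `send()` is an ordinary function-call-sized hand-off, so no locking or queueing between threads is required.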

Michael Kay wrote:
> I think HT said that there's always an overhead if you have to cross a
> thread boundary, and that they try to avoid it whenever possible. Crossing a
> process or machine boundary would be far worse.
> 
> Michael Kay 
> 
> 
>>-----Original Message-----
>>From: Bob Foster [mailto:bob@objfac.com] 
>>Sent: 13 July 2005 22:18
>>To: Michael Kay
>>Cc: xml-dev@lists.xml.org
>>Subject: Re: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)
>>
>>I don't know the internals (maybe someone can comment) but I believe 
>>Markup Technology has a protocol for passing PSVI around. It seems 
>>pretty darn fast.
>>
>>Bob Foster
>>
>>Michael Kay wrote:
>> > A protocol implies sending and receiving messages typically across a
>> > process boundary or even a machine boundary. This would raise the cost of XML
>> > parsing by a couple of orders of magnitude.
>> >
>> > Michael Kay
>> >
>> >
>> >>-----Original Message-----
>> >>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
>> >>Sent: 13 July 2005 20:01
>> >>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
>> >>Subject: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)
>> >>
>> >>Today, we have a paradigm in XML parsing of using APIs
>> >>like SAX or DOM. I was thinking of another approach to
>> >>parse XML documents.
>> >>
>> >>Can we have a protocol (instead of an API) that talks between an
>> >>application and the XML parser? This would make the XML parser
>> >>interoperable with the calling application. For example, we could have
>> >>a Microsoft XML parser serving a Java program's XML parsing requests.
>> >>
>> >>Right now we have APIs like SAX and DOM, and proprietary Microsoft
>> >>APIs. If we had some protocol, similar to HTTP, between an application
>> >>and a parser, it might help interoperability.
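To make the proposal, and Michael Kay's objection to it, concrete: a minimal sketch, assuming a made-up wire format of one JSON message per parse event. The "parser side" encodes every SAX callback as a message; the "application side" decodes each message back into an event. An in-memory `io.StringIO` stands in for the pipe or socket a real protocol would use between two programs. The class and function names are hypothetical, and nothing here is a proposed standard.

```python
import io
import json
import xml.sax


class EventEncoder(xml.sax.ContentHandler):
    """Parser side: encodes each SAX callback as one JSON message per line."""

    def __init__(self, channel):
        super().__init__()
        self.channel = channel

    def _send(self, *message):
        self.channel.write(json.dumps(message) + "\n")

    def startElement(self, name, attrs):
        self._send("start", name, dict(attrs.items()))

    def endElement(self, name):
        self._send("end", name)

    def characters(self, content):
        self._send("text", content)


def receive_events(channel):
    """Application side: decodes each message back into an event tuple."""
    for line in channel:
        yield tuple(json.loads(line))


def round_trip(xml_bytes):
    # In-process stand-in for a pipe or socket between two programs.
    channel = io.StringIO()
    xml.sax.parseString(xml_bytes, EventEncoder(channel))
    channel.seek(0)
    return list(receive_events(channel))


for event in round_trip(b'<order id="7"><item>pen</item></order>'):
    print(event)
```

Even in-process, every event is encoded and decoded on top of the parse itself; across a real process or machine boundary the transport cost is added as well, which is the overhead Kay points to.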
>> >>
>> >>Is this sensible thinking? Is this idea conceptually similar to the
>> >>StAX or .NET XmlReader parsing approach?
>> >>
>> >>Regards,
>> >>Mukul
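StAX and .NET's XmlReader are pull APIs: the application asks the parser for the next event instead of receiving callbacks, but both still run in the caller's process. As a rough stand-in for that style, here is a minimal sketch using Python's `xml.etree.ElementTree.XMLPullParser`; it is not StAX, and the helper name `pull_start_tags` is hypothetical.

```python
import xml.etree.ElementTree as ET


def pull_start_tags(chunks):
    """Feed XML piece by piece; the application pulls events when it wants them."""
    parser = ET.XMLPullParser(events=("start", "end"))
    names = []

    def drain():
        # Ask the parser for whatever events it has ready so far.
        for event, elem in parser.read_events():
            if event == "start":
                names.append(elem.tag)

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # collect any events still pending after the final chunk
    return names


print(pull_start_tags(["<doc><a/>", "<b>text</b></doc>"]))
```

Draining once more after `close()` guards against the parser holding back some events until the input is known to be complete.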
>> >>
>> >>--- Michael Kay <mike@saxonica.com> wrote:
>> >>
>> >>
>> >>>The URL got truncated
>> >>>
>> >>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
>> >>>
>> >>>with ".html" at the end.
>> >>>
>> >>>Michael Kay
>> >>>
>> >>>
>> >>>>-----Original Message-----
>> >>>>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
>> >>>>Sent: 13 July 2005 10:02
>> >>>>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
>> >>>>Subject: RE: [xml-dev] XSL for non-XML input (Was: Re: [xml-dev] XML Hangover)
>> >>>>
>> >>>>Hi Mike,
>> >>>>I get an error:
>> >>>>HTTP 404 - File not found
>> >>>>
>> >>>>--- Michael Kay <mike@saxonica.com> wrote:
>> >>>>
>> >>>>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
>> >>>>Regards,
>> >>>>Mukul
>> >>>>
>> >>>>
>> >>>>>
>> >>>>>Michael Kay
>> >>>>>
>> >>>>>
>> >>>>>Going further, observing the idea of using out-of-band data
>> >>>>>(e.g. schema) to provide extra information to complete 'binary XML',
>> >>>>>could XSL (with suitable front ends) work on, say, an ASN.1 encoded
>> >>>>>X.509 certificate (and ASN.1 message definition) and produce, say,
>> >>>>>PDF output?
>> >>>>>
>> >>>>>Not that I have a need to do that right now! I'm just interested to
>> >>>>>know whether XSL can be used as a kind of universal data translator.
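Pete's "suitable front end" would have to present the non-XML input as a tree (or event stream) before any XSLT could see it. A minimal sketch of that idea for DER-encoded ASN.1, assuming hypothetical element names for a handful of universal tags and exposing primitive values as hex text: it walks tag-length-value triples and builds an ElementTree. It handles only single-byte tags and definite lengths, and does not use an ASN.1 module definition, so it is far from a real X.509 front end.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for a few ASN.1 universal tags; others get a generic name.
TAG_NAMES = {0x02: "INTEGER", 0x04: "OCTET_STRING", 0x05: "NULL",
             0x06: "OBJECT_IDENTIFIER", 0x0C: "UTF8String",
             0x13: "PrintableString", 0x30: "SEQUENCE", 0x31: "SET"}


def read_length(data, pos):
    """Decode a DER definite length at data[pos]; return (length, next_pos)."""
    first = data[pos]
    pos += 1
    if first < 0x80:              # short form: the length fits in 7 bits
        return first, pos
    count = first & 0x7F          # long form: the next `count` bytes hold the length
    return int.from_bytes(data[pos:pos + count], "big"), pos + count


def der_to_elements(data, start, end, parent):
    """Append one XML element per TLV found in data[start:end] to parent."""
    pos = start
    while pos < end:
        tag = data[pos]           # single-byte tags only (no high-tag-number form)
        length, pos = read_length(data, pos + 1)
        elem = ET.SubElement(parent, TAG_NAMES.get(tag, "tag-%02X" % tag))
        if tag & 0x20:            # constructed: the contents are nested TLVs
            der_to_elements(data, pos, pos + length, elem)
        else:                     # primitive: expose the raw bytes as hex text
            elem.text = data[pos:pos + length].hex()
        pos += length


def der_to_xml(data):
    root = ET.Element("der")
    der_to_elements(data, 0, len(data), root)
    return root


# SEQUENCE { INTEGER 5, NULL }
print(ET.tostring(der_to_xml(bytes.fromhex("30050201050500")), encoding="unicode"))
```

Once the input exists as a tree, an XSLT processor could in principle transform it like any other source document.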
>> >>>>>
>> >>>>>Thanks,
>> >>>>>
>> >>>>>Pete.
>> >>>>>--
>> >>>>>=============================================
>> >>>>>Pete Cordell
>> >>>>>Tech-Know-Ware Ltd
>> >>>>>
>> >>>>>-----------------------------------------------------------------
>> >>>>>for XML to C++ data binding visit
>> >>>>>http://www.tech-know-ware.com/lmx
>> >>>>>(or http://www.xml2cpp.com)
>> >>>>>=============================================
>> >>>>>
>> >>>>>
>> >>>>>----- Original Message -----
>> >>>>>From: Michael Kay <mailto:mike@saxonica.com>
>> >>>>>To: 'Joe Schaffner' <mailto:schaffner.joe@gmail.com>; xml-dev@lists.xml.org
>> >>>>>Sent: Monday, July 11, 2005 9:00 PM
>> >>>>>Subject: RE: [xml-dev] XML Hangover
>> >>>>>
>> >>>>>I've been reading the XML literature. It's great. Just a few comments:
>> >>>>>
>> >>>>>Welcome on board. It's refreshing to get thoughtful comments from
>> >>>>>someone who's new to the game.
>> >>>>>
>> >>>>>XSL (Extensible Stylesheet Language) is divided into two parts,
>> >>>>>XSL-T and XSL-FO.
>> >>>>>
>> >>>>>The T part deals with templates and translation. Since HTML is valid
>> >>>>>XML, I guess I can parse my HTML using XSL-T to produce XML and vice
>> >>>>>versa. I don't understand why XSL-T refers to "nodes in an output
>> >>>>>tree". This suggests some kind of internal representation, but XML is
>> >>>>>a perfectly good representation language. Don't <templates> merely
>> >>>>>write XML text to stdout?
>> >>>>>
>> >>>>>No, the result tree is completely abstract; there is no suggestion of
>> >>>>>an internal representation. In fact, for many XSLT processors, the
>> >>>>>"result tree" is represented internally as a stream of events, not as
>> >>>>>a linked collection of objects in memory. This concept of writing a
>> >>>>>tree, rather than writing text, is however extremely important.
>> >>>>>Firstly, it defines a separation of the information content of an XML
>> >>>>>document from the accidental aspects of its lexical representation -
>> >>>>>something that is sadly missing from the XML spec itself. In turn,
>> >>>>>this gives you a basis for defining a concise set of operators that
>> >>>>>are in some sense complete, composable and exhibit closure. In
>> >>>>>practical terms, it gives you the ability to write a series of
>> >>>>>transformations - a pipeline - in which the expensive steps of
>> >>>>>serializing and parsing intermediate results can be eliminated.
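The pipeline point can be illustrated with event streams. A minimal sketch, assuming two hypothetical transformation steps (rename `para` to `p`, upper-case text): each step is a SAX content handler that forwards modified events to the next step, and only the final step (`XMLGenerator`) serializes. The intermediate result between the two steps is never written out as text or re-parsed. This illustrates the principle and is not how any particular XSLT processor is built.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator


class ForwardingStep(ContentHandler):
    """Base pipeline step: passes every event to the next handler unchanged."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream

    def startDocument(self):
        self.downstream.startDocument()

    def endDocument(self):
        self.downstream.endDocument()

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def characters(self, content):
        self.downstream.characters(content)


class RenameStep(ForwardingStep):
    """Hypothetical step 1: renames elements according to a mapping."""

    def __init__(self, downstream, mapping):
        super().__init__(downstream)
        self.mapping = mapping

    def startElement(self, name, attrs):
        super().startElement(self.mapping.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self.mapping.get(name, name))


class UpperCaseStep(ForwardingStep):
    """Hypothetical step 2: upper-cases all text content."""

    def characters(self, content):
        super().characters(content.upper())


def run_pipeline(xml_bytes):
    out = io.StringIO()
    sink = XMLGenerator(out, encoding="utf-8")  # the only step that writes text
    pipeline = RenameStep(UpperCaseStep(sink), {"para": "p"})
    xml.sax.parseString(xml_bytes, pipeline)
    return out.getvalue()


print(run_pipeline(b"<doc><para>hello</para></doc>"))
```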
>> >>>>>
>> >>>>>Roughly, the process seems to work like this: the T processor does a
>> >>>>>recursive descent of the source XML. At each node it evaluates the set
>> >>>>>of templates. Those templates which match the name of the "current"
>> >>>>>tag are processed, in some order. The template writes text; that's
>> >>>>>why it's called a "template". The recursive descent is continued with
>> >>>>>an <apply-templates> tag inside the template. This allows you to
>> >>>>>balance output.
>> >>>>>
>> >>>>>It doesn't have to do a recursive descent of the source XML: that's up
>> >>>>>to the application, though a recursive descent is the most common
>> >>>>>design pattern. And it definitely doesn't write text: people who create
>> >>>>>a mental model of writing text eventually get a rude awakening, usually
>> >>>>>when they first try to tackle grouping problems.
>> >>>>>
>> >>>>>If no matches are found, the T processor continues the descent.
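The exchange above can be sketched as a toy template processor, with two simplifications worth stating plainly: templates are looked up by element name only (real XSLT matches patterns and resolves conflicts by priority), and the fallback merely keeps descending (real XSLT's built-in rules also copy text). Following Michael Kay's point, the templates add nodes to a result tree rather than writing text; serialization happens once, at the end. The function names and the tiny `book`/`chapter` vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates, out):
    """Process each child element of `node`, adding result nodes under `out`."""
    for child in node:
        rule = templates.get(child.tag)
        if rule is not None:
            rule(child, templates, out)
        else:
            # No template matches: keep descending (simplified built-in rule).
            apply_templates(child, templates, out)


def chapter_rule(node, templates, out):
    # Roughly <xsl:template match="chapter"><section><xsl:apply-templates/></section>
    section = ET.SubElement(out, "section")
    apply_templates(node, templates, section)


def title_rule(node, templates, out):
    # Roughly <xsl:template match="title"><h1><xsl:value-of select="."/></h1>
    ET.SubElement(out, "h1").text = "".join(node.itertext())


TEMPLATES = {"chapter": chapter_rule, "title": title_rule}

source = ET.fromstring("<book><part><chapter><title>One</title></chapter></part>"
                       "<chapter><title>Two</title></chapter></book>")
result = ET.Element("body")
apply_templates(source, TEMPLATES, result)
print(ET.tostring(result, encoding="unicode"))
```

The `part` element has no template, so the descent simply continues into it and its chapter is still processed.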
>> >>>>>
>> >>>>>There is a <template> tag (I forget what) which will select arbitrary
>> >>>>>paths in the source tree, and there are tags which iterate through the
>> >>>>>result.
>> >>>>>
>> >>>>>Again, it's best to think of the stylesheet as containing nodes
>> >>>>>(representing instructions) rather than tags. Consider
>> >>>>>
>> >>>>><xsl:element name="x"><xsl:value-of select="."/></xsl:element>
>> >>>>>
>> >>>>>There are three tags there, but four nodes, and only two
>> >>>>>instructions. The semantics of the language are described in terms of
>> >>>>>the two instructions, not the three tags.
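The tag/node/instruction count can be checked mechanically. A small sketch, assuming one addition to Kay's example: an `xmlns:xsl` declaration so the fragment parses on its own. The tags are counted lexically with a regular expression; the nodes are the element and attribute nodes of the parsed tree (ElementTree does not report namespace declarations as attributes, and namespace nodes are ignored here); the instructions are the elements in the XSLT namespace.

```python
import re
import xml.etree.ElementTree as ET

# Kay's example, plus a namespace declaration so the fragment parses on its own.
SNIPPET = ('<xsl:element xmlns:xsl="http://www.w3.org/1999/XSL/Transform" name="x">'
           '<xsl:value-of select="."/></xsl:element>')
XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"

tags = re.findall(r"<[^>]+>", SNIPPET)                 # lexical tags in the text
root = ET.fromstring(SNIPPET)
elements = list(root.iter())                           # element nodes
attributes = [a for e in elements for a in e.attrib]   # attribute nodes
instructions = sum(1 for e in elements if e.tag.startswith(XSL_NS))

print(len(tags), len(elements) + len(attributes), instructions)  # 3 4 2
```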
>> >>>>>
>> >>>>>This will allow me to build up a result "tree" which is not a mirror
>> >>>>>image of the source, something I need to do if I'm rearranging
>> >>>>>sections of the input document. Rather than buffering intermediate
>> >>>>>structures, the T processor does multiple passes based on these tags,
>> >>>>>and creates the output on-the-fly. Cool.
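A result that is "not a mirror image of the source" falls out naturally once selections can address any path. A minimal sketch, assuming a hypothetical `book`/`chapter`/`title` vocabulary: a contents list is built first by selecting every title (roughly what `<xsl:for-each select="//title">` would do), then the chapters are processed. The whole source tree is held in memory here, which is what lets the second selection revisit it; the output is still built as a tree, not written as text.

```python
import xml.etree.ElementTree as ET

SOURCE = """<book>
  <chapter><title>Intro</title><para>Why.</para></chapter>
  <chapter><title>Details</title><para>How.</para></chapter>
</book>"""


def rearrange(source):
    """Build a result tree whose structure differs from the source."""
    body = ET.Element("body")
    # First selection: every title anywhere in the source, as a contents list.
    contents = ET.SubElement(body, "ul")
    for title in source.iterfind(".//title"):
        ET.SubElement(contents, "li").text = title.text
    # Second selection: the chapters themselves, each as a section.
    for chapter in source.iterfind("chapter"):
        section = ET.SubElement(body, "section")
        ET.SubElement(section, "h2").text = chapter.findtext("title")
        for para in chapter.iterfind("para"):
            ET.SubElement(section, "p").text = para.text
    return body


print(ET.tostring(rearrange(ET.fromstring(SOURCE)), encoding="unicode"))
```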
>> >>>>>
>> >>>>> ... .
>> >>>>>
>> >>>>>I assume there is nothing stopping me from using
>> >>>>>XSL-T to transform my HTML
>> >>>>>to PDF, but it seems best to output XSL-FO then
>> >>>>>create a PDF using some kind
>> >>>>>of tool. What is that tool?
>> >>>>>
>> >>>>>It's an XSL-FO processor. Examples are FOP, RenderX, Antenna House.
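For a sense of what an XSL-FO processor consumes, here is a minimal sketch that generates a small XSL-FO document: one page master, one page sequence and one block per paragraph. In practice this output would normally come from an XSLT stylesheet rather than from Python; the page name "A4" and its dimensions are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)


def fo(tag):
    """Qualified name in the XSL-FO namespace."""
    return "{%s}%s" % (FO, tag)


def minimal_fo(paragraphs):
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"),
                         {"master-name": "A4", "page-width": "210mm",
                          "page-height": "297mm"})
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        ET.SubElement(flow, fo("block")).text = text  # one block per paragraph
    return ET.tostring(root, encoding="unicode")


print(minimal_fo(["Hello from XSL-FO."]))
```

If the output is saved to a file, Apache FOP can typically render it with a command such as `fop -fo input.fo -pdf output.pdf`.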
>> >>>>>
>> >>>>>Are there FO plug-ins available for my browsers?
>> >>>>>
>> >>>>>No, people are by-and-large using (X)HTML/CSS for the browser,
>> >>>>>XSL-FO/PDF for the printed page.
>> >>>>>
>> >>>>>Does this technology work?
>> >>>>>
>> >>>>>Absolutely yes.
>> >>>>>
>> >>>>>Michael Kay
>> >>>>>http://www.saxonica.com/


Copyright 2001 XML.org. This site is hosted by OASIS