- To: Michael Kay <mike@saxonica.com>
- Subject: Re: [xml-dev] Another XML parsing idea? Was: Re: [xml-dev] XML Hangover)
- From: Bob Foster <bob@objfac.com>
- Date: Wed, 13 Jul 2005 14:18:03 -0700
- Cc: xml-dev@lists.xml.org
- In-reply-to: <20050713202928.F31912FFA4@loot.dreamhost.com>
- References: <20050713202928.F31912FFA4@loot.dreamhost.com>
- User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
I don't know the internals (maybe someone can comment) but I believe
Markup Technology has a protocol for passing PSVI around. It seems
pretty darn fast.
Bob Foster
Michael Kay wrote:
> A protocol implies sending and receiving messages typically across a
> process boundary or even a machine boundary. This would raise the cost
> of XML parsing by a couple of orders of magnitude.
>
> Michael Kay
>
>
>>-----Original Message-----
>>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
>>Sent: 13 July 2005 20:01
>>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
>>Subject: [xml-dev] Another XML parsing idea? Was: Re:
>>[xml-dev] XML Hangover)
>>
>>Today, we have a paradigm in XML parsing of using APIs
>>like SAX or DOM. I was thinking of another approach to
>>parse XML documents.
>>
>>Can we have a protocol (instead of an API) that will talk
>>between an application and the XML parser? This would
>>make an XML parser interoperable with the calling
>>application. For example, we could have a
>>Microsoft XML parser serving a Java program's XML
>>parsing requests.
>>
>>Right now we have APIs like SAX and DOM and proprietary
>>Microsoft APIs. If we had some protocol similar to
>>HTTP that talked between an application and a parser, it
>>might help interoperability.
>>
>>Is this sensible thinking? Is this idea conceptually
>>similar to the StAX or .NET XmlReader parsing approach?
>>
>>Regards,
>>Mukul
>>
>>--- Michael Kay <mike@saxonica.com> wrote:
>>
>>
>>>The URL got truncated
>>>
>>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
>>>
>>>with ".html" at the end.
>>>
>>>Michael Kay
>>>
>>>
>>>>-----Original Message-----
>>>>From: Mukul Gandhi [mailto:mukul_gandhi@yahoo.com]
>>>>Sent: 13 July 2005 10:02
>>>>To: Michael Kay; 'Pete Cordell'; xml-dev@lists.xml.org
>>>>Subject: RE: [xml-dev] XSL for non-XML input (Was: Re:
>>>>[xml-dev] XML Hangover)
>>>>
>>>>Hi Mike,
>>>>I get the error
>>>>HTTP 404 - File not found
>>>>
>>>>--- Michael Kay <mike@saxonica.com> wrote:
>>>>
>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
>>
>>>>Regards,
>>>>Mukul
>>>>
>>>>
>>>>>
>>>>>Michael Kay
>>>>>
>>>>>
>>>>>Going further, observing the idea of using out of band data
>>>>>(e.g. schema) to provide extra information to complete
>>>>>'binary XML', could XSL (with suitable front ends) work on say
>>>>>an ASN.1 encoded X.509 certificate (and ASN.1 message
>>>>>definition) and produce, say, a PDF output?
>>>>>
>>>>>Not that I have a need to do that right now! I'm just
>>>>>interested to know whether XSL can be used as a kind of
>>>>>universal data translator.
>>>>>
>>>>>Thanks,
>>>>>
>>>>>Pete.
>>>>>--
>>>>>=============================================
>>>>>Pete Cordell
>>>>>Tech-Know-Ware Ltd
>>>>>for XML to C++ data binding visit
>>>>>http://www.tech-know-ware.com/lmx
>>>>>(or http://www.xml2cpp.com)
>>>>>=============================================
>>>>>
>>>>>
>>>>>----- Original Message -----
>>>>>From: Michael Kay <mailto:mike@saxonica.com>
>>>>>To: 'Joe Schaffner' <mailto:schaffner.joe@gmail.com> ;
>>>>>xml-dev@lists.xml.org
>>>>>Sent: Monday, July 11, 2005 9:00 PM
>>>>>Subject: RE: [xml-dev] XML Hangover
>>>>>
>>>>>
>>>>>
>>>>>I've been reading the XML literature. It's great.
>>>>>Just a few comments:
>>>>>
>>>>>Welcome on board. It's refreshing to get thoughtful comments
>>>>>from someone who's new to the game.
>>>>>
>>>>>XSL - XML Stylesheets is divided into two parts,
>>>>>XSL-T and XSL-FO.
>>>>>
>>>>>The T part deals with templates and translation. Since HTML is
>>>>>valid XML, I guess I can parse my HTML using XSL-T to produce
>>>>>XML and vice versa. I don't understand why XSL-T refers to
>>>>>"nodes in an output tree". This suggests some kind of internal
>>>>>representation, but XML is perfectly good representation
>>>>>language. Don't <templates> merely write XML text to stdout?
>>>>>
>>>>>No, the result tree is completely abstract; there is no
>>>>>suggestion of an internal representation. In fact, for many
>>>>>XSLT processors, the "result tree" is represented internally as
>>>>>a stream of events, not as a linked collection of objects in
>>>>>memory. This concept of writing a tree, rather than writing
>>>>>text, is however extremely important. Firstly, it defines a
>>>>>separation of the information content of an XML document from
>>>>>the accidental aspects of its lexical representation -
>>>>>something that is sadly missing from the XML spec itself. In
>>>>>turn, this gives you a basis for defining a concise set of
>>>>>operators that are in some sense complete, composable and
>>>>>exhibit closure. In practical terms, it gives you the ability
>>>>>to write a series of transformations - a pipeline - in which
>>>>>the expensive steps of serializing and parsing intermediate
>>>>>results can be eliminated.
>>>>>
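The pipeline point above can be illustrated with a small sketch. It is not XSLT and not how any particular processor works; it is Python's standard xml.etree.ElementTree, with two hypothetical stage functions (rename_items, sort_entries) that each take a tree and return a tree. The source is parsed once and the result serialized once, with nothing serialized or re-parsed between the stages.

```python
import copy
import xml.etree.ElementTree as ET


def rename_items(tree):
    """Hypothetical stage 1: copy the tree, renaming <item> to <entry>."""
    tag = "entry" if tree.tag == "item" else tree.tag
    out = ET.Element(tag, dict(tree.attrib))
    out.text = tree.text
    out.tail = tree.tail
    for child in tree:
        out.append(rename_items(child))
    return out


def sort_entries(tree):
    """Hypothetical stage 2: return a copy with children sorted by text."""
    out = ET.Element(tree.tag, dict(tree.attrib))
    for child in sorted(tree, key=lambda e: e.text or ""):
        out.append(copy.deepcopy(child))
    return out


def pipeline(source, *stages):
    """Feed the tree from each stage directly into the next one."""
    tree = source
    for stage in stages:
        tree = stage(tree)
    return tree


# Parse once at the start of the pipeline ...
source = ET.fromstring("<list><item>pear</item><item>apple</item></list>")
result = pipeline(source, rename_items, sort_entries)
# ... and serialize once at the end.
serialized = ET.tostring(result, encoding="unicode")
print(serialized)
```

Because every stage consumes and produces the same kind of value (a tree), the stages compose freely, which is one practical reading of the closure property mentioned above.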
>>>>>Roughly, the process seems to work like this: the T processor
>>>>>does a recursive descent of the source XML. At each node it
>>>>>evaluates the set of templates. Those templates which match the
>>>>>name of the "current" tag are processed, in some order. The
>>>>>template writes text, that's why it's called a "template". The
>>>>>recursive descent is continued with an <apply-templates> tag
>>>>>inside the template. This allows you to balance output.
>>>>>
>>>>>It doesn't have to do a recursive descent of the source XML:
>>>>>that's up to the application, though a recursive descent is the
>>>>>most common design pattern. And it definitely doesn't write
>>>>>text: people who create a mental model of writing text
>>>>>eventually get a rude awakening, usually when they first try to
>>>>>tackle grouping problems.
>>>>>
>>>>>If no matches are found, the T processor continues
>>>>>the descent.
>>>>>
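As a rough illustration of the recursive-descent pattern discussed above, here is a toy sketch in Python. It is an assumption-laden model, not how any XSLT processor is implemented: the TEMPLATES registry, the template decorator and apply_templates are hypothetical names, matching is by element name only (real XSLT patterns are far richer), and text nodes are ignored when nothing matches. What it does show is that each "template" adds nodes to a result tree rather than writing text, and that the descent continues when no template matches.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # hypothetical registry: source element name -> template


def template(match):
    """Register a function as the template for elements named `match`."""
    def register(func):
        TEMPLATES[match] = func
        return func
    return register


def apply_templates(node, parent):
    """Process the children of `node`, adding result nodes under `parent`."""
    for child in node:
        rule = TEMPLATES.get(child.tag)
        if rule is not None:
            rule(child, parent)
        else:
            # No matching template: simply continue the descent.
            apply_templates(child, parent)


@template("chapter")
def chapter(node, parent):
    # Create a result element, then let the descent continue inside it.
    section = ET.SubElement(parent, "div")
    apply_templates(node, section)


@template("title")
def title(node, parent):
    ET.SubElement(parent, "h1").text = node.text


source = ET.fromstring(
    "<book><front><title>T</title></front>"
    "<chapter><title>One</title></chapter></book>"
)
result = ET.Element("html")
apply_templates(source, result)
print(ET.tostring(result, encoding="unicode"))
```

Note that the output stays balanced because the chapter template builds an element node and hands it down, not because it emits a start tag and later remembers to emit an end tag.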
>>>>>There is a <template> tag (I forget what) which will select
>>>>>arbitrary paths in the source tree, and there are tags which
>>>>>iterate through the result.
>>>>>
>>>>>Again, it's best to think of the stylesheet as containing nodes
>>>>>(representing instructions) rather than tags. Consider
>>>>>
>>>>><xsl:element name="x"><xsl:value-of select="."/></xsl:element>
>>>>>
>>>>>There are three tags there, but four nodes, and only two
>>>>>instructions. The semantics of the language are described in
>>>>>terms of the two instructions, not the three tags.
>>>>>
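The tag/node/instruction counts above can be checked with a short Python sketch. Two assumptions are mine rather than the original poster's: an xmlns:xsl declaration is added so the fragment parses on its own, and the "four nodes" are read as two element nodes plus two attribute nodes (name and select). The regular expression is only a crude tag counter that happens to be adequate for this fragment.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# The fragment from the message, with a namespace declaration added
# (an assumption) so that it is well-formed when parsed standalone.
snippet = (
    '<xsl:element xmlns:xsl="' + XSL_NS + '" name="x">'
    '<xsl:value-of select="."/>'
    "</xsl:element>"
)

# Lexical view: a start tag, an empty-element tag and an end tag.
tags = re.findall(r"<[^>]+>", snippet)

# Tree view: element nodes plus their attribute nodes. ElementTree does
# not report the xmlns:xsl declaration as an attribute.
root = ET.fromstring(snippet)
elements = list(root.iter())
attributes = [(e.tag, name) for e in elements for name in e.attrib]
node_count = len(elements) + len(attributes)

# Instruction view: elements in the XSLT namespace.
instructions = [e for e in elements if e.tag.startswith("{" + XSL_NS + "}")]

print(len(tags), node_count, len(instructions))
```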
>>>>>This will allow me to build up a result "tree" which is not a
>>>>>mirror image of the source, something I need to do if I'm
>>>>>rearranging sections of the input document. Rather than
>>>>>buffering intermediate structures, the T processor does
>>>>>multiple passes based on these tags, and creates the output
>>>>>on-the-fly. Cool.
>>>>>
>>>>> ... .
>>>>>
>>>>>I assume there is nothing stopping me from using XSL-T to
>>>>>transform my HTML to PDF, but it seems best to output XSL-FO
>>>>>then create a PDF using some kind of tool. What is that tool?
>>>>>
>>>>>It's an XSL-FO processor. Examples are FOP, RenderX,
>>>>>Antenna House.
>>>>>
>>>>>Are there FO plug-ins available for my browsers?
>>>>>
>>>>>No, people are by and large using (X)HTML/CSS for the browser,
>>>>>XSL-FO/PDF for the printed page.
>>>>>
>>>>>Does this technology work?
>>>>>
>>>>>Absolutely yes.
>>>>>
>>>>>Michael Kay
>>>>>http://www.saxonica.com/