xml-dev - Re: [xml-dev] Exploiting multi-core CPUs during XML parsing

To: Michael Kay <mike@saxonica.com>
Subject: Re: [xml-dev] Exploiting multi-core CPUs during XML parsing
From: Jeff Rafter <lists@jeffrafter.com>
Date: Sat, 01 Apr 2006 08:41:48 -0800
Cc: 'Sean McGrath' <sean.mcgrath@propylon.com>, xml-dev@lists.xml.org
Reply-to: lists@jeffrafter.com
User-agent: Thunderbird 1.5 (Windows/20051201)

> Sounds a nice idea independent of any speed benefits. I toyed with the idea
> of parsing XSLT match patterns backwards at one stage, but didn't pursue
> it. My brain is wired for left-to-right reading...
> 
> As a matter of interest, is there any problem with decoding UTF-8 when
> reading backwards?
> 
> Shame that the spec doesn't require ">" to be escaped, I imagine this causes
> a fair problem with backtracking, for example how do you cope with
> 
> ...... <a><b/><c/></a>  -->
> 
> which might or might not be part of a comment?
> 
> IIRC we were able to read files backwards on ICL VME. That's history, but
> many of its features have reappeared in Windows 25 years later ...

I once wrote a reverse parser in Pascal as part of an editor project. It
was designed to determine the context the user was currently working in.
In general, starting from a fixed position somewhere in the middle of a
document, the amount of branching and caching required was excessive,
and in virtually all cases it led to scans back to the start of the
document. In Sean's case I imagine the parser would still have to make a
lot of assumptions until it reached the crossover position between the
two threads, at which point it might have to issue a fatal error (if the
assumptions turned out to be wrong).

I wonder if you could instead work on a sequential (pipelined)
multi-threaded approach: one thread to handle decoding and chunking of
characters, another to handle parsing of lexical structures, and a third
(possibly at a driver level) to handle external WF checks (like checking
character classes, name checks, and duplicate attribute checks).
Decoupling these pieces would make it very easy to turn off WF checks if
you knew via an out-of-band mechanism that the document was WF.
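
A minimal sketch of the three-stage split described above, assuming
Python threads connected by bounded queues. Every name here
(decode_stage, lex_stage, check_stage, run_pipeline, the check_wf flag)
is hypothetical and only for illustration. The lexer is deliberately
naive: it ignores comments, CDATA sections, and ">" inside attribute
values, and the only WF check shown is the duplicate attribute check.
In CPython the global interpreter lock limits any real multi-core
speedup, so this shows the decoupling rather than the performance.

```python
import codecs
import queue
import re
import threading

DONE = object()  # sentinel marking end of stream between stages

def decode_stage(raw_chunks, out_q):
    """Stage 1: decode byte chunks to text, even if a UTF-8 sequence is split."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        for chunk in raw_chunks:
            text = decoder.decode(chunk)
            if text:
                out_q.put(text)
        tail = decoder.decode(b"", final=True)
        if tail:
            out_q.put(tail)
    finally:
        out_q.put(DONE)  # always signal the next stage so it can stop

def lex_stage(in_q, out_q):
    """Stage 2: split text into ("tag", ...) and ("text", ...) tokens (naive)."""
    buf = ""
    while True:
        item = in_q.get()
        if item is DONE:
            break
        buf += item
        while buf:
            if buf[0] == "<":
                end = buf.find(">")
                if end == -1:
                    break  # tag incomplete, wait for more input
                out_q.put(("tag", buf[:end + 1]))
                buf = buf[end + 1:]
            else:
                start = buf.find("<")
                if start == -1:
                    break  # text run may continue in the next chunk
                out_q.put(("text", buf[:start]))
                buf = buf[start:]
    if buf:
        out_q.put(("text", buf))  # leftover input is passed through as-is
    out_q.put(DONE)

# Simplified attribute pattern: name = "value" or name = 'value'
ATTR = re.compile(r"""([^\s=<>/]+)\s*=\s*("[^"]*"|'[^']*')""")

def check_stage(in_q, events, errors, check_wf=True):
    """Stage 3: optional WF checks (only duplicate attributes here)."""
    while True:
        item = in_q.get()
        if item is DONE:
            break
        kind, value = item
        if check_wf and kind == "tag" and not value.startswith("</"):
            names = [m.group(1) for m in ATTR.finditer(value)]
            dupes = sorted({n for n in names if names.count(n) > 1})
            if dupes:
                errors.append(f"duplicate attribute(s) {dupes} in {value}")
        events.append(item)

def run_pipeline(raw_chunks, check_wf=True):
    """Run the three stages, each on its own thread, and collect results."""
    q1, q2 = queue.Queue(maxsize=64), queue.Queue(maxsize=64)
    events, errors = [], []
    threads = [
        threading.Thread(target=decode_stage, args=(raw_chunks, q1)),
        threading.Thread(target=lex_stage, args=(q1, q2)),
        threading.Thread(target=check_stage,
                         args=(q2, events, errors, check_wf)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events, errors
```

Calling run_pipeline(chunks, check_wf=False) skips the checks entirely
while the decoding and lexing stages run unchanged, which is the
"turn off WF checks" switch mentioned above.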


Cheers,
Jeff Rafter
