Re: [xml-dev] Convert an XML Schema validation task into a form that is suitable for running on Graphics Processing Units (GPUs)?
Ultimately, yes. Initially, no.
I expect this is another case where there may be an application-and-data sweet spot where GPU processing comes into its own.
There is a great difference between the problems for which GPUs are optimal and those for which SIMD is the bee's knees, and their algorithms could not be more different.
For GPUs, you must have a large amount of essentially identical types of data, in 1D or 2D or 3D, and you want algorithms that eliminate branches and random, non-local lookups.
So instead of
if (a < 200) a = a + 1; else a = a - 1;
For GPU you have
int adjustment[2] = { -1, +1 };   /* [0] when a >= 200, [1] when a < 200 */
int idx = (a < 200);              /* the comparison itself yields 0 or 1 */
a = a + adjustment[idx];
This is because on a GPU both branches of an if will be stepped through anyway (the inactive one is effectively a NOP for the lanes that did not take it), even though they are mutually exclusive.
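Or, closer to what the hardware actually does, compute both branches and select with a mask (again just a sketch in plain C; the names are mine):

int mask = -(a < 200);              /* all ones if a < 200, otherwise zero */
int up   = a + 1;                   /* the "then" branch, always computed  */
int down = a - 1;                   /* the "else" branch, always computed  */
a = (up & mask) | (down & ~mask);   /* keep the result for the active case */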
For SIMD you take a chunk of consecutive data, however much will fit into the widest CPU register (e.g. 16 bytes for SSE2), and:
1. examine it in parallel to see if it has features
2. process it in parallel according to those features as much as possible
3. flip to non-parallel processing for the outliers
4. reset to the next useful spot.
So for SIMD you might have:
int4 reg1 = get4ints(&input[i]);     /* four consecutive input values   */
int4 reg2 = make4ints(200);          /* the threshold, broadcast 4 ways */
if (allGreaterOrEqual(reg1, reg2)) {
    /* every lane is >= 200: subtract 1 from all four in parallel */
    put4ints(&output[i], decrementInt4(reg1));
    i = i + 4;
}
else {
    /* outliers: fall back to scalar code for these four values */
    int a = input[i];
    output[i++] = a < 200 ? a + 1 : a - 1;
    a = input[i];
    output[i++] = a < 200 ? a + 1 : a - 1;
    a = input[i];
    output[i++] = a < 200 ? a + 1 : a - 1;
    a = input[i];
    output[i++] = a < 200 ? a + 1 : a - 1;
}
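For what it is worth, with real SSE2 intrinsics that sketch might come out something like this (my own function and variable names, purely illustrative, not anyone's production code):

#include <emmintrin.h>   /* SSE2 */

void adjust(const int *input, int *output, int n)
{
    const __m128i limit = _mm_set1_epi32(200);
    const __m128i one   = _mm_set1_epi32(1);
    int i = 0;
    while (i + 4 <= n) {
        __m128i v = _mm_loadu_si128((const __m128i *)&input[i]);
        /* per-lane test v < 200; a zero mask means no lane is below 200 */
        if (_mm_movemask_epi8(_mm_cmplt_epi32(v, limit)) == 0) {
            /* all four lanes >= 200: subtract 1 from each in parallel */
            _mm_storeu_si128((__m128i *)&output[i], _mm_sub_epi32(v, one));
            i += 4;
        } else {
            /* outliers: handle these four values scalar-wise */
            int k;
            for (k = 0; k < 4; k++, i++) {
                int a = input[i];
                output[i] = a < 200 ? a + 1 : a - 1;
            }
        }
    }
    for (; i < n; i++) {   /* leftover tail, fewer than 4 values */
        int a = input[i];
        output[i] = a < 200 ? a + 1 : a - 1;
    }
}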
Now it is true that both GPU and SIMD benefit from algorithms that fold decision-making into tables, and they are cross-pollinating. And they are both skewed towards graphics processing, not text.
But getting back to the sweet spot, the big reason why GPU processing may not take off is that it requires someone to pull their finger out and do it as open source, which effectively means someone to pay for it. People should not assume that the people who pioneer a project won't need their babies to grow up and leave home (Stallman, Linus, Raymond notwithstanding).
After all this time, why is Xerces/Xalan still pretty much unoptimized? 15 years ago I pointed out that a trivial SSE2 optimization could speed up the UTF-8 unpacking by 4 times.
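(Roughly the kind of thing I meant, as a sketch only, with my own names rather than anything from Xerces or libxml2: scan 16 bytes at a time and skip the per-byte decode loop for runs of pure ASCII.)

#include <emmintrin.h>

/* Returns the number of leading bytes that are plain 7-bit ASCII, so the
   caller can copy/widen them without the per-byte UTF-8 state machine. */
static int ascii_run(const unsigned char *p, int len)
{
    int i = 0;
    while (i + 16 <= len) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
        /* movemask gathers the top bit of each of the 16 bytes; non-zero
           means a multi-byte sequence starts somewhere in this chunk */
        if (_mm_movemask_epi8(chunk) != 0)
            break;
        i += 16;
    }
    return i;
}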
And we had lots of people making noise about how slow XML parsing was. So why didn't anyone try to optimise libxml2 better, then or since, as far as I can see (i.e. explicit use of intrinsics: happy to be wrong)? No slight on libxml2 or its developers, btw.
Rick