Happy New Year everyone!
Except, of course, the paper does no such thing. It filters out uninteresting files so that they don't need to be parsed in the first place. (It gives a pre-filter that uses SIMD-parallel n-grams, of lengths 2, 4, and 8, somewhat like Bloom filters, with various neat twiddles, so that JSON documents that don't contain some required n-gram can be rapidly excluded from parsing.) It does not speed up parsing at all; it just excludes more documents from being parsed. (Isn't it bait-and-switch when you promise one thing but deliver something different?)
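To make the trick concrete, here is a toy sketch in Python. The names, the query literal, and the structure are my own invention, not the paper's, and there is no SIMD here: the paper does these byte scans with SIMD-parallel comparisons, while this just shows the logic of why skipping is sound.

    # Toy sketch of an n-gram pre-filter (my own code, not the paper's).
    import json

    def ngrams(literal: bytes, sizes=(2, 4, 8)):
        """Every n-gram of the chosen sizes occurring in the query literal."""
        return {literal[i:i + n]
                for n in sizes
                for i in range(len(literal) - n + 1)}

    def passes_prefilter(doc: bytes, grams: set) -> bool:
        """Conservative check: if any n-gram of the literal is absent from
        the raw bytes, the literal itself cannot be present, so the document
        can be skipped without parsing. (Caveat: JSON strings can escape
        characters, which a real filter must account for; this toy ignores
        that.)"""
        return all(g in doc for g in grams)

    def search(docs, query_literal: bytes):
        """Parse only the documents that survive the pre-filter."""
        grams = ngrams(query_literal)
        for raw in docs:
            if not passes_prefilter(raw, grams):
                continue               # rejected without ever parsing
            yield json.loads(raw)      # full parse only for survivors

The soundness argument is simple: a document that contains the literal necessarily contains every substring of it, so a missing n-gram is proof of a non-match; the filter can only err by letting too many documents through, never by dropping a match.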
Anyway, of course, the technique is general and can equally be applied to (canonicalized or standalone) XML documents. But I wonder whether this sheds some light on the problem of XML parsing speed, for situations where you are looking through lots of records: is the old answer of preprocessing files through grep (etc.) to find candidates now respectable again?
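For the XML side, the grep-style two-pass hunt I mean might look roughly like this; the paths, tag name, and query string are placeholders, and note that entity references are exactly why canonicalized or standalone documents matter, since a raw byte scan can otherwise miss an escaped match.

    # Two-pass candidate search over XML files: a cheap grep-like byte scan
    # first, then a real parse only for files that might match.
    from pathlib import Path
    import xml.etree.ElementTree as ET

    def candidate_files(root: str, needle: bytes):
        """Pass 1: yield only files whose raw bytes contain the needle.
        (Entity references like &amp; can defeat this scan, hence the
        caveat about canonicalized or standalone documents.)"""
        for path in Path(root).rglob("*.xml"):
            if needle in path.read_bytes():
                yield path

    def matching_records(root: str, needle: bytes, tag: str):
        """Pass 2: parse only the candidates and apply the real query."""
        for path in candidate_files(root, needle):
            tree = ET.parse(path)
            for elem in tree.iter(tag):
                if needle.decode() in (elem.text or ""):
                    yield path, elem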
Rick