Happy New Year everyone!
Except, of course, the paper does no such thing. It filters out uninteresting files so that they don't need to be parsed in the first place. (It gives a pre-filter that uses SIMD-parallel n-grams, of lengths 2, 4, and 8, somewhat like Bloom filters, with various neat twiddles, so that JSON documents that don't contain some required n-gram can be rapidly excluded from parsing.) It does not speed up parsing at all; it just excludes more documents from being parsed. (Isn't it bait-and-switch when you promise one thing but deliver something different?)
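To make the trick concrete, here is a toy sketch in Python. The names, the query literal, and the structure are my own invention, not the paper's, and there is no SIMD here: the paper does these byte scans with SIMD-parallel comparisons, while this just shows the logic of why skipping is sound.

    # Toy sketch of an n-gram pre-filter (my own code, not the paper's).
    import json

    def ngrams(literal: bytes, sizes=(2, 4, 8)):
        """Every n-gram of the chosen sizes occurring in the query literal."""
        return {literal[i:i + n]
                for n in sizes
                for i in range(len(literal) - n + 1)}

    def passes_prefilter(doc: bytes, grams: set) -> bool:
        """Conservative check: if any n-gram of the literal is absent from
        the raw bytes, the literal itself cannot be present, so the document
        can be skipped without parsing. (Caveat: JSON strings can escape
        characters, which a real filter must account for; this toy ignores
        that.)"""
        return all(g in doc for g in grams)

    def search(docs, query_literal: bytes):
        """Parse only the documents that survive the pre-filter."""
        grams = ngrams(query_literal)
        for raw in docs:
            if not passes_prefilter(raw, grams):
                continue               # rejected without ever parsing
            yield json.loads(raw)      # full parse only for survivors

The soundness argument is simple: a document that contains the literal necessarily contains every substring of it, so a missing n-gram is proof of a non-match; the filter can only err by letting too many documents through, never by dropping a match.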
Anyway, of course, the technique is general and can equally be applied to (canonicalized or standalone) XML documents. But I wonder whether this sheds some light on the problem of XML parsing speed, for situations where you are looking through lots of records: is the old answer of preprocessing files through grep (etc.) to find candidates now respectable again?
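For the XML side, the grep-style two-pass hunt I mean might look roughly like this; the paths, tag name, and query string are placeholders, and note that entity references are exactly why canonicalized or standalone documents matter, since a raw byte scan can otherwise miss an escaped match.

    # Two-pass candidate search over XML files: a cheap grep-like byte scan
    # first, then a real parse only for files that might match.
    from pathlib import Path
    import xml.etree.ElementTree as ET

    def candidate_files(root: str, needle: bytes):
        """Pass 1: yield only files whose raw bytes contain the needle.
        (Entity references like &amp; can defeat this scan, hence the
        caveat about canonicalized or standalone documents.)"""
        for path in Path(root).rglob("*.xml"):
            if needle in path.read_bytes():
                yield path

    def matching_records(root: str, needle: bytes, tag: str):
        """Pass 2: parse only the candidates and apply the real query."""
        for path in candidate_files(root, needle):
            tree = ET.parse(path)
            for elem in tree.iter(tag):
                if needle.decode() in (elem.text or ""):
                    yield path, elem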
Rick