Hi Folks, Excellent paper on JSON at last week’s
Soft-Shake Conference in Geneva. (http://seriot.ch/parsing_json.html)
Below are some extracts from the paper. But first, a lesson learned: simple is good, but a simple, incomplete specification such as the JSON specification leads to security flaws, lack of interoperability, crashes and denial of service. Sometimes simple specifications just mean hidden complexity. Out of over 30 JSON parsers, no two parsed the same set of documents the same way.

JSON is not the easy, idealized format many believe it to be. Edge cases and maliciously crafted payloads can cause bugs, crashes and denial of service, mainly because JSON libraries rely on specifications that have evolved over time and that left many details loosely specified or not specified at all. The conciseness of the grammar leaves many aspects undefined.

I [the author of the paper] wrote a corpus of JSON test files and documented how selected JSON parsers chose to handle these files … No two parsers exhibited the same behavior, which may cause serious interoperability issues.

JSON is not a data format you can rely on blindly. I've demonstrated this by showing that the standard definition is spread out over at least six different documents (section 1), that the latest and most complete document, RFC 7159, is imprecise and contradictory (section 2), and by crafting test files showing that, out of over 30 parsers, no two parsed the same set of documents the same way (section 4).

As a final word, I keep on wondering why "fragile" formats such as HTML, CSS and JSON, or "dangerous" languages such as PHP or JavaScript, became so immensely popular. This is probably because they are easy to start with by tweaking contents in a text editor, because of too liberal parsers or interpreters, and because of seemingly simple specifications. But sometimes, simple specifications just mean hidden complexity.
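To make the "loosely specified" point concrete, here is a minimal sketch of my own (not from the paper) using Python's standard-library json module. The sample documents and the helper names lenient/strict are hypothetical; parse_constant and object_pairs_hook are real json.loads parameters. The samples touch areas where the RFC leaves behavior to the implementation: duplicate object keys (names only SHOULD be unique), the NaN literal (not part of the RFC grammar), and a number outside the range of a double (implementations may set their own limits).

```python
import json

# Hypothetical edge-case documents where parsers are free to disagree.
SAMPLES = {
    "duplicate_keys": '{"a": 1, "a": 2}',
    "nan_literal": '[NaN]',
    "big_exponent": '[1e400]',
}

def lenient(text):
    """CPython defaults: last duplicate wins, NaN accepted, 1e400 -> inf."""
    return json.loads(text)

def _reject_constant(name):
    # Called by json.loads for the non-standard literals NaN, Infinity, -Infinity.
    raise ValueError(f"non-standard constant: {name}")

def _reject_duplicates(pairs):
    # Receives every object's (key, value) pairs in document order.
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj

def strict(text):
    """A stricter reading: reject NaN/Infinity and duplicate keys."""
    return json.loads(
        text,
        parse_constant=_reject_constant,
        object_pairs_hook=_reject_duplicates,
    )

if __name__ == "__main__":
    for name, doc in SAMPLES.items():
        try:
            result = repr(strict(doc))
        except ValueError as exc:
            result = f"rejected ({exc})"
        print(f"{name:15} lenient={lenient(doc)!r:12} strict={result}")
```

With CPython's defaults the duplicate key silently keeps the last value, NaN is accepted, and 1e400 quietly becomes infinity. The strict variant rejects the first two but still lets the overflow through, which shows how much each library is left to decide on its own.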