Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software
- From: Marcus Reichardt <u123724@gmail.com>
- To: Michael Kay <mike@saxonica.com>
- Date: Mon, 5 Jun 2023 23:58:08 +0200
Thanks, Michael K, for bringing up test cases. When I heard about coding an XML parser in two days and similar stunts, I was in disbelief, considering that the combinatorics amount to at least a low three-figure number of test cases for core XML alone. Also, is it really relevant that you can't use off-the-shelf LALR parser generators for a markup meta-language that itself acts as a parser generator? Markup is for ambitious end users, not CS students (a distinction the modern web also spectacularly fails to observe, considering everyone wants to do React and Tailwind). And SGML can be seen as a valuable contribution showing how powerful and idiosyncratic a mainstream document language can be, considering its inventor is a lawyer by profession.
If we look at actual XML parsers in use today, such as libxml2, those have been in development for well over two decades. Granted, that includes DTD support (and XSD, RNG, XSLT, XPath, DOM, SAX, and pull parsers), but such is the XML stack, after all. The specifics of constructing content model automata are identical for SGML and XML DTDs (and not much harder for XSD). A recent (2022?) change I remember introduced a heuristic for mitigating billion laughs attacks, whereas an SGML declaration can control the maximum nesting level of entity expansion from the start, along with other quantities.
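To make the nesting-limit idea concrete, here is a minimal sketch in Python. It is not taken from libxml2 or from any SGML system; the names expand, EntityLimitError, max_depth and max_output are all made up for illustration. The max_depth parameter is loosely modelled on the ENTLVL quantity of an SGML declaration, and the output cap is just one assumed way to bound the width of an expansion, not the heuristic any real parser uses.

```python
import re

# Hypothetical, simplified entity reference syntax: &name;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


class EntityLimitError(Exception):
    """Raised when expansion exceeds a configured limit."""


def expand(text, entities, max_depth=16, max_output=1_000_000, _depth=0):
    """Expand &name; references in text using the entities mapping.

    max_depth bounds how deeply entity replacement texts may nest
    (the top-level text is level 0). max_output bounds the size of
    any expanded result. Both limits are illustrative assumptions.
    """
    if _depth > max_depth:
        raise EntityLimitError(f"entity nesting deeper than {max_depth}")
    parts = []
    pos = 0
    for match in ENTITY_REF.finditer(text):
        parts.append(text[pos:match.start()])
        name = match.group(1)
        if name in entities:
            # The replacement text is itself scanned, one level deeper.
            parts.append(
                expand(entities[name], entities, max_depth, max_output, _depth + 1)
            )
        else:
            # Unknown references are left untouched in this sketch.
            parts.append(match.group(0))
        pos = match.end()
    parts.append(text[pos:])
    result = "".join(parts)
    if len(result) > max_output:
        raise EntityLimitError(f"expansion larger than {max_output} characters")
    return result
```

With entities = {"a": "&b;&b;", "b": "&c;&c;", "c": "x"}, calling expand("&a;", entities, max_depth=3) succeeds, while max_depth=2 raises EntityLimitError because the replacement text of c sits at level 3. A self-referencing entity such as {"r": "&r;"} is stopped by the same depth check.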
With my post I wasn't suggesting changing XML; personally, I think XML is almost perfect as a delivery or archive format, and indeed changing it at this point, if that were even possible, would do more harm than good. For authoring (using markdown through SHORTREF and other SGML techniques) and for embracing HTML, OTOH, I was hoping for a bit more support here. I mean, XML's alignment with SGML gives it precise and predictable integration of HTML and of ubiquitous casual text editing conventions, which is great and a big win for XML. Just as XML is set in stone, so is SGML, and it's unlikely we're going to see entirely new document languages. Hell, the majority of human-written content might already have been written. But rather than enjoying this power and the existence of something as outlandish (by today's standards) and nerdy as an SGML ISO standard, whenever the topic comes up, the reaction here is all defensive and, frankly, sounds like the early XML commercial narratives of business types ;)
Cheers,
Marcus Reichardt
sgml.io
> On 05.06.2023 at 19:26, Michael Kay <mike@saxonica.com> wrote:
>
>
>>
>>
>> I wrote the first complete and AFAIK fully conforming XML parser, Lark, in Nov/Dec 1996 (Yeah, XML wasn’t quite finished yet) and it took several weeks, which annoyed me because I really had thought we’d managed to narrow it down enough to make it a one-week task.
>
>
> Is that with or without DTD validation?
>
> I'd rate it at two days without DTD validation, 2 months with; and that's assuming you start with a decent test suite.
>
> Michael Kay
> Saxonica
>
>