Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software
- From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- To: Marcus Reichardt <u123724@gmail.com>
- Date: Mon, 05 Jun 2023 09:41:27 -0600
Marcus Reichardt <u123724@gmail.com> writes:
> FYI item 4 in the list of goals of XML is
> "It shall be easy to write programs which process XML documents."
>
> What is meant by "easy to write programs processing XML documents"? To
> implement an XML parser from scratch?
Among other things, yes, ease of writing parsers was on the minds of
those who defined XML.
> In that case, I guess it's relatively safe to say this is neither very
> easy nor relevant since, fortunately you might say, nobody is creating
> XML parsers from scratch.
I disagree on both counts. Of course, like many things, the task of
writing an XML parser may prove more complicated, and involve more
subtleties, than it looked at first. But in many programming
languages, the hardest part of writing an XML parser is supporting
Unicode properly, which is worth doing anyway.
> But when using a parser library anyway, then processing XML
> is exactly as complicated as processing SGML since the parser lib does
> the heavy lifting, and emits just the same SAX events in both cases.
I think this is empirically false. In 1988, if I remember correctly, I
heard a well known computer scientist explain why his project used an
SGML-like syntax and not SGML. If you hand Kernighan and Ritchie to a
graduate student, he said, they transcribe the grammar and a couple of
days or hours later they have a parser for C. When you hand the SGML
specification to a graduate student, they transcribe the grammar and a
couple of hours later they have 179 (or some equally crazy number)
reduce/reduce conflicts. And a week later, they have wrestled it down
to 38 reduce/reduce conflicts. Still no parser. It's no wonder there
are so few SGML tools, he said. The spec goes out of its way to put
unnecessary barriers in the way.
In 1996, those working to define XML said, informally, that a reasonably
competent graduate student should be able to write a correct XML parser
in about a week. During the course of our work, we heard from an
undergraduate in Austria that in his case it had taken two weeks. If I
remember correctly, he said it took longer than he had hoped because his
implementation language had only spotty Unicode support. But possibly
that's a false memory.
Since markup minimization in SGML and what ISO 8879 calls the
'ambiguity' rule are so unlike any standard concepts in off-the-shelf
parsing tools, they require a good deal of special coding. I could
easily be missing something, but I am unaware of anyone who has
developed a conforming SGML parser using only standard off-the-shelf
parser generators. (Certainly I could not do so.) And in the first ten
years of SGML's being a standard, only a handful of conforming parsers
had been produced. I believe it's safe to say that XML had more
conforming parsers than SGML within ten weeks of being a W3C
Recommendation.
> From what I gather by eg [1], "easy to implement" comes from a hope
> that there could be more than a single implementation. Fortunately,
> that has been taken care of for SGML now ;)
Yes, those of us who used SGML at that time chafed under the scarcity of
SGML software, and we hoped that there would be many many more programs
for XML than there were for SGML.
"More than one" implementation was not the bar we set.
Michael
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com