Re: [xml-dev] Please stop writing specifications that cannot be parsed/processed by software
- From: Debbie Lapeyre <dalapeyre@mulberrytech.com>
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Date: Mon, 5 Jun 2023 12:26:27 -0400
As someone who was part of a 2-man team that wrote one of those
SGML parsers, I can tell you that SGML parsers were many MAN-YEARS of effort.
There is a reason that so few existed and only 1 1/2 of them
even claimed to be complete (ours was NOT).
The old story was, show a programmer the SGML spec and she
would say "oh, I use LEX and YACC and it should take a week."
After working for 3 or 4 weeks, she would complain that the
spec was ambiguous and give up. (Note I'm not saying that
the spec WAS ambiguous, let's not start that religious war,
just that it seemed so to many of us. I also have a distinct
memory of ripping out our working look-ahead after look-ahead
was forbidden.)
And my version of the grad-student-writes-a-parser story (ain't
time and distance wonderful) was that the XML spec had promised
that a 'reasonably competent graduate student' could write an
XML parser in 3 days. And, near the end of the spec process, a
very bright grad student complained bitterly (and in jest) that
he COULD NOT write one in 3 days; it had taken nearly 5.
We were all thrilled.
--Debbie
P.S. A big part of what the XML spec did was state the rules
in a known notation (EBNF) in a way that programmers could understand.
(Oh, yeah, and throw out some of the garbage and keep most
of the good bits of SGML.)
> On Jun 5, 2023, at 11:41 AM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
>
> Marcus Reichardt <u123724@gmail.com> writes:
>
>> FYI item 4 in the list of goals of XML is
>> "It shall be easy to write programs which process XML documents."
>>
>> What is meant by "easy to write programs processing XML documents"? To
>> implement an XML parser from scratch?
>
> Among other things, yes, ease of writing parsers was on the minds of
> those who defined XML.
>
>> In that case, I guess it's relatively safe to say this is neither very
>> easy nor relevant since, fortunately you might say, nobody is creating
>> XML parsers from scratch.
>
> I disagree on both counts. Of course, like many things, the task of
> writing an XML parser may prove to be more complicated, and to involve
> more subtleties, than it looked at first. But in many programming
> languages, the hardest part of writing an XML parser is supporting
> Unicode properly, which is worth doing anyway.
>
>> But when using a parser library anyway, then processing XML
>> is exactly as complicated as processing SGML since the parser lib does
>> the heavy lifting, and emits just the same SAX events in both cases.
>
> I think this is empirically false. In 1988, if I remember correctly, I
> heard a well known computer scientist explain why his project used an
> SGML-like syntax and not SGML. If you hand Kernighan and Ritchie to a
> graduate student, he said, they transcribe the grammar and a couple of
> days or hours later they have a parser for C. When you hand the SGML
> specification to a graduate student, they transcribe the grammar and a
> couple of hours later they have 179 (or some equally crazy number)
> reduce/reduce conflicts. And a week later, they have wrestled it down
> to 38 reduce/reduce conflicts. Still no parser. It's no wonder there
> are so few SGML tools, he said. The spec goes out of its way to put
> unnecessary barriers in the way.
>
> In 1996, those working to define XML said, informally, that a reasonably
> competent graduate student should be able to write a correct XML parser
> in about a week. During the course of our work, we heard from an
> undergraduate in Austria that in his case it had taken two weeks. If I
> remember correctly, he said it took longer than he had hoped because his
> implementation language had only spotty Unicode support. But possibly
> that's a false memory.
>
> Since markup minimization in SGML and what ISO 8879 calls the
> 'ambiguity' rule are so unlike any standard concepts in off-the-shelf
> parsing tools, they require a good deal of special coding. I could
> easily be missing something, but I am unaware of anyone who has
> developed a conforming SGML parser using only standard off-the-shelf
> parser generators. (Certainly I could not do so.) And in the first ten
> years of SGML's being a standard, only a handful of conforming parsers
> had been produced. I believe it's safe to say that XML had more
> conforming parsers than SGML within ten weeks of being a W3C
> Recommendation.
>
>> From what I gather by eg [1], "easy to implement" comes from a hope
>> that there could be more than a single implementation. Fortunately,
>> that has been taken care of for SGML now ;)
>
> Yes, those of us who used SGML at that time chafed under the scarcity of
> SGML software, and we hoped that there would be many, many more programs
> for XML than there were for SGML.
>
> "More than one" implementation was not the bar we set.
>
> Michael
>
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
================================================================
Deborah A Lapeyre mailto:dalapeyre@mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
Rockville, MD 20851 Phone: 301-315-9631 (USA)
----------------------------------------------------------------
Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
================================================================