OASIS Mailing List Archives


Re: [xml-dev] seduced by markup

I feel pretty odd defending a syntax I have self-righteously sneered at
since 1985, but here goes.

First, let me say that I can't explain everything, even to myself.  It
is weird to reopen my long-closed mind about this.

> Wow - Steve I love your analogy, but I think you may have fallen in love
> with it for its own sake, because it just doesn't apply to the DTD/RNC
> comparison, at least I don't see it.  Now I may be a high priest and out
> of touch, but I just don't think DTDs are as transparent or accessible
> as you seem to think they are.

It's an observation.  I've had to explain the syntax many times to many
diverse audiences.  It's downright perverse that they get it so readily.
 I can't explain it.  I can only make guesses.

> Years ago, as a relative newcomer to these technologies, I looked at DTD
> and RNC, and RNC made immediate, intuitive sense to me in a way that
> DTDs did not (and probably never will, because I doubt I'll spend enough
> time looking at them to get used to the squonks).  Now that's a
> completely subjective assessment, but for example consider the alien
> words and tokens you have to learn to really get DTDs:
> You have to understand that % is a magic letter that means the following
> token doesn't really mean anything about the markup, but is a reference
> to something defined elsewhere.


By the same token, consider two scenarios: You're in a room with 4
experienced programmers and 4 domain experts.  Let's say they're experts
in healthcare, since that's so topical these days.  They know about
medicine and/or insurance, but not much about IT.  Now, everybody has to
work together and come to an important agreement.  You're the facilitator.

Scenario 1:  You put RNG syntax on the table.  The programmers grin
smugly.  They feel perfectly comfortable.  The healthcare experts
grimace and stir uneasily.

Scenario 2: You put DTD syntax on the table.  Everybody stirs uneasily.
 What's this bizarre nonsense?

As the facilitator responsible for eliciting information from everyone
in the room, I'll prefer Scenario 2 every time.  I have reason to do the
full explanation, and to expect everyone to listen, and to
(figuratively) rap the knuckles of anyone who doesn't pay attention.

To prefer Scenario 1 is to create a gross power imbalance right at the
outset.  The domain experts will be humiliated and reticent.  The
programmers will run roughshod over the proceedings, and speak only to
each other.  It's a recipe for failure.


So let's assume Scenario 2.

The whole question of parameter entities (your % character) doesn't come
up for quite a while.  It comes up when it's needed, and when it's
needed, it's the answer to a question that those present in the room
have already asked.  It's much easier to understand the answer to a
question that you've already asked, especially if the whole field is

Aside: What always surprises me is that people generally absorb
Backus-Naur Form, right at the beginning, and without much difficulty.
Maybe BNF should be taught in preschool.  I think it might be a good idea.

>  #IMPLIED (what does that mean? it seems to indicate there is a default
> value, but what is it?

Hah!  I love this one.  It comes up immediately, and that's a good
thing, because it's the moment when the most important indoctrination
occurs.  The indoctrination begins with the programmer-deflating news
that we're emphatically not designing any sort of software whatsoever.
This news comes as a relief to non-programmers.  The programmers, on the
other hand, immediately wonder whether they're in the right room.  So
the facilitator tells them, confidentially, that they are in exactly the
right room to protect their turf, and #IMPLIED is their friend.  It is a
barrier that protects them from the busybody OCD standards-makers who
may be lurking in the process.  That thought generally restores their
(bemused and grumpy) attention, at least for a while.

The explanation of #IMPLIED is: if the interchanged information does not
provide an explicit value, then the value, if any, will be supplied by
the receiving system, at its sole discretion.  That strange idea takes a
while to be absorbed.  The absorption process is vital; it enormously
clarifies the nature of the entire task in which we're engaged.  We're
only legislating what must be legislated in order to enable information
interchange for a (probably unknown) range of "applications".  The full
abstract import of the word "application" is worth exploring at this
point.  It doesn't mean "software" so much as it means "the goals we're
trying to achieve in a given context" -- goals which require accurate
information interchange between diverse entities in different contexts
with different goals.
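
For anyone who wants to see that rule in action, here is a minimal sketch
in Python.  The element and attribute names (dose, unit), the sample
documents, and the receiver-side default are all invented for
illustration; they are assumptions, not part of any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD: the sender may omit "unit", and the DTD says nothing
# about what value to assume when it is missing.
DTD = """<!DOCTYPE dose [
  <!ELEMENT dose (#PCDATA)>
  <!ATTLIST dose unit CDATA #IMPLIED>
]>"""


def read_unit(xml_text, receiver_default):
    """Return the explicit unit, or whatever this receiver chooses to imply."""
    root = ET.fromstring(xml_text)
    # No value arrives from the interchange itself, so the receiving
    # application supplies one (or none) at its sole discretion.
    return root.get("unit", receiver_default)


explicit = DTD + '<dose unit="mg">5</dose>'
implicit = DTD + '<dose>5</dose>'
```

Two receivers calling read_unit(implicit, ...) with different defaults
get different answers, and neither one is wrong.  That is the sense of
"#IMPLIED by the application".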

The indoctrination is complete when everybody can say to themselves the
phrase "#IMPLIED by the application" while understanding exactly what
that means.  The consultants/enablers should congratulate themselves
when this finally happens, because a therapeutic process has begun.
Programmers and non-programmers alike feel increasingly empowered by the
formalization of ambiguity.  It makes for good feeling in the room, and
thoughts that maybe we can all get along with each other.  This might
work after all.

> NMTOKEN (hunh?)

The facilitator should wait till a question arises to which it is the
answer.  That's the time to introduce it.

> CDATA (wha?)

Not sure why this one is a problem.  I've never seen it cause any problems.

> PCDATA (seems to have something to do with CDATA, but is it different in
> some subtle and important way?)

Personally, by default I try to keep #PCDATA in the realm of <!ELEMENT
and CDATA in the realm of <!ATTLIST, thus not allowing the confusion to
arise.  As I write this, I'm just realizing now that the clunky
separation of the two realms has been helpful to me in my practice in
exactly this way.

Again, though, a question may arise to which the CDATA content model is
the answer.

> special meaning of ID/IDREF, but unfamiliar semantics (IDs can't be
> numbers! that will surprise a lot of people)

This has never caused a problem for me.  Nobody has ever expressed
surprise that rules must govern the construction of SGMLNAMES.  They
don't always like the rules.  So what?
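
For concreteness, the rule in question can be sketched roughly as
follows.  This is an ASCII-only simplification of the XML 1.0 Name
production (an assumption on my part: the real production also admits a
large range of non-ASCII characters, and SGML's own rules differ in
detail), and the function name is hypothetical.

```python
import re

# ASCII-only approximation of an XML Name: a letter, underscore, or colon
# first, then letters, digits, underscores, periods, colons, or hyphens.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_plausible_id(value):
    """True if value could serve as an ID under the simplified rule above."""
    return NAME_RE.fullmatch(value) is not None
```

Under this sketch "123" is rejected and "p123" is accepted, which is the
whole surprise the quoted objection describes.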

(Aside: As a matter of common practice, the world's largest SGML
application, HTML, has long since de-facto vacated the restriction to
which you refer.  Even more shockingly, at least for me, Larry Masinter
wrote in 1999 that it's common practice to allow spaces in fragment
identifiers, which after all are the values of SGML ID attributes.
Finally, I learned only last week that %-hex-escaped spaces are OK in
fragment identifiers in the context of Mozilla, but IE won't work unless
they are left unescaped.  Go figure.)

> What is that PUBLIC identifier in the DOCTYPE declaration anyway?

You're in grave danger of pushing my rant-button about the W3C's
deliberate destruction of the distinction between PUBLIC-ly registered
materials and the 100% system-dependent, rental-contract-driven
identification of materials via URIs that resolve to
only-the-current-lessee-knows-what-(maybe).  It's not exactly a recipe
for information longevity and reliability.

Anyway, I think the whole PUBLIC identifier thing is irrelevant to a
project until it comes up, and when it does come up, as before,
everybody has already asked the question, and then there's a good answer
that really works.

> Contrived sure, and not all of these things are *necessary* in DTDs, but
> typically one does encounter them.  Sure Relax has its own syntax to
> learn, but at its simplest, the RNG document structure *mirrors the
> structure of the document it models*, which to me is the thing that
> really makes it accessible and intuitive.  I also like the relative lack
> of obscure words. And I have to say that most of the people that I work
> with have reacted in a similar way.

Look, I say again: I love RNG.  I just don't think it's any better than
DTD syntax in the real world, IF the purpose of both syntaxes is to
facilitate the development of toothy agreements between diverse parties
whose primary expertise is not necessarily IT.  And DTD syntax remains
much more commonplace.  History matters, whether we like it or not.
What are the facilitator's motivations?  Those matter, too.

> I like the idea that redundancy provides humans extra cues, but I'm just
> not sure DTD format in itself actually provides that.  Comments in human
> languages are super helpful, as usual, but of course you can have
> comments in any schema format.

Since you mention this, allow me to lament, once again, that W3C
deliberately crippled the DTD syntax in XML.  It disallowed comments
within markup declarations.  That wasn't necessary and it was extremely
hurtful to consensus-seeking projects.  It is hard to draw any
conclusion other than that they wanted to snatch document design out of
the hands of everyman, offering instead only slavery to the software
products of those with the financial resources required to market them,
and, of course, to XSD, which everybody knew would be SO much better.

(As to the latter point, I have never understood why the W3C was allowed
to escape prosecution under the Sherman Antitrust Act.  I know the
theory: that TimBL has absolute authority, and therefore W3C is
technically not a conspiracy in restraint of trade.  But that doesn't
change the fact that it is indeed a conspiracy that has allowed multiple
market leaders to collude in secret sessions with the net result of
restraining trade, once TimBL says "OK".)

Moreover, all that happened before RNG was around.

W3C also crippled DTD syntax in another important way, once again
unnecessarily, by disallowing parameter entities in DTDs that are
included directly, verbatim, in XML documents.  They're still OK in
remote DTDs.  They're just not OK where people can exploit their
expressive power more locally, and with zero red tape.  So once again
it's a matter of big money trumping little money, right here in the most
economically sensitive area of human intercourse, namely, information
interchange.  It's easy to understand why the Establishment saw fit to
endow TimBL with a knighthood.  A badge of honor, right?  His integrity
took the hit, the Establishment reaped the reward, and the public was
duly impressed.  No losers, right?
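
The restriction itself is easy to observe.  In XML 1.0, a
parameter-entity reference in the internal subset may stand between
markup declarations, but not inside one.  The sketch below assumes that
Python's standard-library parser (expat underneath) enforces that
well-formedness constraint; the note/em/strong vocabulary and the
helper name are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Report whether the stdlib parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The content model written out longhand in the internal subset.
LONGHAND = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA | em | strong)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
]>
<note>plain <em>text</em></note>"""

# The same model, factored through a parameter entity that is referenced
# inside a markup declaration in the internal subset.
WITH_PE = """<!DOCTYPE note [
  <!ENTITY % inline "em | strong">
  <!ELEMENT note (#PCDATA | %inline;)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
]>
<note>plain <em>text</em></note>"""
```

Here is_well_formed(LONGHAND) should come back true and
is_well_formed(WITH_PE) false, even though the second form would be
unremarkable in an external DTD.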
