xml-dev - Re: Improved writing -- who's going to pay for it?

From: Ronald Bourret <rpbourret@rpbourret.com>
To: xml-dev@lists.xml.org
Date: Thu, 12 Oct 2000 13:23:30 -0700
Rick JELLIFFE wrote:

> So would people be happier with
>   * a much more comprehensive Primer

I agree with Miloslav on this one. The primer is a stop-gap and if you
fixed the spec, you wouldn't need it. I think a better idea would be to
mix the primer with the regular spec (see Examples, below).

>   * splitting the Structures draft into two or three parts that were
>      more self contained

Do you mean separate documents or just better chaptering? If you mean
chaptering, it would probably help, but isn't the major problem.

>   * a much terser algorithmic/logical treatment of the subject, less
>      comprehensible to Joe Database but smaller and more precise

Definitely the wrong direction. This would, on the other hand, be a very
useful thing to throw into an appendix, with a note that if the appendix
disagrees with the rest of the spec, the appendix is right.

>   * a rewrite of structures based on the concrete syntax rather than
>       having the abstract components first

This would certainly help the first-time reader. Another possibility is
grouping abstract components and concrete syntax together, so the user
can easily draw parallels between the two.

> 
> Knowing some specifics might be helpful.

Here are some things that I'd suggest:

Examples
========
I would suggest a very strong use of the words "for example" throughout.
I can only speak for myself here, but I learn almost entirely through
examples. The nice thing about them is that, when combined with general
statements (the theoretical part), they shine light on the problem in
two different ways, satisfying two different types of learners
(practical and theoretical). For example (see, I'm doing it now):

   In the database world, null data means data that simply
   isn't there. This is very different from a value of 0
   (for numbers) or zero length (for a string). For example,
   suppose you have data collected from a weather station.
   If the thermometer isn't working, a null value is stored
   in the database rather than a 0, which would mean
   something different altogether.
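
The weather station case can be made concrete with a minimal sketch in Python, using the standard sqlite3 module. The table and column names (readings, hour, temp_c) are hypothetical and chosen only for illustration. The point it shows is that a missing reading and a reading of 0 are stored and counted differently:

```python
import sqlite3

# Hypothetical weather-station table; all names here are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (hour INTEGER, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 4.0), (2, None), (3, 0.0)],  # None is stored as SQL NULL (broken thermometer)
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# AVG ignores NULLs, so the average covers only the two real readings.
avg_temp = scalar("SELECT AVG(temp_c) FROM readings")
# COUNT(column) counts only non-NULL values.
measured = scalar("SELECT COUNT(temp_c) FROM readings")
# NULL must be tested with IS NULL; "= 0" matches only the real zero reading.
missing = scalar("SELECT COUNT(*) FROM readings WHERE temp_c IS NULL")
zeros = scalar("SELECT COUNT(*) FROM readings WHERE temp_c = 0")

print(avg_temp, measured, missing, zeros)
```

Had the broken thermometer been recorded as 0 instead of null, the average would have been pulled toward zero and the missing hour would be indistinguishable from a freezing one.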

Furthermore, I find that I can almost always extrapolate theory from
example, but can almost never go the other way -- probably a
genetic defect. If I have a theoretical statement to compare my
extrapolated theory against, I can then figure out what the author
means.

Except that they take up space and can be surprisingly difficult to
write, the only real objection I can think of to examples is that they
can constrain people's thinking. That is, people think only in terms of
the example and miss the broader picture stated by the theoretical bits.

For authors, this is somewhat of a problem, but at least you can be sure
that the parts they do use are conformant. For implementers, this is
sometimes a very real problem, but probably less than the problems
caused by lack of understanding.

Dryness, Big Words, Run-on, Flow
================================
I realize that in some ways, the criticisms in this section are not
entirely fair. Although some of the problems I mention can be fixed
(such as run-on sentences), the formality of the spec is quite clearly
deliberate and represents the authors' best attempts to be clear.

Still, the writing style runs directly into the human factors I keep
talking about. In spite of the fact that most of us are paid to read
this stuff, we all reach our limits, and the pedantic wording of the
spec helps me quickly reach mine.

One other consideration is that many readers are not native English
speakers. For non-native readers, simple words and simple sentence
structures go a long way toward promoting readability. For example, I can
read books in German quite easily and technical papers with not too much
work. The German tax code, which is much like the schema spec in flavor,
stopped me dead.

So, here goes.

Reading the schema spec is as dry as eating flour. (Sorry, I couldn't
help it :) Contributing to the problem are big words, run-on sentences,
and lack of flow. The following sentence is a pretty representative
example. (I chose it because I thought I had a chance of rewriting it.)

   Except for anonymous complex type definitions (those with
   no {name}), since type definitions (i.e. both simple and
   complex type definitions taken together) must be uniquely
   identified within an XML Schema, no complex type definition
   can have the same name as another simple or complex type
   definition.

This could be rewritten as:

   If a type definition has a name (that is, it has a <i>name</i>
   property), that name must be unique within a given XML Schema.
   Simple and complex type definitions share the same space for
   such names. That is, a simple type and a complex type cannot
   have the same name.

or even the following, although it risks unclarity with respect to
simple and complex types having the same name:

   No two type definitions (whether simple or complex) in an XML
   Schema can have the same name.
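
One way to see that the short version of the rule is precise enough to act on is to turn it into a check. The following is a minimal sketch in Python, not anything taken from the spec. It assumes the namespace URI of the later XML Schema Recommendation, looks only at named top-level type definitions, and ignores includes, imports, and target namespaces. The sample schema and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: namespace URI of the final XML Schema Recommendation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema in which a simple and a complex type share a name.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Temperature">
    <xs:restriction base="xs:decimal"/>
  </xs:simpleType>
  <xs:complexType name="Temperature">
    <xs:sequence/>
  </xs:complexType>
  <xs:complexType name="Station">
    <xs:sequence/>
  </xs:complexType>
</xs:schema>"""

def duplicate_type_names(schema_text):
    """Return names used by more than one top-level type definition.

    Simple and complex types are counted together, because they share
    one space of names. Anonymous types (no name attribute) are skipped.
    """
    root = ET.fromstring(schema_text)
    names = [
        child.get("name")
        for child in root
        if child.tag in (XS + "simpleType", XS + "complexType")
        and child.get("name")
    ]
    return sorted(name for name, count in Counter(names).items() if count > 1)

print(duplicate_type_names(SCHEMA))  # prints ['Temperature']
```

The check is a few lines long because the rule itself is one sentence: collect every named type, simple or complex, and report any name that appears twice.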

The worst sentences appear in sections 3 through 6, which unfortunately
make up the bulk of the spec.

As to flow, readers expect that, when one paragraph follows another,
there will be a transition between the two. When this doesn't happen,
the effect is confusion -- you look at both paragraphs and try to figure
out how they're related.

The schema spec has lots of unrelated paragraphs that follow each other.
This generally happens when each paragraph gives a different rule, such
as in chapter 3, where property definitions are given in tables followed
by a set of rules.

One solution to this would be to rewrite the rules in flowing
paragraphs. An equally good solution might be simply to number the
paragraphs and title the list "Rules" -- this would set up the reader's
expectations for what is to come. An even better solution would be a
short paragraph generally discussing the properties and how they are
used, followed by a set of specific rules.

Layout and Notation
===================
The schema spec is extremely busy, especially when viewed online. This
makes it very difficult to read and severely interrupts flow.

1) The first problem is way too many jumps. For example, almost every
use of the word "absent" jumps to its definition. This is hypertext
abuse. Define the term once at the top and stop linking to it.
Similarly, in chapter 3, the paragraphs explaining the rules for the
properties jump from property names (e.g. {name}) to their list in the
property table, which is directly above. Use jumps sparingly, when they
will provide real value.

2) I found the use of braces ({}) for property names to be jarring --
they interrupted the visual flow of the writing. A better thing to do
might be to place the property name in italics and use it as an
adjective to the word "property". For example, change:

   An empty {substitution group exclusions} allows a declaration to be
   nominated as the {substitution group affiliation} of other element
   declarations having the same {type definition}

to:

   If the <i>substitution group exclusions</i> property is empty, a
   declaration can be nominated (used?) as the <i>substitution group
   affiliation</i> property of other elements that have the same value
   of the <i>type definition</i> property.
    
Or eliminate property names altogether when the meaning is clear:

   If a declaration has no substitution group exclusions, it can be
   affiliated with other elements that have the same type definition.

(I've probably botched this, but you get the idea.)

3) The spec switches wildly between different fonts and colors. Again,
this is visually very jarring and (in my mind) doesn't add anything. As
a suggestion, use Times Roman for text, Helvetica or Times Roman for
headers, and Courier for XML examples. Use bold for headers. Use bold or
italics for property names. Skip everything else (especially color!),
except possibly using bold in very limited circumstances, such as the
words NOTE and Definition.

(Quite frankly, the most visually pleasing documentation I've ever seen
has invariably been in plain text format, where the author had a very
limited set of tools and had to think carefully about how those were
applied. As an example of this, compare the DTD in appendix E with the
tables in section 4.)

4) Tables are abused throughout, especially in chapter 4, where they are
so complex as to boggle the eye. Use tables when you need multiple
columns of matching information and, in this case, use borders between
cells. Forget them otherwise.

5) I find the XML notation used in section 4 to be difficult to read.
Would it be possible to use a DTD or BNF?

6) The numbered lists in section 4 appear to be inconsistent. Sometimes
they're numbered 1, 2, 3, sometimes 1.1, 1.2, 1.3, sometimes sublists
are numbered 1.1.1, other times just 1. Is there are reason for the
different numbering systems? If so, I couldn't figure it out.

> 
> Even knowing at what point you become confused might help: I know
> paragraph clarification is not Simon's preferred way, but it is
> not a waste of time.

I became confused the minute I entered section 3.

Anyway, that's enough for now.

-- Ron

-- 
Ronald Bourret
Programming, Writing, and Training
XML, Databases, and Schemas
http://www.rpbourret.com