On Thu, 2002-05-16 at 14:34, Arjun Ray wrote:
> The litmus test is whether one thinks these are or should be equivalent:
>
> 1) <foo bar="baz"/>
> 2) <foo><bar>baz</bar></foo>
>
> I would say that the job so far, poor or not, has almost entirely been one
> of propounding the equivalence view.
Yep - it's a good simple test.
> And, of course, Keeping Things Safe For Netploder. Is there some taboo on
> mentioning this?
No - I just don't use it much. Is there something specific in its
handling of attributes I've missed?
> Taking the minority view, I would say that it isn't. That is, rather than
> trying to unify attributes and (sub)elements - especially those that wind
> up with the moral equivalent of (#PCDATA) content models - it may be more
> fruitful to keep them distinct.
That's the conclusion I'm reaching, and strongly. Suddenly I can
abolish a whole group of annoying problems - if I just stick to elements
for content.
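To make that concrete (a rough sketch of the style I'm aiming for, with
made-up names apart from xml:lang): keep the data itself in element
content and reserve attributes for information about the markup, not for
pieces of the thing being described:

  <poem xml:lang="en" status="draft">
    <title>Jabberwocky</title>
    <line>'Twas brillig, and the slithy toves</line>
  </poem>

The title and the lines are parts of the poem, so they get elements;
xml:lang and status merely annotate the poem element rather than analyse
it into parts.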
> | Separating markup from content - and putting attributes squarely in the
> | markup side - seems like one means of at least alleviating the headache.
>
> Well, that's how it all started (see eg, [1]). My personal rule of thumb
> has always been "elements for analysis, attributes for annotation". The
> key is the sense in which attributes are not directly "analytic". In my
> own attempts to explain this to (computer-savvy?) people, I've often drawn
> a parallel with parsing theory, based on the similarities between content
> models and BNFs (extended regular grammars).
Quite reasonable. (And the history is excellent to see.)
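The parallel is easy to see even in a tiny example (names invented for
illustration). A content model such as

  <!ELEMENT address (name, street+, city)>

reads almost exactly like the production

  address ::= name street+ city

with the element types standing in for the nonterminals.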
> Given a set of production rules, a successful parse yields a parse tree
> with nonterminals as nodes and terminals as leaves. With one twist, the
> SGML/XML serialization of such a parse tree is obvious. (The twist is in
> the treatment of what are *taken* to be terminals, in that programs such
> as Bison allow terminals of two kinds: variables instantiated by a lexer,
> and string constants. The former actually correspond to #PCDATA elements
> with obvious expansions, the latter to text directly.)
I think I get this, though I've not done much with Bison.
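If I'm reading the twist correctly, a yacc/Bison rule such as

  assignment : IDENT '=' NUMBER ;

where IDENT and NUMBER are supplied by the lexer, would serialize along
the lines of

  <assignment><ident>x</ident>=<number>42</number></assignment>

 -- the lexer-instantiated tokens become #PCDATA elements, while the
literal '=' shows up as text directly. (Element names invented for the
example.)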
> The basic outcome is a complete partitioning of the data into a hierarchy
> of semantically meaningful categories. Turning this around, an SGML/XML
> instance basically represents a *complete parsing* of its text content.
> That is, while the problem in parsing theory is to recognize input, the
> primary intent of generalized markup is to express the result of a prior
> process of recognition in the same formalism of parse trees. Pushing the
> analogy further, where attributes make their appearance in the semantic
> processing of parse trees, markup-attributes are very similar to inherited
> (as opposed to synthesized) parse-attributes.
That makes good sense.
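The inherited/synthesized distinction also matches my experience with
something like xml:lang, which behaves exactly like an inherited
attribute -- context handed down to the whole subtree rather than built
up from it:

  <chapter xml:lang="fr">
    <title>Préface</title>
  </chapter>

Nothing inside the chapter contributes to xml:lang; the content only
consumes it. The chapter's structure, on the other hand, is composed
upward from its parts, the way a synthesized attribute would be.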
> The basic lesson: Do not use attributes to *analyse* wholes into parts.
Yes! A very nice way to describe the distinction.
--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com