   Re: [xml-dev] Penance for misspent attributes

On Thu, 2002-05-16 at 14:34, Arjun Ray wrote:
> The litmus test is whether one thinks these are or should be equivalent:
>   1)  <foo bar="baz"/>
>   2)  <foo><bar>baz</bar></foo>
> I would say that the job so far, poor or not, has almost entirely been one
> of propounding the equivalence view. 

Yep - it's a good simple test.
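
As a rough illustration of why the two forms in the test are not automatically equivalent, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name summarize is made up for this example; nothing in it comes from the thread itself.

```python
import xml.etree.ElementTree as ET

# The two forms from the litmus test.
as_attribute = ET.fromstring('<foo bar="baz"/>')
as_element = ET.fromstring('<foo><bar>baz</bar></foo>')

def summarize(elem):
    """Return (attributes, child tag names, text content) for one element."""
    return (
        dict(elem.attrib),
        [child.tag for child in elem],
        "".join(elem.itertext()),
    )

attr_view = summarize(as_attribute)
elem_view = summarize(as_element)

print(attr_view)  # ({'bar': 'baz'}, [], '')
print(elem_view)  # ({}, ['bar'], 'baz')
```

In the attribute form "baz" is not part of the text content at all; in the element form it is. Treating the two as equivalent means an application has to paper over that difference itself.
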
> And, of course, Keeping Things Safe For Netploder.  Is there some taboo on
> mentioning this?

No - I just don't use it much.  Is there something specific in its
handling of attributes I've missed?

> Taking the minority view, I would say that it isn't.  That is, rather than
> trying to unify attributes and (sub)elements - especially those that wind
> up with the moral equivalent of (#PCDATA) content models - it may be more
> fruitful to keep them distinct. 

That's the conclusion I'm reaching, and strongly.  Suddenly I can
abolish a whole group of annoying problems - if I just stick to elements
for content.

> | Separating markup from content - and putting attributes squarely in the
> | markup side - seems like one means of at least alleviating the headache.
>
> Well, that's how it all started (see e.g. [1]).  My personal rule of thumb
> has always been "elements for analysis, attributes for annotation".  The
> key is the sense in which attributes are not directly "analytic".  In my
> own attempts to explain this to (computer-savvy?) people, I've often drawn
> a parallel with parsing theory, based on the similarities between content
> models and BNFs (extended regular grammars).

Quite reasonable.  (And the history is excellent to see.)

> Given a set of production rules, a successful parse yields a parse tree
> with nonterminals as nodes and terminals as leaves.  With one twist, the
> SGML/XML serialization of such a parse tree is obvious.  (The twist is in
> the treatment of what are *taken* to be terminals, in that programs such
> as Bison allow terminals of two kinds: variables instantiated by a lexer,
> and string constants.  The former actually correspond to #PCDATA elements
> with obvious expansions, the latter to text directly.)  

I think I get this, though I've not done much with Bison.
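
For readers who, like me, have not spent much time with Bison, the serialization described above can be sketched in a few lines of Python. The toy rule sum ::= NUMBER '+' NUMBER, the tuple layout of the parse tree, and the function name to_xml are all assumptions made for this example. Nonterminals and lexer-supplied terminals become elements; the string constant '+' becomes text directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree for the input "1+2" under the toy rule
#   sum ::= NUMBER '+' NUMBER
# A nonterminal is (name, [children]); a lexer token is (name, text);
# a string constant from the grammar is a bare str.
parse_tree = ("sum", [("NUMBER", "1"), "+", ("NUMBER", "2")])

def to_xml(node):
    """Serialize a parse-tree node to an Element."""
    name, body = node
    elem = ET.Element(name)
    if isinstance(body, str):
        # Lexer-instantiated terminal: a #PCDATA-style element.
        elem.text = body
        return elem
    last = None
    for child in body:
        if isinstance(child, str):
            # String constant: plain text between child elements.
            if last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            last = to_xml(child)
            elem.append(last)
    return elem

root = to_xml(parse_tree)
serialized = ET.tostring(root, encoding="unicode")
print(serialized)  # <sum><NUMBER>1</NUMBER>+<NUMBER>2</NUMBER></sum>

# Stripping the markup gives back the original input.
recovered = "".join(root.itertext())
print(recovered)  # 1+2
```

The second print is the point of the analogy: the instance is the original text plus a record of how it was recognized.
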
> The basic outcome is a complete partitioning of the data into a hierarchy
> of semantically meaningful categories.  Turning this around, an SGML/XML
> instance basically represents a *complete parsing* of its text content.
> That is, while the problem in parsing theory is to recognize input, the
> primary intent of generalized markup is to express the result of a prior
> process of recognition in the same formalism of parse trees.  Pushing the
> analogy further, where attributes make their appearance in the semantic
> processing of parse trees, markup-attributes are very similar to inherited
> (as opposed to synthesized) parse-attributes.  

That makes good sense.
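
One familiar case that behaves like an inherited attribute is xml:lang, which applies to an element's content unless a descendant overrides it. This example is not from the thread; it is a minimal sketch, and the function name effective_lang and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = ET.fromstring(
    '<doc xml:lang="en"><p>Hello</p><p xml:lang="fr"><q>Bonjour</q></p></doc>'
)

def effective_lang(elem, inherited=None):
    """Yield (tag, language) pairs, passing the value down the tree."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_lang(child, lang)

langs = list(effective_lang(doc))
print(langs)  # [('doc', 'en'), ('p', 'en'), ('p', 'fr'), ('q', 'fr')]
```

The value flows from parent to child and annotates the content without becoming part of it, which is the sense of "inherited" as opposed to "synthesized".
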
> The basic lesson: Do not use attributes to *analyse* wholes into parts. 

Yes!  A very nice way to describe the distinction.

Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!

Copyright 2001 XML.org. This site is hosted by OASIS