OASIS Mailing List Archives

 



RE: Attributes v Elements



5/17/01 12:12:30 PM, Adam Van Den Hoven <Adam.Hoven@bluezone.net> wrote:
>Let's say I have a schema that represents a semantically marked-up document.
>That is, we have something like:
>
><p>I like to listen to the <org id="CBC">Canadian Broadcast
>Corporation</org>. But then again, I live in <loc id="can">Canada</loc>.
></p>
>
>
>Now this is a very meaningful paragraph. There is nothing in there that
>isn't information (you could argue that the semantic tags I use are bad but
>I don't design a lot of semantic schemas... yet). However, if we were to do
>the markup as you suggest, I would need:
>
><p><span>I like to listen to the </span><org id="CBC">Canadian Broadcast
>Corporation</org><span>. But then again, I live in </span><loc
>id="can">Canada</loc><span>.</span></p>
>
>All of a sudden, the signal to noise ratio has gone up by an order of
>magnitude. The idea of something that is a span has no meaning in a
>document. It further suggests that the content of the span is something
>different from the content of the semantic tags. This is patently not true.

Pet peeve of mine: you mean the signal-to-noise ratio has gone *down*; there's now more noise 
relative to the signal.  *High* signal-to-noise ratio is usually (except in cryptography) a Good 
Thing.

>The statement that there should be no mixed content elements is faulty, when
>you are referring to something that represents (for lack of a better word)
>discourse. If it's something I say, or I read, then mixed content models are
>very appropriate.

I use the term "narrative."  The people who say that there should be no mixed content are usually 
thinking in terms of object serialization; by definition the fields of an object are unordered, 
whereas mixed content is intimately tied to notions of order.  But "I like to listen to the" can't 
be stuffed into a slot; it's not a separate item.  Narrative text may contain substructure, but its 
meaning is specifically tied to an in-order traversal of the resulting tree.  It has an inherent 
linear (in the "humanities" rather than "sciences" sense) order.  For some reason, that seems to 
scare or offend some people.
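
To make the in-order point concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The paragraph is adapted from the quoted example (with its typo fixed), and the helper names narrative_text and tagged_entities are made up for illustration. It only shows that the narrative comes back by walking the tree in document order, while the tagged items can still be picked out on their own.

```python
import xml.etree.ElementTree as ET

# Paragraph adapted from the quoted message; mixed content with two inline tags.
PARAGRAPH = (
    '<p>I like to listen to the <org id="CBC">Canadian Broadcast '
    'Corporation</org>. But then again, I live in '
    '<loc id="can">Canada</loc>.</p>'
)


def narrative_text(xml_source):
    """Rebuild the narrative by visiting every text node in document order."""
    root = ET.fromstring(xml_source)
    # itertext() yields the element's text, each child's text, and each
    # child's trailing text, in the order they appear in the document.
    return "".join(root.itertext())


def tagged_entities(xml_source):
    """List the inline elements as (tag, id, text) without the loose text."""
    root = ET.fromstring(xml_source)
    return [(child.tag, child.get("id"), child.text) for child in root]


if __name__ == "__main__":
    print(narrative_text(PARAGRAPH))
    print(tagged_entities(PARAGRAPH))
```

The text between the tags never lives in a named slot; it exists only as the pieces that fall before, between, and after the child elements, which is why the traversal order carries the meaning.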

Two caveats.  1) You sometimes really do have to use something similar to the "span" trick if you 
need to represent parallel but overlapping hierarchies in narrative text (e.g. if you're marking up 
a historical document and you need to preserve details of its physical, as well as logical, 
structure); Rick Jelliffe has written a good deal about this.  2) In truly data-oriented, 
object-serialization-type applications, mixed content really is best avoided, unless you have a 
field that includes an entire piece of narrative text (such as a "comments" field in a customer 
record), in which case you don't treat the sub-elements of the text as fields.  What you don't want 
is something like:

<price>
  <currency>USD</currency>
  12.50
</price>

where one sub-field is "enclosed" and the other is "loose" for no apparent reason.
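
A rough sketch of why this hurts in practice, again in Python with xml.etree.ElementTree. The all-element variant and its <amount> element name are my own assumption for comparison, not something from the thread, and read_fields and read_loose_amount are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# The mixed-content record from the message above.
MIXED = "<price>\n  <currency>USD</currency>\n  12.50\n</price>"

# Hypothetical alternative: every field enclosed (the <amount> name is assumed).
FIELDS = "<price><currency>USD</currency><amount>12.50</amount></price>"


def read_fields(xml_source):
    """Treat each child element as a named field, as a serializer would."""
    root = ET.fromstring(xml_source)
    return {child.tag: child.text for child in root}


def read_loose_amount(xml_source):
    """Recover the loose amount, which is only reachable by its position."""
    root = ET.fromstring(xml_source)
    currency = root.find("currency")
    # The amount is the text trailing the <currency> element, not a field.
    return (currency.tail or "").strip()


if __name__ == "__main__":
    print(read_fields(MIXED))        # the amount is missing from the fields
    print(read_fields(FIELDS))       # both fields are present
    print(read_loose_amount(MIXED))  # found only by knowing where to look
```

A field-by-field reader silently loses the loose value; getting it back requires code that knows it sits after <currency>, which is exactly the dependence on order that object serialization is supposed to avoid.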