On Tue, 21 Dec 2004 14:02:55 -0500, Roger L. Costello
<costello@mitre.org> wrote:
>
> Does this indicate that Approach #2 is superior? Is entropy a way to
> measure the value (quality) of Schemas? /Roger
I haven't seen anybody assert that entropy is a measure of quality.
It's one of those cases where the conventional meaning of a word
("disorder" in this case) carries an implicit value judgment that the
technical meaning does not.
In Kurt's post, and I think in Shannon's theory, entropy measures the
number of discrete states that a message could be in. When talking
about schemas, that's the number of distinct instance *structures*
(forget the text content) that are valid according to the schema.
Some, such as a schema describing a row in a relational DB table, have
only one, hence the entropy (the base 2 logarithm of 1) is zero.
Something like DocBook, XHTML, or WordML would have an enormous number
of valid instance structures (Kurt proposes a rule for counting only
the first recursive reference to a structure, so the number is not
infinite).
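
To make the counting concrete, here is a rough Python sketch (toy,
made-up content models, not any real schema API) of what that
calculation might look like, applying the expand-recursion-only-once
rule:

import math

# Hypothetical toy content models: each element maps to a list of
# alternative child sequences (a choice of sequences).
row_schema = {                  # relational-row style: one structure
    "row": [["id", "name", "price"]],
    "id": [[]], "name": [[]], "price": [[]],
}
doc_schema = {                  # document-ish: choice plus recursion
    "doc": [["title", "section"], ["title", "section", "section"]],
    "section": [["para"], ["para", "section"]],
    "title": [[]], "para": [[]],
}

def count_structures(schema, element, seen=()):
    # Count distinct valid structures rooted at element; a recursive
    # reference is expanded only the first time it is seen.
    if element in seen:
        return 1
    total = 0
    for sequence in schema[element]:
        combos = 1
        for child in sequence:
            combos *= count_structures(schema, child, seen + (element,))
        total += combos
    return total

print(math.log2(count_structures(row_schema, "row")))  # 0.0
print(math.log2(count_structures(doc_schema, "doc")))  # ~2.58

The row schema comes out at exactly zero and the document-ish one
well above it, which matches the intuition above.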
The point I took away from the post is that XSLT is (allegedly)
optimal for high entropy scenarios because of its recursive templating
mechanism, but it is overkill (for most of us ordinary mortals anyway)
for low-entropy scenarios. OK, I guess high entropy would be "low
quality" to someone with only SAX to work with ... or low entropy
would be "low value" to an XSLT geek for hire, :-)
So, my question is whether others agree with this. If so, it would be
a cleaner way of pointing people in the direction of one XML tool or
another. "The measured entropy of your schema is on the order of 20,
so you probably want to use XSLT" sounds *so* much more credible to
the Pointy Haired Bosses of the world than "Well now, that there
looks like a real document-ish type of schema in my humble opinion, so
if I was you, I'd use XSLT."