
   RE: [xml-dev] XML and entropy, again


Shannon uses the term "entropy" as a measure of the amount of information.
The amount of information grows with the number of choices that the sender
has or, equivalently, with the amount of uncertainty that the receiver has
(for equally likely messages it is the logarithm of the number of choices).
Many choices for the sender and high uncertainty for the receiver mean high
entropy (information). So, which of these approaches has the greater
entropy:

Approach #1

<Object>
    <Name>Roger L. Costello</Name>
    <HairColor>Red</HairColor>
    <SSN>123-45-6789</SSN>
    <Height>176 cm</Height>
    <Weight>74 kg</Weight>
</Object>

Approach #2

<Object>
    <hasA property="Name">Roger L. Costello</hasA>
    <hasA property="HairColor">Red</hasA>
    <hasA property="SSN">123-45-6789</hasA>
    <hasA property="Height">176 cm</hasA>
    <hasA property="Weight">74 kg</hasA>
</Object>
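
As a quick check that the two instance documents above carry the same data,
here is a minimal Python sketch using the standard library's
xml.etree.ElementTree. The function and variable names are made up for
illustration; only the two XML documents come from the post.

```python
import xml.etree.ElementTree as ET

APPROACH_1 = """<Object>
    <Name>Roger L. Costello</Name>
    <HairColor>Red</HairColor>
    <SSN>123-45-6789</SSN>
    <Height>176 cm</Height>
    <Weight>74 kg</Weight>
</Object>"""

APPROACH_2 = """<Object>
    <hasA property="Name">Roger L. Costello</hasA>
    <hasA property="HairColor">Red</hasA>
    <hasA property="SSN">123-45-6789</hasA>
    <hasA property="Height">176 cm</hasA>
    <hasA property="Weight">74 kg</hasA>
</Object>"""


def properties_from_elements(xml_text):
    """Approach #1: each child element's name is the property name."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def properties_from_hasa(xml_text):
    """Approach #2: the property name lives in the 'property' attribute."""
    root = ET.fromstring(xml_text)
    return {child.get("property"): child.text for child in root.findall("hasA")}


props_1 = properties_from_elements(APPROACH_1)
props_2 = properties_from_hasa(APPROACH_2)

# Both encodings yield the same name/value pairs for this particular message.
print(props_1 == props_2)
```

So for this one message the payload is identical; any difference in entropy
has to come from what the schemas allow, not from these two instances.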

More accurately, which of these XML Schemas has the greater entropy:

Approach #1

<element name="Object">
    <complexType>
        <!-- child elements must sit inside a model group such as sequence -->
        <sequence>
            <element name="Name" type="string"/>
            <element name="HairColor" type="string"/>
            <element name="SSN" type="string"/>
            <element name="Height" type="string"/>
            <element name="Weight" type="string"/>
        </sequence>
    </complexType>
</element>

Approach #2

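<element name="Object">
    <complexType>
        <sequence>
            <!-- any number of hasA children, each naming its own property -->
            <element name="hasA" maxOccurs="unbounded">
                <complexType>
                    <simpleContent>
                        <extension base="string">
                            <attribute name="property" type="string" use="required"/>
                        </extension>
                    </simpleContent>
                </complexType>
            </element>
        </sequence>
    </complexType>
</element>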

With Approach #1 there is virtually no choice for the sender - Object must
contain these properties: Name, HairColor, SSN, Height, and Weight.
Likewise, there is virtually no uncertainty on the part of the receiver
about which properties will arrive.  Thus, the entropy (information) is low.
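
To put a number on "low", here is a minimal Python sketch of Shannon's
formula H = sum(p * log2(1/p)). It counts only the structural choice (which
properties appear), not the string values, and the 8-layout case is a
hypothetical contrast of mine, not something the schema defines.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution; zero-probability outcomes add nothing."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


# Approach #1: the schema permits exactly one set of properties,
# so that single layout has probability 1.
fixed_structure = [1.0]

# Hypothetical contrast: a sender free to pick one of 8 equally likely layouts.
eight_layouts = [1 / 8] * 8

print(shannon_entropy(fixed_structure))
print(shannon_entropy(eight_layouts))
```

Under these assumptions the fixed structure carries 0 bits of structural
information, while the 8-way choice carries 3 bits.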

With Approach #2 there is no limit to the variety of properties that Object
can contain.  The sender has an unlimited choice of messages and the
receiver has a high uncertainty.  Thus, the entropy (information) is high.
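
A rough way to see "unlimited" is to assume, purely for illustration, that
the sender draws property names from a vocabulary of n names and that every
subset of that vocabulary is equally likely. The function name and the
uniform assumption are mine, not part of the schema. A minimal Python sketch:

```python
import math


def structural_entropy_bits(vocabulary_size):
    """Bits needed to say which subset of the vocabulary is present,
    assuming all 2**vocabulary_size subsets are equally likely."""
    subsets = 2 ** vocabulary_size
    return math.log2(subsets)


# Nothing in Approach #2 caps the vocabulary, so the figure keeps growing.
for n in (5, 10, 100):
    print(n, structural_entropy_bits(n))
```

Under that assumption the structural entropy is n bits, so it grows without
bound as the vocabulary of property names grows.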

Does this indicate that Approach #2 is superior?  Is entropy a way to
measure the value (quality) of Schemas? /Roger







 


Copyright 2001 XML.org. This site is hosted by OASIS