RE: ANN: a portable data component -- length
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Sat, 9 Apr 2011 18:40:42 -0400
Hi Folks,
This has been an excellent discussion. I have learned a lot. Below is my summary of the discussion.
SUMMARY
Several people expressed doubt about the usefulness of the "portable length data component." They recommended finding other, more useful "portable data components."
Let's examine their reasons. Recall the example that was provided:
<altitude>
<feet>12000</feet>
<meters>3657.6</meters>
</altitude>
Notice that the same value is expressed in different units.
ARGUMENTS AGAINST THIS DESIGN
1. Store only one value and convert as needed. That way the values can't get out of sync. This follows the normal forms of database theory, under which the same information should not be stored twice.
2. During a workflow, if the values accidentally get out of sync, how would one determine which is the correct value?
3. All measurements have uncertainty. Suppose the altitude measurement has an uncertainty of +/- 50 feet. Then the value of <meters> shouldn't have to be precisely 3657.6.
4. Suppose that the measured distance is 1/3 mile. What do you put down so that the equations come out exactly right?
5. It is important to have some indication as to which value was an actual measurement (there might be more than one) vs. which values were computed from the measurement(s), so one could distinguish between potential measurement errors and potential computational issues (roundoffs, etc.).
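Arguments 1, 3, and 5 can be illustrated together with a minimal Python sketch. It assumes feet is the measured unit and that everything else is derived from it on demand. The class name Altitude and its field names are hypothetical and were not part of the discussion; the only hard fact is that 1 foot is defined as exactly 0.3048 meters.

```python
from dataclasses import dataclass

# 1 foot is defined as exactly 0.3048 meters.
METERS_PER_FOOT = 0.3048


@dataclass(frozen=True)
class Altitude:
    """Hypothetical record: one measured value plus its uncertainty."""

    measured_feet: float            # the actual measurement (argument 5)
    uncertainty_feet: float = 0.0   # +/- uncertainty of that measurement (argument 3)

    @property
    def meters(self) -> float:
        # Derived on demand, so it cannot drift out of sync (argument 1).
        return self.measured_feet * METERS_PER_FOOT

    @property
    def uncertainty_meters(self) -> float:
        # The uncertainty scales by the same conversion factor.
        return self.uncertainty_feet * METERS_PER_FOOT


alt = Altitude(measured_feet=12000, uncertainty_feet=50)
print(round(alt.meters, 1))              # 3657.6
print(round(alt.uncertainty_meters, 2))  # 15.24
```

With only one stored value there is nothing to reconcile, and the record itself states which unit was measured.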
ARGUMENTS IN FAVOR OF THIS DESIGN
1. There are many real-world examples where the same value is expressed in different units:
- In the US the distance to the next town is shown in both miles and kilometers
- US cars show the speed in both miles-per-hour and kilometers-per-hour
- Shops in the UK show the price of loose fruit and vegetables both per imperial weight and per metric weight
- Shoes and clothes sold in the UK show both UK and EU sizes
It is easier for the author of the information to compute and show it in different units than it is for typical shoppers/consumers to do the calculation.
2. The redundancy allows for cross-checking. Suppose a sign says that the distance to the next town is 10 miles and 300 kilometers:
<distance-to-next-town>
<miles>10</miles>
<kilometers>300</kilometers>
</distance-to-next-town>
We can validate the data and discover that the two values are not consistent. Suppose the actual distance is 10 miles. If there were just one sign, saying "Distance: 300 kilometers", then we would have no way of realizing that something is wrong.
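Such a cross-check might be sketched in Python as follows. The function name is_consistent and the 1% relative tolerance are assumptions chosen for illustration; the tolerance leaves room for the rounding that a sign would normally apply. The conversion factor is the exact definition of the international mile.

```python
import xml.etree.ElementTree as ET

# 1 international mile is defined as exactly 1.609344 kilometers.
KM_PER_MILE = 1.609344


def is_consistent(xml_text: str, rel_tol: float = 0.01) -> bool:
    """Return True if <miles> and <kilometers> agree within rel_tol.

    rel_tol is an assumed tolerance (1% by default), not a value
    taken from the discussion.
    """
    root = ET.fromstring(xml_text)
    miles = float(root.findtext("miles"))
    kilometers = float(root.findtext("kilometers"))
    expected_km = miles * KM_PER_MILE
    return abs(kilometers - expected_km) <= rel_tol * expected_km


BAD_SIGN = """<distance-to-next-town>
<miles>10</miles>
<kilometers>300</kilometers>
</distance-to-next-town>"""

GOOD_SIGN = """<distance-to-next-town>
<miles>10</miles>
<kilometers>16.1</kilometers>
</distance-to-next-town>"""

print(is_consistent(BAD_SIGN))   # False
print(is_consistent(GOOD_SIGN))  # True
```

Note that the check only reports that the two values disagree. It cannot say which one is wrong, which is the concern raised in argument 2 against the design.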
GUIDELINES FOR WHEN TO EXPRESS THE SAME VALUE IN MULTIPLE DIFFERENT UNITS
Express a value in multiple different units when the cost of deriving/computing the value in different units by the recipients is greater than the cost of computing/including the value by the sender.
Example: Consider the issue of prices and currency conversion:
<price>
<USD>34.52</USD>
</price>
If the recipients of the XML want to convert the price into pounds sterling, there is no straightforward fixed conversion rate to apply. The rate will depend upon multiple factors, such as the date on which the transaction took place and any conversion charges that apply.
<price>
<USD>34.52</USD>
<UKP>18.32</UKP>
</price>
If this is recording a transaction and the conversion rate is not discoverable by the recipients, then including the value in both currencies in the instance is imperative.
The transaction costs involved in discovering the equivalences are sometimes higher than the cost of carrying the data in the first place.
When deciding whether to replicate information, the determinant is this:
Provide the data in multiple different units when
the cost of replication is less than the cost of
computation when the data is re-referenced.
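Read as a rule of thumb, the determinant could be sketched like this in Python. The function name, the parameter names, and the idea of multiplying the recipient's cost by an expected number of re-references are all assumptions made for illustration; the costs are in whatever common unit one chooses (time, money, effort).

```python
def should_replicate(replication_cost: float,
                     recompute_cost: float,
                     expected_rereferences: int = 1) -> bool:
    """Hypothetical decision rule: replicate when the sender's one-time
    cost is lower than the recipients' total cost of recomputing.

    Multiplying by expected_rereferences is an assumed reading of
    "when the data is re-referenced".
    """
    return replication_cost < recompute_cost * expected_rereferences


# Unit conversion: trivial for any recipient, so carrying both adds little.
print(should_replicate(replication_cost=1.0, recompute_cost=0.1))  # False

# Currency conversion for a past transaction: costly to rediscover.
print(should_replicate(replication_cost=1.0, recompute_cost=50.0,
                       expected_rereferences=3))                   # True
```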
/Roger