xml-dev - RE: Schema vs Schema-free (was: RE: [xml-dev] XML Binary Characterization


To: <bob@wyman.us>
Subject: RE: Schema vs Schema-free (was: RE: [xml-dev] XML Binary Characterization WG public list availabl)
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Wed, 14 Apr 2004 08:46:48 -0400
Cc: "'xml-dev'" <xml-dev@lists.xml.org>
In-reply-to: <00d401c421a7$3bfba8a0$650aa8c0@BOBDEV>
References: <00d401c421a7$3bfba8a0$650aa8c0@BOBDEV>

At 6:32 PM -0400 4/13/04, Bob Wyman wrote:
>Elliotte Rusty Harold wrote:
>>  It does not depend on whether *you* are working
>>  with a schema. It depends on whether *everyone*
>>  is working with the *same* schema. The assumption
>>  that there is a single unique schema, which everyone
>>  agrees to and adheres to is a common fallacy in the XML world.
>
>	I see your point, in the abstract, however, I can't help
>thinking that we should be able to assume more uniformity than you
>suggest. For instance, if an XML document declares a namespace that is
>defined by a normative schema, I think that processors of the document
>should be able to assume that the rules defined by that schema
>actually apply to the document. Or, if someone creates a document that
>claims to conform to some specification that includes a normative
>schema, we should be able to assume that the rules of the schema apply
>in instances of the format.

Not necessarily. Practically, I routinely publish documents that 
reference the XHTML DTD and yet are not valid. So far, it hasn't 
bothered anybody. Occasionally I screw up well-formedness, and then 
the system notices and complains. Sometimes even a human notices and 
complains. Lack of validity doesn't seem to bother anyone though.

But let's look at a more common case. Are 001, +1, 1.0, and 1.0000 
the same? All having a schema that types these as xsd:float or 
xsd:decimal really says is that in one interpretation they are. That 
is, they have the same value space. But this is not the only possible 
value space. In other contexts the difference may be significant. For 
instance, some compilers might read 001 as an octal literal, +1 as a 
decimal int literal, and 1.0 and 1.0000 as doubles. In scientific 
publishing, 1.0 and 1.0000 are understood to be accurate to about half 
a unit in the last decimal place, so the second is much more precise 
than the first. This can be very 
important.
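
A minimal Python sketch of this point, not tied to any particular schema processor: the same four lexical forms are read as doubles, as text, as decimals that remember their written precision, and as C-style literals. The extra value "010" is an assumption added for illustration, since "001" happens to be 1 in both octal and decimal.

```python
from decimal import Decimal

# The four lexical forms discussed above.
lexical_forms = ["001", "+1", "1.0", "1.0000"]

# Interpretation 1: IEEE-754 doubles. All four collapse to one value.
as_doubles = {float(s) for s in lexical_forms}

# Interpretation 2: plain text. All four are distinct.
as_text = set(lexical_forms)

# Interpretation 3: decimals that remember how many digits were written.
# The values compare equal, but the exponents differ, so the stated
# precision is still recoverable.
as_decimals = [Decimal(s) for s in lexical_forms]
exponents = [d.as_tuple().exponent for d in as_decimals]

# Interpretation 4: C-style integer literals, where a leading zero means
# octal. "010" is a made-up value used only to make the difference visible.
octal_reading = int("010", 8)
decimal_reading = int("010", 10)

print(len(as_doubles), len(as_text), exponents, octal_reading, decimal_reading)
```

Which of these readings is "right" depends entirely on what the consumer intends to do with the value.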

Strict application of a schema, as you seem to be advocating, limits 
everyone to treating the data as the same type. However, we may have 
different needs and want to interpret it in different ways. I may use 
a Java BigDecimal where you use an IEEE-754 double, which could give 
us very different round-off issues, for instance. We can both conform 
to the same schema while still not accepting the same interpretation 
of the data.
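
As a rough illustration of the round-off difference, here is a small Python sketch in which `decimal.Decimal` stands in for Java's BigDecimal and `float` for an IEEE-754 double. The values 0.1, 0.2, and 0.3 are assumptions chosen for the example, not taken from any real document.

```python
from decimal import Decimal

# Three lexical values as they might appear in element content (made up).
a, b, expected = "0.1", "0.2", "0.3"

# Reader 1: binary doubles. None of these values is exactly
# representable, so the sum does not equal the double nearest to 0.3.
double_sum = float(a) + float(b)
double_matches = double_sum == float(expected)

# Reader 2: exact decimal arithmetic. The sum is exactly 0.3.
decimal_sum = Decimal(a) + Decimal(b)
decimal_matches = decimal_sum == Decimal(expected)

print(double_matches, decimal_matches)
```

Both readers accept the same schema-valid input, yet they disagree about whether the sum equals the third value.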

>	If we can't make these assumptions then it seems to me that
>XML document exchange would always require bi-lateral agreements
>between authors and consumers. The alternative would be unilateral
>decisions made by both producers and consumers that could result in
>widely variant interpretations of the data -- i.e. the value of any
>particular XML document would be akin to the tea leaves in a fortune
>teller's cup -- any relationship between reality and what is read
>would be anecdotal at best. This might work well with poetry, but I
>think that such interpretational freedom is probably inappropriate in
>many areas of XML usage.

How the interpretations vary depends on what different parties want 
to do with the data. If we're both doing pretty much the same thing, 
we'll have similar interpretations. If we're doing very different 
things, we may have very different interpretations. For instance, if 
I'm writing Amazon's accounting system, I might read the price of a 
book in an XML document they publish as a money type with exact 
decimal arithmetic. If I'm a publisher checking on the average price 
of my books at several bookstores, I may want to use a double instead 
for ease of arithmetic, and because I don't need to be precise to the 
penny. And if I'm an author republishing the Amazon information about 
my books in real time on my web site, I may want to treat it as a 
string that's just copied from one place to another. The 
interpretation of the data is always a function of the local 
environment and understanding. Schemas can be descriptive of one 
local understanding, but they cannot force everyone to agree to the 
same interpretation of the same data. The simple fact is that different 
people and organizations do have different needs that require 
different understandings of the same data.
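
The three consumers described above can be sketched in Python as follows. The element names, the prices, and the function names are all hypothetical, invented for this example; no real bookstore format is implied.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal
from statistics import mean

# Hypothetical documents from three bookstores.
STORES = [
    "<book><title>Example</title><price>19.99</price></book>",
    "<book><title>Example</title><price>24.50</price></book>",
    "<book><title>Example</title><price>21.00</price></book>",
]

def price_text(doc):
    """Return the raw character data of the price element."""
    return ET.fromstring(doc).findtext("price")

def accounting_total(doc, quantity):
    """Accounting system: exact decimal arithmetic, correct to the penny."""
    return Decimal(price_text(doc)) * quantity

def average_price(docs):
    """Publisher: doubles are convenient, and small errors do not matter."""
    return mean([float(price_text(d)) for d in docs])

def republish(doc):
    """Author's web site: the price is just text copied along."""
    return "Price: " + price_text(doc)

print(accounting_total(STORES[0], 3))
print(round(average_price(STORES), 2))
print(republish(STORES[0]))
```

All three functions read the same element, and none of them needs the others to agree on its type.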
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA

References:
- RE: Schema vs Schema-free (was: RE: [xml-dev] XML Binary Characterization WG public list availabl)
  - From: "Bob Wyman" <bob@wyman.us>
