xml-dev - Schema vs Schema-free (was: RE: [xml-dev] XML Binary Characterization WG


To: "'Elliotte Rusty Harold'" <elharo@metalab.unc.edu>, "'Stephen D. Williams'" <sdw@lig.net>
Subject: Schema vs Schema-free (was: RE: [xml-dev] XML Binary Characterization WG public list availabl)
From: "Bob Wyman" <bob@wyman.us>
Date: Tue, 13 Apr 2004 14:07:25 -0400
Cc: "'xml-dev'" <xml-dev@lists.xml.org>
Importance: Normal
In-reply-to: <p0601020dbca09bbf58b3@[192.168.254.88]>
Reply-to: <bob@wyman.us>

Elliotte Rusty Harold wrote:
> this does rule out a number of the proposals that have 
> been made, particularly those that try to represent 
> numbers and similar things as binary rather than text,
> and thus lose the distinction between 1, 0001, and +1.
	Certainly, it is sometimes important to ensure that leading
zeros, gratuitous "+" signs, etc. are passed through systems
untouched. However, how much this matters depends on whether or not
you are working with a schema.
	In a schema-based system, if a field is defined as having
"integer" type and if the definition of integer states that "1",
"00001", "+001", and "+1" are equivalent, then no one should feel
compelled to preserve any syntactic sugar unless there is some other
normative information that requires it to be preserved. Basically, the
author of the schema, in defining the schema and using type
definitions that establish an equivalence between representations, has
implicitly "authorized" the substitutions.
	On the other hand, in a schema-free system, there is no way
for a processor to "know" that something "is" a number. In the absence
of a schema, or some other normative document that explicitly defines
equivalent representations of a value, a processor of the data should
faithfully pass the data, byte-for-byte.
	Thus, in a schema-free system, "1", "00001", "+001", and "+1"
must be considered as distinct, non-equivalent, and non-substitutable
values. On the other hand, in a schema-based system, there is no
problem with substituting one form for another as long as the schema
permits it.
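
	As a rough illustration of the two rules above, here is a minimal
Python sketch. The function name `equivalent` and the `schema_type`
argument are hypothetical, made up for this example; the only
assumption is that the schema's integer type treats these spellings as
the same value.

```python
from typing import Optional

def equivalent(a: str, b: str, schema_type: Optional[str] = None) -> bool:
    """Compare two lexical forms, optionally under a declared schema type."""
    if schema_type == "integer":
        # The schema declares an integer, so compare values, not spellings.
        return int(a) == int(b)
    # No schema: nothing says these are numbers, so compare the exact text.
    return a == b

forms = ["1", "00001", "+001", "+1"]

# Schema-free: every spelling is a distinct value.
schema_free = [equivalent(forms[0], f) for f in forms[1:]]

# Schema-based (integer): every spelling is the same value.
schema_based = [equivalent(forms[0], f, "integer") for f in forms[1:]]
```

Without a declared type the comparison falls back to exact text, which
is the only safe choice when the processor cannot know what the data
means.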
	Those who try to apply the rules of schema-free systems to
schema-based systems are doing something very, very dangerous. Because
"1" and "001" mean the same thing in a schema-based system, those who
argue that the precise original forms of values should be preserved are
probably under the false impression that there are some semantics
associated with those forms. Yet, unless the schema defines such
semantics, there are none, and any assumptions concerning such
unspecified semantics are likely to be proved wrong -- potentially
with disastrous impacts.
	These disasters often happen when people get sloppy in their
interpretation of schemas. For instance, a company might have a
five-digit "employee" code that should have leading zeros. Some sloppy
coder might decide to map this to an integer... Of course, it is
inevitable that some downstream code will look at an integer like
"00001" and convert it to "1" (i.e., treat it the way the schema says
it can be treated). The results could be unpleasant. Of course, what
the coder *should have done* was recognize that the employee code is
*NOT* an integer; rather, it is a five-digit value that is composed
only of numeric characters and should be padded with leading "0"
characters. The field should be defined as such and appropriate types
provided in the schema.
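
	To make the employee code example concrete, here is a small
Python sketch of the two mappings. The names `as_integer_field`,
`as_code_field` and `EMPLOYEE_CODE` are hypothetical, and the regular
expression is only one assumed way to express "five numeric
characters"; a real schema would declare the equivalent restriction in
its own type language.

```python
import re

# Hypothetical constraint: exactly five ASCII digits, leading zeros allowed.
EMPLOYEE_CODE = re.compile(r"[0-9]{5}")

def as_integer_field(raw: str) -> str:
    """Round-trip through int, as a processor may do when the type is integer."""
    return str(int(raw))

def as_code_field(raw: str) -> str:
    """Check the five-digit pattern and return the text unchanged."""
    if not EMPLOYEE_CODE.fullmatch(raw):
        raise ValueError(f"not a five-digit employee code: {raw!r}")
    return raw

damaged = as_integer_field("00001")   # leading zeros are gone
preserved = as_code_field("00001")    # text passes through untouched
```

The second mapping also rejects "1" outright, so a value that has
already lost its padding is caught at the boundary rather than silently
accepted.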
	The distinction between these two environments is recognized
in the distinction between the ASN.1 based "Fast Schema" and "Fast
Infoset" systems. Fast Schema is only to be used when a schema is
present and, depending on the schema, will convert things that are
declared to be integers to binary forms with no guarantee of
preserving distinctions between things like "1", "001" and "+1". On
the other hand, "Fast Infoset" is for use in schema-free processing.
Thus, Fast Infoset (X.finfo) would preserve the exact forms of "1",
"001", and "+1". This is how it should be.
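
	The difference can be sketched in a few lines of Python. This
is NOT the actual Fast Schema or Fast Infoset encoding; the fixed
4-byte integer layout and all function names below are assumptions
chosen only to show why a typed binary form cannot carry the original
spelling while a character-based form can.

```python
import struct

def encode_as_binary_integer(text: str) -> bytes:
    # Assumed layout: 4-byte big-endian signed integer (illustrative only).
    return struct.pack(">i", int(text))

def decode_binary_integer(data: bytes) -> str:
    # Only the numeric value survives; the decoder must pick one spelling.
    return str(struct.unpack(">i", data)[0])

def encode_as_text(text: str) -> bytes:
    # Schema-free style: carry the characters themselves.
    return text.encode("utf-8")

def decode_text(data: bytes) -> str:
    return data.decode("utf-8")

forms = ["1", "001", "+1"]
via_binary = [decode_binary_integer(encode_as_binary_integer(f)) for f in forms]
via_text = [decode_text(encode_as_text(f)) for f in forms]
```

All three spellings collapse to the same four bytes in the binary form,
whereas the text form round-trips each one exactly.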
	Rules which apply to schema-based systems do not necessarily
apply to schema-free systems. Sometimes, but not always, "1", "001",
and "+1" are really the same value.

		bob wyman

Follow-Ups:
- Re: Schema vs Schema-free (was: RE: [xml-dev] XML BinaryCharacterization WG public list availabl)
  - From: Elliotte Rusty Harold <elharo@metalab.unc.edu>

References:
- Re: [xml-dev] XML Binary Characterization WG public list availabl e
  - From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
