At 22:52 -0500 2005-03-04, David Lyon wrote:
>So I've been using the following representation to denote field types:
>
> & = String values
> # = numeric values (ie integers, numbers)
> $/£/¥/¤ = currency values
> ? = boolean values
> @ = date values
>
>The markup roughly becomes:
>
> <tag>
> element[field_type]=[data_value]
> </tag>
Your method of assigning a separate character for
each "type" has been used in some programming
languages, notably some BASIC variants, and as a
variable-naming convention in general. It's not
at all a crazy idea, but it runs into some known
problems.
First, it's not extensible. Even in your example,
where you assign a different character for each
of several currencies, it seems clear that you
can't do business with very many countries before
the system becomes unmanageable. It may not be as
obvious, but most countries that use the Latin
alphabet also use the same character code for
their currency symbol, even though the symbol
printed differs (try finding a pound-sterling
sign on an American keyboard). Types are not a
handy closed set. Each of the types you provide
is very broad, and even in simple business data
finer distinctions are valuable, e.g. for
validation.
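To illustrate the alternative, here is a minimal sketch in Python of types identified by name rather than by a punctuation character. Everything in it (the TYPES registry, register, check, and the "us-zip" and "part-number" types) is hypothetical and invented for this example; it is not part of your format or of any existing tool.

```python
import re

# Hypothetical registry: type names are ordinary strings, so the set is open-ended.
TYPES = {}

def register(name, pattern):
    """Add a named type, defined here simply by a regular expression."""
    TYPES[name] = re.compile(pattern)

def check(type_name, value):
    """Return True if the whole value matches the named type's pattern."""
    return TYPES[type_name].fullmatch(value) is not None

register("integer", r"-?\d+")
register("string", r".*")
# Finer distinctions need no new punctuation characters, only new names.
register("us-zip", r"\d{5}(-\d{4})?")
register("part-number", r"[A-Z]{2}-\d{4}")
```

With single characters you run out of punctuation quickly; with names, adding a narrower type is one more register call.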
Your 'Formula&="F=M*A"' example shows this --
even in a simple business example, it would be
very valuable to be able to consider formulas as
a different type than strings. Imagine if Excel
couldn't tell the difference, so that all your
formulas were displayed rather than evaluated. Come
to think of it, Excel probably does have that
problem for any string that begins with "=" --
though at least, not too many people's names
begin with '='....
Second, it's inefficient. A given attribute (or
field, if you prefer) is always the same type,
except in very unusual situations (situations not
supported by much software). So why encode the
type on every instance of the field? Better to
just encode the type for a given field once in a
schema (as is done in XML, SGML, and RDBs) and
avoid redundancy and saying things twice.
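As a rough sketch of this second point in Python: the type of each field is declared once, and the records themselves carry no type characters. The field names, the SCHEMA table, and the coerce function are assumptions made up for illustration, not anything from your format or from a real schema language.

```python
from datetime import date
from decimal import Decimal

# Hypothetical schema: each field's type is declared once, not on every instance.
SCHEMA = {
    "quantity": int,
    "price": Decimal,
    "shipped": date.fromisoformat,
    "description": str,
}

def coerce(record):
    """Convert raw string values to typed values using the shared schema."""
    return {name: SCHEMA[name](value) for name, value in record.items()}

# The data itself carries no type markers at all.
orders = [
    {"quantity": "3", "price": "19.95", "shipped": "2005-03-04", "description": "widget"},
    {"quantity": "12", "price": "4.50", "shipped": "2005-03-05", "description": "gasket"},
]
typed = [coerce(order) for order in orders]
```

However many records there are, the type information is stated exactly four times here, once per field.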
Third, it entangles parsing with later
processing. Why should a parser have to know
anything about types at all, such as which
characters are allowed after the equals sign? The
parser stays simpler if you leave type-checking
to the validator, while the validator can
specialize and be more thorough (like XML Schema,
RELAX NG, and others).
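The separation can be sketched in Python as two independent passes. This is only an illustration under assumed names: the simplified name=value line syntax, parse, validate, and the RULES table are all hypothetical, and a real validator would do far more than match regular expressions.

```python
import re

# Parser: knows only the "name=value" syntax, nothing about types.
LINE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")

def parse(text):
    """Split each non-blank line into a field name and an uninterpreted string."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        m = LINE.match(line)
        if m is None:
            raise SyntaxError(f"not a name=value line: {line!r}")
        fields[m.group(1)] = m.group(2)
    return fields

# Validator: a separate pass that knows which values each field allows.
RULES = {
    "quantity": re.compile(r"^\d+$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def validate(fields):
    """Return a list of error messages; fields without a rule are not checked."""
    errors = []
    for name, value in fields.items():
        rule = RULES.get(name)
        if rule is not None and not rule.match(value):
            errors.append(f"{name}: bad value {value!r}")
    return errors

good = parse("quantity=3\ndate=2005-03-04\ndescription=widget")
bad = parse("quantity=three\ndate=2005-03-04")
```

Both inputs parse without complaint, because the parser never looks at what follows the equals sign; only validate(bad) reports the problem with "three". Tightening a rule later means touching RULES and nothing in the parser.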
>The other important question or point you mention is readability. In my
>world, the people reading the markup are the business analysts, IT support
>staff or the business owners. They aren't highly trained and need something
>very simple.
I submit that these people (not to mention
parsers) would have an easier time reading the
syntax without all those type-characters: it's
strictly less stuff to learn. And they don't need
them, because anyone of the sort you describe is
going to know that a field called "quantity" is
numeric, "description" is a string, "date" is a
date, and so on. They don't need to be reminded
(or distracted) every single time they see it (or
write it). And if they do forget, it's the kind
of error that a validator *can* catch, so the
consequences of human error can generally be
avoided.
Also, the enormous number of people who know HTML
or other syntaxes fundamentally like XML would
have to re-train/adjust for your syntax. I have
no problem with that in principle -- but what is
the big advantage that makes it worth their time?
The biggest problem, I think, is proposing Yet
Another Almost XML Syntax. A new syntax requires
new implementations. The cost of those
implementations, and the inability to use
countless existing implementations, surely exceed
the advantages in this case. A syntax must have
very significant advantages in order to justify a
lot of new implementations. As far as I can see,
your syntax has no *functional* advantages (have
I missed something it can do that XML can't?). It
may have aesthetic advantages (though I myself
don't see them), but if that's all, I don't think
it will be enough to justify all the extra effort.
Steve DeRose
--
Luthien Consulting: Real solutions to hard information management problems
Specializing in XML, schema design, XSLT, and project design/review/repair
Steven J. DeRose, Ph.D., sderose@acm.org