xml-dev - Re: [xml-dev] [About Unicode] Why the symbol LOGICAL NOT is missing from

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] [About Unicode] Why the symbol LOGICAL NOT is missing from the UCS ?
From: "Steven J. DeRose" <sderose@acm.org>
Date: Fri, 4 Mar 2005 21:01:06 -0500
In-reply-to: <200503042252.27543.david.lyon@computergrid.net>
References: <032801c52058$628a0330$c970fea9@CMHNovannet><200503041246.09177.david.lyon@computergrid.net><200503041119.LAA07304@penguin.nag.co.uk><200503042252.27543.david.lyon@computergrid.net>

At 22:52 -0500 2005-03-04, David Lyon wrote:
>So I've been using the following representation to denote field types:
>
>  &         = String values
>  #          = numeric values (ie integers, numbers)
>  $/£/¥/¤ = currency values
>  ?          = boolean values
>  @         = date values
>
>The markup roughly becomes:
>
>  <tag>
>  element[field_type]=[data_value]
></tag>

Your method of assigning a separate character for 
each "type" has been used in some programming 
languages, notably some BASIC variants, and as a 
variable-naming convention in general. It's not 
at all a crazy idea, but it runs into some known 
problems.

First, it's not extensible. Even in your example, 
where you assign a different character for each 
of several currencies, it seems clear that you 
can't do business with very many countries before 
the system becomes unmanageable. It may not be as 
obvious, but most countries that use the Latin 
alphabet also use the same character code for 
their currency symbol, even though the printed 
symbol differs (try finding a pound-sterling 
sign on an American keyboard). Types are not a 
handy closed set. Each of the types you provide 
is very broad, and even in simple business data 
finer distinctions are valuable, e.g. for 
validation.
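
As a rough illustration of the extensibility point, here is a minimal 
Python sketch in which a currency value carries a currency code instead 
of a one-character sigil. The names Money and parse_money and the 
"amount CODE" value format are hypothetical, invented for this example; 
the three-letter codes are assumed to follow ISO 4217. Supporting 
another currency then needs no new syntax character.

```python
from decimal import Decimal
from typing import NamedTuple


class Money(NamedTuple):
    """A currency amount tagged with a code rather than a sigil (hypothetical type)."""
    amount: Decimal
    currency: str  # assumed to be an ISO 4217 code such as "USD" or "GBP"


def parse_money(text):
    """Parse a value written as '<amount> <code>', e.g. '19.99 GBP'."""
    amount, code = text.split()
    # Only the shape of the code is checked here; checking it against the
    # real ISO 4217 list would be a validator's job.
    if not (len(code) == 3 and code.isascii() and code.isalpha() and code.isupper()):
        raise ValueError(f"not a three-letter currency code: {code!r}")
    return Money(Decimal(amount), code)


price = parse_money("19.99 GBP")
fee = parse_money("5 CHF")  # a new currency needs no new type character
```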

Your 'Formula&="F=M*A"' example shows this -- 
even in a simple business example, it would be 
very valuable to be able to consider formulas as 
a different type than strings. Imagine if Excel 
couldn't tell the difference, and so all your 
formulas got displayed rather than resolved? Come 
to think of it, Excel probably does have that 
problem for any string that begins with "=" -- 
though at least, not too many people's names 
begin with '='....

Second, it's inefficient. A given attribute (or 
field, if you prefer) is always the same type, 
except in very unusual situations (situations not 
supported by much software). So why encode the 
type on every instance of the field? Better to 
just encode the type for a given field once in a 
schema (as is done in XML, SGML, and RDBs) and 
avoid redundancy and saying things twice.
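
A minimal Python sketch of that idea, assuming a hypothetical SCHEMA 
table and field names chosen only for illustration: the type of each 
field is stated once, and every record is then converted against that 
one declaration, so the instances themselves carry no type markers.

```python
from datetime import date
from decimal import Decimal

# Hypothetical schema: each field's type is declared exactly once,
# as a function that converts the raw string value.
SCHEMA = {
    "description": str,
    "quantity": int,
    "price": Decimal,
    "shipped": date.fromisoformat,
}


def apply_schema(record, schema=SCHEMA):
    """Convert the raw string values of one record using the shared schema."""
    return {name: schema[name](raw) for name, raw in record.items()}


# The records carry plain name/value pairs with no per-instance type marker.
records = [
    {"description": "widget", "quantity": "12", "price": "3.50", "shipped": "2005-03-04"},
    {"description": "gadget", "quantity": "7", "price": "19.99", "shipped": "2005-03-05"},
]
typed = [apply_schema(r) for r in records]
```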

Third, it entangles parsing with later 
processing. Why should a parser have to know 
anything about types at all, as in what 
characters are allowed after the equal sign?  The 
parser stays simpler if you leave type-checking 
to the validator, while the validator can 
specialize and be more thorough (like XML Schema, 
RELAX NG, and others).
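
The separation can be sketched in Python as follows. This is only an 
illustration under stated assumptions: the "name=value" line format, the 
functions parse and validate, and the CHECKS table are all hypothetical. 
The parser accepts any text after the equal sign, and a separate 
validation pass decides whether each value is acceptable for its field.

```python
import re


def parse(text):
    """Split 'name=value' lines into a dict; knows nothing about types."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in line: {line!r}")
        fields[name.strip()] = value.strip()
    return fields


# Hypothetical per-field checks, kept entirely outside the parser.
CHECKS = {
    "quantity": lambda v: re.fullmatch(r"-?[0-9]+", v) is not None,
    "date": lambda v: re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", v) is not None,
}


def validate(fields, checks=CHECKS):
    """Return the names of fields whose values fail their check."""
    return [name for name, ok in checks.items()
            if name in fields and not ok(fields[name])]


fields = parse("quantity=twelve\ndate=2005-03-04\ndescription=widget")
problems = validate(fields)
```

Here the parser happily accepts "twelve" as the value of quantity; it is 
the validator that reports the field as a problem, and its checks can be 
made stricter without touching the parser.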

>The other important question or point you mention is readability. In my
>world, the people reading the markup are the business analysts, IT support
>staff or the business owners. They aren't highly trained and need something
>very simple.

I submit that these people (not to mention 
parsers) would have an easier time reading the 
syntax without all those type-characters: it's 
strictly less stuff to learn. And they don't need 
them, because anyone of the sort you describe is 
going to know that a field called "quantity" is 
numeric, "description" is a string, "date" is a 
date, and so on. They don't need to be reminded 
(or distracted) every single time they see it (or 
write it). And if they do forget, it's the kind 
of error that a validator *can* catch, so the 
consequences of human error can generally be 
avoided.

Also, the enormous number of people who know HTML 
or other syntaxes fundamentally like XML would 
have to re-train/adjust for your syntax. I have 
no problem with that in principle -- but what is 
the big advantage that makes it worth their time?

The biggest problem, I think, is proposing Yet 
Another Almost XML Syntax. A new syntax requires 
new implementations. The cost of those 
implementations, and the inability to use 
countless existing implementations, surely exceed 
the advantages in this case. A syntax must have 
very significant advantages in order to justify a 
lot of new implementations. As far as I can see, 
your syntax has no *functional* advantages (have 
I missed something it can do that XML can't?). It 
may have aesthetic advantages (though I myself 
don't see them), but if that's all, I don't think 
it will be enough to justify all the extra effort.

Steve DeRose

--
Luthien Consulting: Real solutions to hard information management problems
    Specializing in XML, schema design, XSLT, and project design/review/repair
Steven J. DeRose, Ph.D., sderose@acm.org
