xml-dev - Re: [xml-dev] [About Unicode] Why the symbol LOGICAL NOT is missing from

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] [About Unicode] Why the symbol LOGICAL NOT is missing from the UCS ?
From: David Lyon <david.lyon@computergrid.net>
Date: Sat, 5 Mar 2005 04:02:59 +1100
In-reply-to: <a06020406be4e5e5cd906@[192.168.1.107]>
References: <032801c52058$628a0330$c970fea9@CMHNovannet> <200503042252.27543.david.lyon@computergrid.net> <a06020406be4e5e5cd906@[192.168.1.107]>
User-agent: KMail/1.7.1

Hi Steven,

> Your method of assigning a separate character for
> each "type" has been used in some programming
> languages, notably some BASIC variants, and as a
> variable-naming convention in general. It's not
> at all a crazy idea, but it runs into some known
> problems.

Everything we have is a compromise...

> First, it's not extensible. Even in your example,
> where you assign a different character for each
> of several currencies, it seems clear that you
> can't do business with very many countries before
> the system becomes unmanageable. 

I would dispute that. In the majority of economies,
businesses are mostly domestically focused; that is,
mainly oriented towards serving the local economy.

Of course, some countries export more than their
fair share, and that is good.

Take Japan or Malaysia as an example. Even though
they export a heck of a lot, the people on the ground are
very familiar with the Yen and the Ringgit respectively.

A little bit of trade is done in dollars and Euros
and pounds... but not the majority.

> It may not be as 
> obvious, but most countries that use the Latin
> alphabet, also use the same character code for
> their currency symbol, even though the symbol
> printed differs (try finding a pound-sterling
> sign on an American keyboard). 

Excellent... that makes an even more compelling
case for using the $ as a currency field specifier.

> Types are not a handy closed set. Each of the 
> types you provide is very broad, and even in simple 
> business data finer distinctions are valuable, e.g. for
> validation.

Certainly...

> Your 'Formula&="F=M*A"' example shows this  --
> even in a simple business example, it would be
> very valuable to be able to consider formulas as
> a different type than strings. Imagine if Excel
> couldn't tell the difference, and so all your
> formulas got displayed rather than resolved? Come
> to think of it, Excel probably does have that
> problem for any string that begins with "=" --
> though at least, not too many people's names
> begin with '='....

Yes. Well, Excel is a fine product. It does support
XML these days and has its own way of addressing
the problem you describe above, which seems to
work quite well.

I believe I could show how they could reduce the
size of the XML spreadsheet files with my encoding,
but I don't exactly need to chase more paid work
at the moment.

> Second, it's inefficient. A given attribute (or
> field, if you prefer) is always the same type,
> except in very unusual situations (situations not
> supported by much software). So why encode the
> type on every instance of the field? 

Efficiency is becoming such a vague concept in data
processing these days.

Processors have just gone 64-bit, now that 32-bit
CPUs have finally run out of address space (4 GB).

File sizes don't blow out with this markup compared
to XML 1.0; in fact, they are reduced.

> Better to just encode the type for a given field once in 
> a schema (as is done in XML, SGML, and RDBs) and
> avoid redundancy and saying things twice.

But don't you find it annoying when you are reading a
book and all the time you have to flip to the glossary or
appendix to find out what they are talking about?

That's the point here. Read without having to flip to
another file to get the specification of the field that
you are reading. It's much simpler and much more
natural. Consequently... it's much more efficient.

> Third, it entangles parsing with later
> processing. Why should a parser have to know
> anything about types at all, as in what
> characters are allowed after the equal sign?  

But it is very efficient for the parser to validate the
data after the equals sign once it knows the data-type
that it is expecting.
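
A minimal sketch of that idea in Python, under assumptions that go beyond
this message: only the three type markers used in this thread (`?` for
boolean, `$` for currency, `&` for string) are covered, and the names
`TYPE_PARSERS` and `parse_field` are hypothetical, not part of any existing
parser.

```python
import re
from decimal import Decimal, InvalidOperation

def _parse_bool(text):
    # Accept True/False in any letter case; reject everything else.
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return lowered == "true"

def _parse_money(text):
    # Currency values are kept as exact decimals rather than floats.
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a decimal amount: {text!r}")

def _parse_string(text):
    # Strings must be wrapped in double quotes, as in field3&="robin hood".
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError(f"string must be double-quoted: {text!r}")
    return text[1:-1]

# Hypothetical marker-to-converter table (an assumption for illustration).
TYPE_PARSERS = {"?": _parse_bool, "$": _parse_money, "&": _parse_string}

# name, one type marker, '=', then the raw value
FIELD_RE = re.compile(r'^(\w+)([?$&])=(.*)$', re.S)

def parse_field(token):
    """Split one 'name<marker>=value' token and validate the value by type."""
    match = FIELD_RE.match(token)
    if match is None:
        raise ValueError(f"no recognised type marker in {token!r}")
    name, marker, raw = match.groups()
    return name, TYPE_PARSERS[marker](raw)
```

Because the marker arrives before the value, the parser can pick the
converter first and reject a bad value (say, `field2$=abc`) at the point
where it is read, without consulting a separate schema.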

> The parser stays simpler if you leave type-checking
> to the validator, while the validator can
> specialize and be more thorough (like XML schema,
> RelaxNG, and others).

Yes, the parser stays simple, but the applications then
have to pick up all the slack.

Why not make the parsers more "intelligent" so that the
applications can have a more predictable data stream
to read from.

This is a very interesting area. If my fingers weren't so
tired, I would drag up many examples to show what I
mean.

>
> >The other important question or point you mention is readability. In my
> >world, the people reading the markup are the business analysts, IT support
> >staff or the business owners. They aren't highly trained and need
> > something very simple.
>
> I submit that these people (not to mention
> parsers) would have an easier time reading the
> syntax without all those type-characters: it's
> strictly less stuff to learn. 

But they "love" certification... :-)

> And they don't need
> them, because anyone of the sort you describe is
> going to know that a field called "quantity" is
> numeric, "description" is a string, "date" is a
> date, and so on. 

I don't think so. In business, when something goes wrong and the
pressure is on to fix something, the more information that you have
on the spot, the easier it is to fix.

> They don't need to be reminded 
> (or distracted) every single time they see it (or
> write it). And if they do forget, it's the kind
> of error that a validator *can* catch, so the
> consequences of human error can generally be
> avoided.

After a while, the type characters become so familiar that
they fade a little from view, and the data being represented
is more clearly in view than with traditional XML, where the
tags seem to get in the way of the underlying data quite
a bit.

> Also, the enormous number of people who know HTML
> or other syntaxes fundamentally like XML would
> have to re-train/adjust for your syntax. I have
> no problem with that in principle -- but what is
> the big advantage that makes it worth their time?

1) Systems that are a bit more reliable and less complex.

2) Less complication in the parsers (but more intelligence)


> The biggest problem, I think, is posing Yet
> Another Almost XML Syntax. A new syntax requires
> new implementations. The cost of those
> implementations, and the inability to use
> countless existing implementations, surely exceed
> the advantages in this case. 

Yes well... another cost for business... they might even
need to spend some extra money to upgrade... :-)

but at the same time... they might get new functionality.

> A syntax must have 
> very significant advantages in order to justify a
> lot of new implementations. As far as I can see,
> your syntax has no *functional* advantages (have
> I missed something it can do that XML can't?). It
> may have aesthetic advantages (though I myself
> don't see them), but if that's all, I don't think
> it will be enough to justify all the extra effort.

That's what they said about USB when it was compared
to the RS-232C serial port.... :-)

Inherent ability to cope with data-typing is a significant
improvement and reduces a lot of ambiguity for receiving
systems.

Consider trying to process:

 <unknown_data>  field1="true" field2="454.56" field3="robin hood" 
</unknown_data>

So many "guesses" need to be made about what type of database
table needs to be used to load it.

 <unknown_data>  field1?=True field2$=454.56 field3&="robin hood" 
</unknown_data>

To me that almost screams out an sql statement something like:

CREATE TABLE `unknown_data` (
  `field1` bool,
  `field2` decimal,
  `field3` varchar(70)
);

I have everything there needed to do something useful to process
the data. btw, the length of field3 can easily be determined simply
by reading through the data and finding the maximum length.
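
That mapping can be sketched roughly as follows, in Python. This is
illustrative only: the `SQL_TYPES` table, the `create_table_sql` helper and
the rule that string values are double-quoted are assumptions, and only the
`?`, `$` and `&` markers from the example are handled.

```python
import re

# name, type marker, '=', then either a double-quoted string or a bare token
FIELD_RE = re.compile(r'(\w+)([?$&])=("[^"]*"|\S+)')

# Assumed marker-to-SQL-type table; '&' is handled separately as varchar(n).
SQL_TYPES = {"?": "bool", "$": "decimal"}

def create_table_sql(table, rows):
    """Build a CREATE TABLE statement from rows of typed fields."""
    columns = {}  # field name -> type marker, in first-seen order
    widths = {}   # string field name -> longest value seen so far
    for row in rows:
        for name, marker, raw in FIELD_RE.findall(row):
            # A field must keep the same marker in every row.
            if columns.setdefault(name, marker) != marker:
                raise ValueError(f"{name} changes type between rows")
            if marker == "&":
                widths[name] = max(widths.get(name, 0), len(raw.strip('"')))
    defs = []
    for name, marker in columns.items():
        # varchar length is the maximum length found in the data (at least 1)
        sql_type = SQL_TYPES.get(marker) or f"varchar({max(widths[name], 1)})"
        defs.append(f"  `{name}` {sql_type}")
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(defs) + "\n);"

rows = [
    'field1?=True field2$=454.56 field3&="robin hood"',
    'field1?=False field2$=12.00 field3&="friar tuck of sherwood"',
]
print(create_table_sql("unknown_data", rows))
```

For these two rows the longest `field3` value is 22 characters, so the
column comes out as `varchar(22)` rather than a guessed width.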

So I've probably only touched on some of the concepts you've
raised, but that is as much as I can say now without my fingers falling
off or getting stuck to the keyboard :-)

Best Regards

David

-- 
Computergrid : The ones with the most connections win.
