MicroASCII proposal
- From: rjelliffe <rjelliffe@allette.com.au>
- To: <xml-dev@lists.xml.org>
- Date: Thu, 13 Jan 2011 18:36:44 +1100
One of the major complications in software is that there are simply too
many characters. Think of how many hours (and reputations!) are lost
due to spelling errors, how many bugs due to typos, and the extra
parsing costs. We need to move XML (and computing) away from this
unfortunate legacy, which is really just a set of niche publishing
"requirements" and which made SGML ultimately fail.
In order to do this, I am proposing MicroASCII. This would restore
ASCII to its Latin essentials and reduce the insane repeats. Syntactical
sugar such as K, Y and Z are no-brainers of course: I doubt that anyone
will really miss them. But more recent fads such as J, W and U are better
off treated as presentation forms and taken care of by another layer:
ASCII violates this basic separation of concerns. Indeed, the whole
lower-case is redundant.
What about internationalization? Well, we often think that
internationalization requires *more* features than any one alphabet
could get away with, but it ain't necessarily so. Let's say we support
Hebrew and the other Semitic languages, and use letters for digits. We
can then get rid of the Hindu digits from ASCII too.
We can learn from the world of computing too. In LISP S-expressions,
the parenthesis is all that is needed for grouping. So out goes {} and
[]. We don't need the control characters either. With all this, we
should be able to get to 32 (2^5) characters: MicroASCII will use 1/8
of the 256 code points available in a byte, and will therefore be
8 times faster to parse and 8 times simpler to understand! This is
enough of a speed up that Moore's law can be restarted, at least for a
year or two. Mobile phone keyboards will be simplified.
The other advantage is that it frees up many code points in the byte
that can be used for other purposes, such as sending around strings of
nulls and nils, which the database community has a voracious appetite
for. We could dedicate the whole of the codespace 0x90-0xFF to
different kinds of nulls, nils and NELs.
If someone did want other characters, I suppose we could insert them
using a convenient URL, such as
(-!http://www.unicode.org/tables/Unicode5.0/ampersand!-)
Cheers
RIC IELLIFFE