OASIS Mailing List Archives


MicroASCII proposal

 One of the major complications in software is that there are simply too 
 many characters. Think of how many hours (and reputations!) are lost 
 to spelling errors, how many bugs are due to typos, and the extra 
 parsing costs. We need to move XML (and computing) away from this 
 unfortunate legacy, which is really just a set of niche publishing 
 "requirements" and which ultimately made SGML fail.

 In order to do this, I am proposing MicroASCII. This would restore 
 ASCII to its Latin essentials and reduce the insane repeats. Syntactical 
 sugar such as K, Y and Z are no-brainers of course: I doubt that anyone 
 will really miss them. But more recent fads such as J, W and U are better 
 off treated as presentation forms and taken care of by another layer: 
 ASCII violates this basic separation of concerns. Indeed, the whole 
 lower-case is redundant.

 What about internationalization? Well, we often think that 
 internationalization requires *more* features than any one alphabet 
 could get away with, but it ain't necessarily so. Let's say we support 
 Hebrew and the other Semitic languages, and use letters for digits. We 
 can then get rid of the Hindu digits from ASCII too.

 We can learn from the world of computing too. In LISP S-expressions, 
 the parenthesis is all that is needed for grouping. So out goes {} and 
 []. We don't need the control characters either. With all this, we 
 should be able to get to 32 (2^5) characters: MicroASCII will have 1/8 
 the number of code points taken up by usual ASCII bytes and therefore be 
 8 times faster to parse and 8 times simpler to understand! This is 
 enough of a speed up that Moore's law can be restarted, at least for a 
 year or two.  Mobile phone keyboards will be simplified.
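 The arithmetic above can be checked with a rough sketch. It assumes the 
 letter budget is simply upper-case A-Z minus the six letters named 
 earlier (K, Y, Z, J, W, U); the proposal does not spell out the final 
 32-character repertoire, so the names below are illustrative only. Note 
 that the 1/8 figure holds against the 256 values of an 8-bit byte; 
 against the 128 code points of 7-bit ASCII the ratio would be 1/4.

```python
import string

# Letters the proposal drops: K, Y, Z ("syntactical sugar")
# and J, W, U ("more recent fads").
DROPPED = set("KYZJWU")

# Upper case only, since the post calls the whole lower case redundant.
letters = [c for c in string.ascii_uppercase if c not in DROPPED]

TARGET = 2 ** 5        # 32 code points, the size stated in the post
BYTE_VALUES = 2 ** 8   # 256 distinct values in one 8-bit byte
ASCII_VALUES = 2 ** 7  # 128 code points in 7-bit ASCII

# Slots left over for parentheses, space and any other survivors
# (an assumption: the post does not enumerate them).
remaining_slots = TARGET - len(letters)

ratio_vs_byte = BYTE_VALUES // TARGET    # the post's "1/8" figure
ratio_vs_ascii = ASCII_VALUES // TARGET  # ratio against 7-bit ASCII

print(len(letters), remaining_slots, ratio_vs_byte, ratio_vs_ascii)
```

 Under these assumptions 20 letters survive, which leaves 12 of the 32 
 slots for everything else.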

 The other advantage is that it frees up many code points in the byte 
 that can be used for other purposes, such as sending around strings of 
 nulls and nils, which the database community has a voracious appetite 
 for. We could dedicate the whole of the codespace 0x90-0xFF to 
 different kinds of nulls and nils and NELs.

 If someone did want other characters, I suppose we could insert them 
 using a convenient URL, such as



Copyright 1993-2007 XML.org. This site is hosted by OASIS