OASIS Mailing List Archives

Re: Well-formed Blueberry

At 3:25 PM +0100 7/13/01, Rob Lugt wrote:

>I can see a good reason for doing what you suggest, and I sympathise with
>your comments but the fact is that your proposal would turn a trivial
>implementation change into something much more difficult.  It could also
>have a performance impact, so is unlikely to be popular with Parser

Not necessarily. Most correct parsers and other APIs already have to check whether each character is legal in an XML name. Blueberry doesn't really change that. It changes the list, but it doesn't change the fact that parsers need to maintain and consult very large tables of characters and code points.

I can see a number of ways to efficiently implement my proposal without a great deal of effort. One is to maintain two tables, one for XML 1.0 legal characters and one for the extra characters in Blueberry. This is probably necessary anyway to allow parsers to handle both kinds of documents. 

A typical parser would first check whether a character was legal according to XML 1.0. Only if that check failed would it go on to check whether the character was a legal Blueberry character. This is quite natural. At least one API (JDOM), and probably others, already carefully orders its character checks so that the common characters are tested before the uncommon ones, for efficiency.

In fact, to ease the handling of both kinds of documents, I'd expect there to be two separate methods, one like isXMLNameCharacter() and one like isXMLBlueberryNameCharacter(). The second method would be called only if the first returned false. (This is hardly the only way to do it, but it is one possibility.)
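The two-table lookup with an ordered fallback might look roughly like this in Java. This is a minimal sketch under stated assumptions, not JDOM's actual code: the class NameChars, the helper isNameCharacter(), and both range tables are hypothetical. The tables are tiny illustrative subsets, not the real XML 1.0 or Blueberry name-character lists; the Blueberry stand-ins are blocks from scripts (Ethiopic, Cherokee, Mongolian) that were added in Unicode 3.0.

```java
// Hypothetical sketch of the two-table scheme: the class, the helper
// isNameCharacter(), and both tables are illustrative, not JDOM's code.
final class NameChars {

    // Tiny illustrative subset of XML 1.0 name characters (NOT the real
    // table). ASCII ranges come first so common characters exit early.
    private static final int[][] XML10_RANGES = {
        {'-', '.'}, {'0', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF}
    };

    // Illustrative blocks from scripts added in Unicode 3.0 (Ethiopic,
    // Cherokee, Mongolian), standing in for the Blueberry-only table.
    private static final int[][] BLUEBERRY_RANGES = {
        {0x1200, 0x137F}, {0x13A0, 0x13FF}, {0x1800, 0x18AF}
    };

    // Linear scan over inclusive {start, end} code point ranges.
    private static boolean inRanges(int c, int[][] ranges) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) return true;
        }
        return false;
    }

    // Consulted first, for every character.
    static boolean isXMLNameCharacter(int c) {
        return inRanges(c, XML10_RANGES);
    }

    // Consulted only when isXMLNameCharacter(c) has returned false.
    static boolean isXMLBlueberryNameCharacter(int c) {
        return inRanges(c, BLUEBERRY_RANGES);
    }

    // XML 1.0 table first; the Blueberry table is only a fallback, and only
    // when the document is allowed to use Blueberry characters.
    static boolean isNameCharacter(int c, boolean blueberryAllowed) {
        return isXMLNameCharacter(c)
            || (blueberryAllowed && isXMLBlueberryNameCharacter(c));
    }

    public static void main(String[] args) {
        System.out.println(isNameCharacter('a', false));    // true
        System.out.println(isNameCharacter(0x1200, false)); // false
        System.out.println(isNameCharacter(0x1200, true));  // true
    }
}
```

Because of the short-circuiting ||, a document written entirely in XML 1.0 characters never consults the second table at all.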

Before parsing, the parser could set a boolean variable such as usesBlueberryCharacters to false. The isXMLBlueberryNameCharacter() method could set this variable to true every time it saw a Blueberry character, starting with the first. When parsing was done, the parser would signal a well-formedness error if the document was marked as Blueberry and the variable was still false.
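The usesBlueberryCharacters flag can be layered on top of the same ordered lookup. This is again a hypothetical sketch: the class BlueberryNameChecker, the method namesAreWellFormed(), the declaredBlueberry parameter (standing in for however a document marks itself as Blueberry), and the range tables are all illustrative inventions. For brevity it ignores the distinction between name-start characters and other name characters, and it takes a list of names in place of a real parser's input.

```java
import java.util.List;

// Hypothetical sketch: class, method names, and range tables are
// illustrative only, not taken from any real parser or from JDOM.
final class BlueberryNameChecker {

    // Tiny illustrative subset of XML 1.0 name characters (NOT the real table).
    private static final int[][] XML10_RANGES = {
        {'-', '.'}, {'0', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF}
    };

    // Illustrative Unicode 3.0 blocks standing in for the Blueberry-only table.
    private static final int[][] BLUEBERRY_RANGES = {
        {0x1200, 0x137F}, {0x13A0, 0x13FF}, {0x1800, 0x18AF}
    };

    // Set to true whenever a Blueberry-only character is recognized.
    private boolean usesBlueberryCharacters;

    private static boolean inRanges(int c, int[][] ranges) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) return true;
        }
        return false;
    }

    boolean isXMLNameCharacter(int c) {
        return inRanges(c, XML10_RANGES);
    }

    // Called only after isXMLNameCharacter(c) has returned false;
    // records every Blueberry character it recognizes.
    boolean isXMLBlueberryNameCharacter(int c) {
        if (inRanges(c, BLUEBERRY_RANGES)) {
            usesBlueberryCharacters = true;
            return true;
        }
        return false;
    }

    // Returns false to signal a well-formedness error.
    boolean namesAreWellFormed(List<String> names, boolean declaredBlueberry) {
        usesBlueberryCharacters = false; // reset before parsing
        for (String name : names) {
            for (int c : name.codePoints().toArray()) {
                if (isXMLNameCharacter(c)) continue; // common case first
                if (declaredBlueberry && isXMLBlueberryNameCharacter(c)) continue;
                return false; // not a legal name character
            }
        }
        // Proposed rule: a document marked as Blueberry must use Blueberry characters.
        return !declaredBlueberry || usesBlueberryCharacters;
    }

    public static void main(String[] args) {
        BlueberryNameChecker checker = new BlueberryNameChecker();
        System.out.println(checker.namesAreWellFormed(List.of("title"), false));          // true
        System.out.println(checker.namesAreWellFormed(List.of("title"), true));           // false
        System.out.println(checker.namesAreWellFormed(List.of("title", "\u1230"), true)); // true
    }
}
```

The flag is written only on the fallback path, so a document made entirely of XML 1.0 characters pays for nothing beyond the one reset at the start.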

Anyway, that's a very rough sketch, but you get the idea. Storing one extra boolean and setting it each time a Blueberry character is seen is trivial compared to the table-lookup overhead that parsers incur at this stage anyway.

If I were revising JDOM to handle Blueberry (I pick JDOM only because it's the only API whose internals I'm familiar with), setting up the tables for the new Blueberry characters would take as long as or longer than implementing the scheme I just described. (JDOM isn't a parser, but it does perform parser-like name checks.)

>Wouldn't a better solution be one of education and market forces?  Just like
>most people write backwards-compatible HTML today, most people will continue
>to write backwards-compatible XML tomorrow for the simple reason that they
>want it to be interoperable.

As somebody who spends most of my time educating people about XML and related technologies, I don't want to leave to education anything we can enforce in the code. I will most certainly warn people through my books and seminars not to mark their documents as Blueberry when they don't need to. But I still know I'll encounter masses of developers who've half-read the specs and skimmed some half-accurate books or articles. Lord knows I've encouraged enough brain damage in my earlier books that I don't want to rely on books, or any other form of education, as the sole solution to a potentially nasty problem.

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |