XML referencing (was Re: Blueberry is not "closed")

Tim Bray further elucidated:


> On the other hand, if they do something like
> <?xml version="1.0" unicode="3.1.1" ?>
> Then every XML processor in the world will correctly reject it.
> Of course this second option is not viable if the W3C misguidedly
> lets NEL into the S production.
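
A quick way to check the rejection claim, as a minimal sketch: it assumes Python's bundled expat-based parser is representative of "every XML processor", and the helper name rejects() is made up for illustration.

```python
import xml.etree.ElementTree as ET

def rejects(document: bytes) -> bool:
    """Return True if the parser refuses the document as not well-formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return True
    return False

# XML declaration carrying the proposed (non-standard) unicode pseudo-attribute.
with_unicode_attr = b'<?xml version="1.0" unicode="3.1.1" ?><doc/>'
# Ordinary XML 1.0 declaration, for comparison.
plain = b'<?xml version="1.0"?><doc/>'

print(rejects(with_unicode_attr), rejects(plain))
```

The XML 1.0 declaration grammar only allows version, encoding and standalone, which is why an unknown pseudo-attribute should be a well-formedness error.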


Okay, from Unicode 3.0, p. 48 (sec. 3.9, "Special Character Properties"), I
see that NEL is _not_ in the Unicode list of line boundary control
characters. This seems odd. Why should NExt Line not be a line boundary
control? I guess I need to get back on the public Unicode list and ask.

(Too many lists. I'll just have to start using rules in my mailer.)

I assume Line Separator and Paragraph Separator _are_ recognized as line
boundary controls by XML? No!?



Add a third parameter that references a table of overrides to the parser's
character classification table? (I hear the sound of stomach juices
sloshing.) No, let's try this instead: rather than the second parameter
being explicitly a Unicode version reference, it is allowed to reference a
character classification override table, a few of which happen to be
universally known tables matching versions of Unicode.
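
To make the idea concrete, here is a rough sketch of what such a lookup might look like. Everything in it is an assumption for illustration: the table names ("with-nel", "with-nel-ls"), their contents, and the function names are hypothetical and do not correspond to any published Unicode version table. Only the base set is taken from the XML 1.0 S production.

```python
# XML 1.0 "S" production: space, tab, carriage return, line feed.
XML10_S = frozenset({0x20, 0x09, 0x0D, 0x0A})

# Hypothetical named override tables. A real scheme would presumably ship a
# few well-known tables tied to Unicode versions; these are placeholders.
OVERRIDES = {
    "with-nel": frozenset({0x85}),
    "with-nel-ls": frozenset({0x85, 0x2028}),
}

def whitespace_table(override=None):
    """Return the set of code points treated as S, with an optional override."""
    if override is None:
        return XML10_S
    if override not in OVERRIDES:
        raise ValueError(f"unknown classification table: {override!r}")
    return XML10_S | OVERRIDES[override]

def is_s(ch, table=XML10_S):
    """True if the single character ch counts as whitespace under table."""
    return ord(ch) in table
```

A parser built this way would resolve the table once, when it reads the declaration, and the per-character test stays a plain set lookup.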

But can such a thing be done without breaking the SGML connection? And can
it be done cheaply enough? Efficiently enough? Will it require features not
available in common OSes to implement? (The shared static library problem
_has_ been solved, hasn't it?)

Maybe NEL should be set aside. But (IMnsHO) Blueberry should allow any
Unicode line boundary control as a line-separating character. I wonder if
anyone at IBM has evaluated the possibility of slipping a pass into the
process of translating their EBCDIC documents that would map the NEL to an
XML entity or special tag followed by a line feed.
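
Such a pass could be very small. A minimal sketch, assuming the EBCDIC text has already been decoded to Unicode so that NEL shows up as U+0085, and assuming a numeric character reference is an acceptable stand-in for the "XML entity" (the function name map_nel and the default marker are my own choices):

```python
NEL = "\u0085"  # NEXT LINE, as it appears after decoding to Unicode

def map_nel(text, marker="&#x85;"):
    """Replace each NEL with a marker followed by a line feed.

    The marker defaults to a character reference; a special tag such as
    "<nel/>" could be passed instead if the document type allows one.
    """
    return text.replace(NEL, marker + "\n")

print(map_nel("first line" + NEL + "second line"))
```

The marker keeps the original NEL recoverable on the way back to EBCDIC, while the added line feed gives XML 1.0 processors a line end they already understand.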

I get the feeling I'm repeating in monologue a discussion that has been had

