RE: Historical I18n Note

From: Tony Graham [mailto:Tony.Graham@ireland.sun.com]

>I guess I shouldn't have included that line about incantations, since
>my main point was that the level of support for arbitrary character
>sets among SGML parsers was mixed, to put it mildly.

Granted.  I don't accept statements about how difficult SGML was 
on their face.  Too many people were using it too successfully. 
The unevenness of implementation was a fact, however.  The Unicorn 
tests come to mind.

>Of course, there was neither the emphasis on nor the knowledge of
>multiple character sets when SGML was designed or when most of the
>SGML parsers were written.  

But the abstractions "in principle" reveal some foresight in design. 
Again, the Declaration is the ultimate escape hatch:  use wisely 
and with regard to costs.   CALS systems usually had to specify 
the Declaration in effect.  No one said it was simple but no one assumed 
a priori a single universal system.  I think it is that assumption 
by Berners-Lee et al that drives W3C design.  I think it is an 
optimistic assumption even if necessary.  But assuming we don't 
need that escape hatch is beyond optimistic and into foolhardy. 
The web has been lucky and successful to some extent.  I won't 
bet all future information system processing on that extent. 
Preserve options including the option to choose options.  
The alternative is more disturbing.

>> 1.  Should the XML SGML Declaration be real and be open 
>> to use by XML developers?  Do we go forward only by 

>No.  There's too much stuff that you would never change, because
>changing it would break XML interoperability.

Granted.  It is the worst-case option, the lifeboat for when 
the ship sinks or, really, for when the W3C refuses to meet some 
set of requirements that a different architectural group thinks 
necessary.  There may come a time or case when 
XML interoperability is not the primary requirement.  So 
this option is preserved for that case.  I refuse to 
recognize a private, closed group's hegemony over markup. 
That recognition would be stupid in the extreme.

 > 2.  Should some portion of that remain closed?

>You shouldn't use it.  Since I've never understood why the SGML
>Declaration isn't written in SGML, I think a hypothetical SGML
>Declaration equivalent for XML should be written in XML. 

It requires the reference concrete syntax.  

>I don't think you can convince many people of the need for a new SGML
>Declaration for XML, and I don't think that you could convince many of 
>those to use something that isn't itself XML.

Times change and so do requirements.  Today the alternatives are
to put the information inside the file or to turn the names into 
syntax puree (relax the draconian parse).   Again, one might really 
want to use the standard as intended instead of as personally 
interpreted.   That is the Bad Thing About XML: privatization of 
public assets by consortia with a follow-on distortion of the 
perception of the need for international standards.   We aren't 
doing ourselves 
or our heirs any favors with that policy or practice.   We can 
logically justify something based on current systems, 
but that won't make it right.

 >> 3.  Could some portion of it be used for requirements 
 >> such as Blueberry presents? 

>You might use some of the ideas that an SGML Declaration represents,
>but its syntax is appalling.

Please clarify:  the reference concrete syntax is appalling?  Why?

 >> Bryan states that the variant concrete syntax declarations 
 >> are the way to respond when a system not based on the International 
 >> Reference Version (IRV) character set defined in ISO 646 is used 
 >> thus requiring alterations to the SYNTAX clause of the SGML 
 >> Declaration.  Three ways are provided:
 >> 1.  in the SYNTAX clause of the SGML Declaration, a public 
 >> concrete syntax is specified (itself, a variant concrete syntax)  

>That just saves space in the SGML Declaration, since what you would
>put in the SYNTAX clause is now in an external file (or built into the 
>SGML parser).  Only the SYNTAX clause would differ between XML
>1.0 and Blueberry, so you'd end up with separate SGML Declaration
>files that refer to separate syntax files.

>> 3.  Completely redefine the SYNTAX clause.  Bryan provides 
>> an example of an alternative syntax-reference character 
>> set description for EBCDIC that changes the reference 
>> concrete syntax.

>That's what you'd have to do.

It seems useful at the very least as the normative way to document the

 >> This makes use of public identifiers.  I am curious if a 
 >> URI based identifier might be used if a stable external 
 >> file format were provided such as you mention if formal 
 >> is set to NO in the features clause.

>The SGML Declaration has always identified things by name, not by
>location (where the ISO 2022 escape sequences in CHARSET identifiers
>are really just an alternative name, I suppose).  Also, identifiers in
>the SGML declaration are currently limited to "minimum literals",
>which is a different set of characters to those allowed in URLs.

That might be worth changing.  The URN is a name, so enabling 
it in the declaration should be viable.
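
To make the character-set gap concrete, here is a minimal Python sketch that reports which characters of an identifier fall outside what a minimum literal can carry.  It assumes the minimum data characters are the ASCII letters, digits, the specials ' ( ) + , - . / : = ? plus space and the record start/end characters, as I recall them from ISO 8879.  The function name and the example.org and urn:example identifiers are hypothetical, made up for illustration only.

```python
import string

# Assumption: minimum data characters per ISO 8879 are letters, digits,
# the "special" characters, and space / record start / record end.
MINIMUM_DATA = set(
    string.ascii_letters + string.digits + "'()+,-./:=?" + " \r\n"
)


def disallowed_in_minimum_literal(identifier):
    """Return the distinct characters a minimum literal could not carry."""
    return sorted(set(identifier) - MINIMUM_DATA)


# The public identifier quoted in this thread passes unchanged.
print(disallowed_in_minimum_literal("ISO 8879-1986//SYNTAX Reference//EN"))

# A hypothetical URL trips over '_' and '#'.
print(disallowed_in_minimum_literal("http://example.org/syntax/blueberry_1#ref"))

# A hypothetical URN built only from letters and colons already fits.
print(disallowed_in_minimum_literal("urn:example:syntax:blueberry"))
```

Under those assumptions, a plain URN can already be written as a minimum literal, while general URLs would need either an escaping convention or a wider character set for identifiers in the declaration.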

 > Using a SYSTEM declaration we see something such as 
 > Martin Bryan's example:
 > SCOPE Instance <!-- indicates system can handle more than one syntax at a time -->
 > SYNTAX PUBLIC "ISO 8879-1986//SYNTAX Reference//EN"
 >         CHANGES  DELIMLEN 3
 >         SEQUENCE YES
 >         SRCNT    100
 >         SRLEN    10

>If you wrote separate syntax clauses for XML 1.0 and Blueberry and
>gave them separate identifiers, then an XML processor that wanted to
>behave like a SGML parser could provide a System Declaration that
>stated which syntax clauses it supported.


>Over the years, people have proposed various schemes for documenting
>the capabilities of XML processors that have all reminded me of SGML's
>System Declaration, and indicating Blueberry support or lack of it is
>probably best left to such an XML mechanism because there's a lot of
>stuff in a System Declaration that will never change for XML and that
>is of absolutely no interest to someone checking on Blueberry support.

Again, it seems best to use the standard as intended rather than 
building in system-specific flags.  There will be no end of it.

 > I don't want to trivialize the difficulty.  On the other hand, 
 > I don't want to see a Blueberry pop up every two years and 
 > find out "oops, we need yet more of SGML or we need to 
 > reinvent SGML" or "those HAN characters just aren't business 
 > requirements so...".  

>Yes, you can describe post-Blueberry XML using a SGML Declaration
>(although you might need to fudge on &#x85;), but since there's so
>much stuff in a SGML Declaration that will never change for XML, I
>question why you'd want to add parsing SGML Declarations to all XML

>As John Cowan pointed out in a post a while ago, in SGML you can now
>refer to a SGML Declaration rather than having to include the SGML
>Declaration in the input stream the way that you used to.  (I haven't
>actually seen that implemented by any SGML parser, but nor have I
>looked very hard.)  If you really wanted to base post-Blueberry XML on
>a post-Blueberry SGML Declaration, then you could standardise the
>identifier for the post-Blueberry SGML Declaration and include the
>SGML Declaration reference in every post-Blueberry XML file (which
>would certainly be sufficient to stop XML 1.0 processors from using
>the file).  The post-Blueberry SGML Declaration could be assumed to be 
>built in to the XML processor (or obtainable by dereferencing the
>name, for systems that care to implement it that way).

Why not?  Do we change XML or change the requirement for the Blueberry 
support such that only Blueberry systems have to recognize Blueberry