   RE: [xml-dev] Request for Erratum to XML 1.0 and 1.1 Specs

This is not an erratum. It is a change proposal.

Michael Kay

> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@allette.com.au] 
> Sent: 21 October 2003 10:11
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] Request for Erratum to XML 1.0 and 1.1 Specs 
> 
> 
> I have just sent this off to the XML Editor mail list. I encourage
> anyone who thinks it is good or bad (or who just thinks there should
> be something but doesn't care what) to also send to them.
> 
> It also raises an interesting question: the XML spec is written in
> draconian terms with, nominally, very few options. Yet SAX 2, the
> almost universally deployed parser interface, is highly
> parameterizable with features, handlers and properties. So it cannot
> be too tragic to accept that some systems may need to bend certain
> rules, without altering the basic definitions.
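To make the SAX 2 point concrete, here is a minimal sketch using Python's `xml.sax`, which is a port of the SAX 2 interfaces rather than the Java original. The reader is configured through a feature switch (`feature_namespaces`) and through handlers, and its `ErrorHandler` separates warnings, recoverable errors and fatal errors: the same distinction the proposal below relies on when it asks for a "non-fatal error". The names `TextCollector`, `RecordingErrorHandler` and `parse` are illustrative only. Without a DTD, an `&nbsp;` reference is still a fatal well-formedness error, and Python's expat-based reader offers no option to continue after it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler, feature_namespaces


class TextCollector(ContentHandler):
    """Illustrative content handler: gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


class RecordingErrorHandler(ErrorHandler):
    """Illustrative error handler: records warnings and recoverable
    errors, but lets fatal (well-formedness) errors stop the parse."""

    def __init__(self):
        self.warnings = []
        self.errors = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        self.errors.append(exception)

    def fatalError(self, exception):
        raise exception


def parse(data: bytes):
    parser = xml.sax.make_parser()
    # SAX 2 features are named boolean switches on the reader.
    parser.setFeature(feature_namespaces, False)
    content = TextCollector()
    errors = RecordingErrorHandler()
    parser.setContentHandler(content)
    parser.setErrorHandler(errors)
    parser.parse(io.BytesIO(data))
    return "".join(content.chunks), errors


print(parse(b"<doc>a &amp; b</doc>")[0])  # a & b

try:
    parse(b"<doc>a&nbsp;b</doc>")  # nothing declares nbsp
except xml.sax.SAXParseException as exc:
    print("fatal:", exc.getMessage())
```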
> 
> Rick
> 
> ===============================================================
> 
> Request for Erratum to XML 1.0 and 1.1 Specs
> ----------------------------------------------
> Rick Jelliffe, ricko@topologi.com, 2003-10-21
> 
> 
> I request that the XML Working Group consider the following
> erratum to XML 1.0, which should also apply to XML 1.1.
> 
> The following two paragraphs, or something to the same effect,
> should be appended to section 5.1 "Validating and Non-Validating
> Processors":
> 
> 
> 
> "A non-validating processor may, at user option, imply definitions
> for all the character entities defined by HTML 4[1]. A document or
> entity for which definitions are implied is not well-formed. The
> processor must report a non-fatal error. NOTE: The document is 'not
> well-formed but processed'. Reliance on this feature by
> specifications is deprecated; this option may be withdrawn at some
> future time should it prove dangerous."
> 
> "A non-validating processor which provides the HTML 4 definitions
> may, at user option, also imply definitions for other MathML and ISO
> standard sets[2]. A processor must report a non-fatal error. The
> document is 'not well-formed but processed'. NOTE: Reliance on this
> feature by specifications is deprecated; this option may be withdrawn
> at some future time should it prove dangerous."
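The two proposed paragraphs amount to a lookup order for an entity reference the document never declared. The sketch below assumes a toy setting: it only expands references in a string of character data, uses declared replacement text as-is, and checks nothing else about well-formedness. Python's `html.entities.name2codepoint` (the HTML 4.01 entity names) stands in for reference [1]; since the standard library has no MathML/ISO tables, the `extended` argument is a hypothetical caller-supplied table standing in for reference [2]. `expand_text` and `NotWellFormed` are hypothetical names, not part of any parser API.

```python
import re
from html.entities import name2codepoint  # HTML 4.01 names -> code points

# The five entities every XML processor predefines.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Character references, plus entity names restricted to ASCII for brevity.
REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][A-Za-z0-9._-]*);")


class NotWellFormed(Exception):
    """Hypothetical fatal error: the current, draconian outcome."""


def expand_text(text, declared=None, imply_html4=False, extended=None):
    """Hypothetical helper: expand references in character data.

    declared:    entities the document declares (name -> replacement text)
    imply_html4: the user option of the first proposed paragraph
    extended:    hypothetical MathML/ISO table (name -> text); per the
                 second paragraph it is only allowed with imply_html4
    Returns (expanded_text, non_fatal_errors).
    """
    if extended and not imply_html4:
        raise ValueError("MathML/ISO sets require the HTML 4 option")
    declared = declared or {}
    extended = extended or {}
    non_fatal = []

    def implied(name, source, value):
        # The processor must report a non-fatal error, then carry on.
        non_fatal.append(f"&{name}; implied from {source}: "
                         "not well-formed but processed")
        return value

    def resolve(match):
        ref = match.group(1)
        if ref.startswith("#x"):
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        if ref in PREDEFINED:
            return PREDEFINED[ref]
        if ref in declared:
            return declared[ref]
        if imply_html4 and ref in name2codepoint:
            return implied(ref, "HTML 4", chr(name2codepoint[ref]))
        if ref in extended:
            return implied(ref, "MathML/ISO", extended[ref])
        raise NotWellFormed(f"undeclared entity &{ref};")

    return REFERENCE.sub(resolve, text), non_fatal


text, report = expand_text("caf&eacute; &amp; more", imply_html4=True)
print(report)  # one non-fatal error, for &eacute;
```

Without `imply_html4`, the same input raises `NotWellFormed`, which is today's behaviour; the option only changes the outcome for names in the implied tables, and every such use is reported rather than silently accepted.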
> 
> [1] http://www.w3.org/TR/html401/sgml/entities.html
> [2] http://www.w3.org/TR/MathML2/chapter6.html#chars_entity-tables
> 
> 
> 
> This suggested erratum has the following characteristics:
> 
> 1) It does not require any change to any XML processor
> 2) It does not change the basic XML characteristic that the 
> only way to guarantee information is received at the other 
> end is to use a UTF-* encoding, no entities and no attribute 
> defaulting.
> 3) It maintains the current layering, so no re-architecting
> or change in design is needed
> 4) It keeps the XML specification as the place that defines how to
> go from characters to data+markup.
> 
> 5) It does not make any existing valid XML document invalid
> 6) It does not make any existing invalid XML document valid
> 7) It does not make any existing WF document or entity non-WF
> 8) It does not make any existing non-WF document formally WF
> 
> 9) It does allow the continued non-validating processing of 
> documents which are non-WF only because they contain standard 
> references
> 10) It limits this to user option
> 11) It does not allow other specifications to use this as
> their default
> 12) It can be withdrawn
> 
> 13) I believe it is practical and would be simple to implement.
> 
> 
> 
> I believe the beneficiaries of such an erratum include:
> 
>  * Users typing in editors with no adequate input methods
>  for non-ASCII characters. I note that although Unicode
>  editors can display many characters, not all operating
>  systems have input methods to allow convenient data entry
>  even of Latin-1 characters. (I believe this is better
>  provided by using decent XML markup editors, without prejudice.)
> 
>  * XHTML users who are used to named references without
>  declarations in HTML.
> 
>  * Potential XInclude users, who may wish
>  to treat a WF parsed entity from a document that uses
>  standard character references as a microdocument
> 
>  * Potential XML Schemas, Schematron and RELAX NG users who
>  may wish to upgrade from DTDs.
> 
>  * Potential XQuery users who are being hindered by the lack
>  of XML Schemas.
> 
>  * XML pipeline systems which can pass XML without requiring
>   tricky prologs
> 
>  * SOAP, RSS and RDF systems which must cope with embedded data
>  fragments from externally generated documents
> 
>  * Programmers serializing data to XML, especially for internal
>   systems, who may prefer to generate "&mdash;" or "&nbsp;"
>   rather than the numeric or literal equivalents.
> 
>  * Vendors who make products for the above
> 
>  * Low-sight or motion-impaired users whose speech synthesizers
>   or input methods only support ASCII characters. Aged, enraged
>   or diminished capacity users who may be frustrated at having
>   to look up the number for something they know the name for.
>   (Though I do not want to suggest that "entity rage" is a hidden
>   problem.)
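For the programmers serializing data in the list above, the difference lies entirely in the serializer. A minimal sketch, assuming Python's `html.entities.codepoint2name` as the source of HTML 4 names; `escape_text` and its `prefer_named` flag are hypothetical. Output using named references such as `&mdash;` is only well-formed to a receiving XML processor if those entities are declared (or, under this proposal, implied at user option), whereas numeric references need no declarations.

```python
from html.entities import codepoint2name  # code point -> HTML 4 entity name


def escape_text(text, prefer_named=True):
    """Hypothetical serializer for XML element content.

    Markup-significant characters always use the predefined entities.
    Non-ASCII characters become named HTML 4 references when
    prefer_named is set and a name exists, numeric references otherwise.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif cp < 128:
            out.append(ch)
        elif prefer_named and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(escape_text("a\u2014b\u00a0c"))                      # a&mdash;b&nbsp;c
print(escape_text("a\u2014b\u00a0c", prefer_named=False))  # a&#8212;b&#160;c
```

Characters without an HTML 4 name (most of Unicode) fall back to numeric references either way, so the named form is a convenience for readers and typists rather than a change in what can be expressed.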
> 
> 
> I suggest its benefits over other suggested approaches include:
> 
>  * It does not require change to subsequent processes, as PSVI
>   processing would, nor any changes or additions to schema
>   specifications
> 
>  * It does not require pre-processing, as a macro processor would
> 
>  * It does not require the introduction and deployment of new
>   transcoders, as would Tim Bray and John Cowan's recent thought
>   experiment "UTF-8+Names"
> 
>  * It does not require interaction with other standards groups,
>   notably XML Schemas EG or IANA or IETF.
> 
>  * By providing it at user option, it can succeed or fail: if it is
>   popular and successful, that is good; if it is unpopular or unsafe,
>   it can be withdrawn.
> 
>  * By limiting itself to the HTML and the MathML/ISO entities, it
>   avoids issues of user-defined entities, and the need to enumerate
>   the entities.
> 
>  * It does not define mappings for those characters, but defers to
>   HTML and MathML/ISO, who may provide standard mappings.
> 
> This gives a very wide constituency.
> 
> I note that Xerces' SAX 2 implementation provides features by which
> a parser can continue processing after an error. This proposal could
> be seen as a very limited nod of recognition to that kind of practice.
> 
> 
> Cheers
> Rick Jelliffe
> 
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org 
> <http://www.xml.org>, an initiative of OASIS
> <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>

Copyright 2001 XML.org. This site is hosted by OASIS