This is not an erratum. It is a change proposal.
Michael Kay
> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@allette.com.au]
> Sent: 21 October 2003 10:11
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] Request for Erratum to XML 1.0 and 1.1 Specs
>
>
> I have just sent this off to the XML Editor mailing list. I encourage
> anyone who thinks it is good or bad (or who just thinks there should be
> something but doesn't care what) to write to them as well.
>
> It also raises an interesting question: the XML spec is written in
> draconian terms with, nominally, very few options. Yet SAX 2, the almost
> universally deployed parser interface, is highly parameterizable with
> features, handlers and properties. So it cannot be too tragic to accept
> that some systems may need to bend certain rules, without altering the
> basic definitions.
>
> Rick
>
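The kind of parameterization Rick refers to can be seen in Python's `xml.sax` package, which follows the SAX 2 design: behaviour is switched with named features through `setFeature`, and supplied through handler objects, including an `ErrorHandler` whose `warning`, `error` and `fatalError` methods separate recoverable problems from fatal ones. This is a minimal sketch assuming the standard-library expat-backed reader; the `Collector` and `Reporter` classes and the `run` helper are hypothetical names chosen for illustration.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, ErrorHandler,
                             feature_external_ges, feature_namespaces)


class Collector(ContentHandler):
    """Hypothetical content handler: records element names and text."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (namespace URI, local name) pair.
        self.elements.append(name)

    def characters(self, content):
        self.text.append(content)


class Reporter(ErrorHandler):
    """Hypothetical error handler: records non-fatal problems, stops on fatal ones."""

    def __init__(self):
        self.problems = []

    def warning(self, exception):
        self.problems.append(("warning", str(exception)))

    def error(self, exception):
        # A non-fatal error: returning normally lets the parser carry on.
        self.problems.append(("error", str(exception)))

    def fatalError(self, exception):
        raise exception


def run(data):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)     # report (URI, local name) pairs
    parser.setFeature(feature_external_ges, False)  # do not fetch external entities
    collector, reporter = Collector(), Reporter()
    parser.setContentHandler(collector)
    parser.setErrorHandler(reporter)
    parser.parse(io.BytesIO(data))
    return collector, reporter
```

Here `run(b"<p xmlns='urn:example'>hi</p>")` records the element as the pair `("urn:example", "p")`, while an undeclared reference such as `&nbsp;` still reaches `fatalError`, because expat treats it as a well-formedness error.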
> ===============================================================
>
> Request for Erratum to XML 1.0 and 1.1 Specs
> ----------------------------------------------
> Rick Jelliffe, ricko@topologi.com, 2003-10-21
>
>
> I request that the XML Working Group please consider the following
> erratum to XML 1.0, which should also apply to XML 1.1.
>
> The following two paragraphs, or something to the same effect, should be
> appended to section 5.1 "Validating and Non-Validating Processors":
>
>
>
> "A non-validating processor may, at user option, imply definitions for
> all the character entities defined by HTML 4[1]. A document or entity
> for which definitions are implied is not well-formed. The processor must
> report a non-fatal error. NOTE: The document is 'not well-formed but
> processed'. Reliance on this feature by specifications is deprecated;
> this option may be withdrawn at some future time should it prove
> dangerous."
>
> "A non-validating processor which provides the HTML 4 definitions may,
> at user option, also imply definitions for the other MathML and ISO
> standard sets[2]. The processor must report a non-fatal error. The
> document is 'not well-formed but processed'. NOTE: Reliance on this
> feature by specifications is deprecated; this option may be withdrawn
> at some future time should it prove dangerous."
>
> [1] http://www.w3.org/TR/html401/sgml/entities.html
> [2] http://www.w3.org/TR/MathML2/chapter6.html#chars_entity-tables
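One way a non-validating processor could offer the first proposed paragraph without changing its core is sketched below in Python with the standard library's expat binding. Several assumptions here are not part of the proposal: `character_data`, `imply_html4` and `NotWellFormedButProcessed` are hypothetical names; the HTML 4 set is taken from `html.entities.name2codepoint`; the implied declarations are supplied through expat's foreign-DTD hook (`UseForeignDTD`), so the sketch only covers documents without their own external DTD subset or external entities; and the "non-fatal error" is modelled as a Python warning. The document is parsed strictly first; only if that fails with an undefined-entity error, and the user option is on, is the warning issued and the document re-parsed with the implied definitions. Names outside the HTML 4 set still stop processing.

```python
import warnings
from html.entities import name2codepoint  # HTML 4 entity names -> code points
from xml.parsers import expat

# The XML predefined entities are never undefined, so they are not redeclared.
_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Declarations for the HTML 4 set, to be read as an external DTD subset.
_HTML4_DTD = "".join(
    f'<!ENTITY {name} "&#{cp};">'
    for name, cp in name2codepoint.items()
    if name not in _PREDEFINED
).encode("ascii")

_UNDEFINED_ENTITY = expat.errors.codes[expat.errors.XML_ERROR_UNDEFINED_ENTITY]


class NotWellFormedButProcessed(UserWarning):
    """Hypothetical stand-in for the proposal's "non-fatal error"."""


def _make_parser(chunks, imply_html4):
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    if imply_html4:
        def load_foreign_dtd(context, base, system_id, public_id):
            # expat asks for the "foreign" DTD with no system or public id;
            # answer with the HTML 4 declarations. Other external entities
            # are deliberately not loaded in this sketch.
            if system_id is None and public_id is None:
                sub = parser.ExternalEntityParserCreate(context)
                sub.Parse(_HTML4_DTD, True)
            return 1

        def skipped(name, is_parameter_entity):
            # Names outside the HTML 4 set are still errors.
            raise expat.ExpatError(f"undefined entity &{name};")

        parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
        parser.UseForeignDTD(True)
        parser.ExternalEntityRefHandler = load_foreign_dtd
        parser.SkippedEntityHandler = skipped
    return parser


def character_data(xml_bytes, imply_html4=False):
    """Parse xml_bytes and return all of its character data as one string."""
    chunks = []
    try:
        _make_parser(chunks, imply_html4=False).Parse(xml_bytes, True)
        return "".join(chunks)
    except expat.ExpatError as err:
        # No user option, or an error other than an undefined entity: fatal.
        if not imply_html4 or err.code != _UNDEFINED_ENTITY:
            raise
    # Not well-formed, but the user asked for the HTML 4 definitions.
    warnings.warn("not well-formed but processed: HTML 4 entity "
                  "definitions were implied",
                  NotWellFormedButProcessed, stacklevel=2)
    chunks = []
    _make_parser(chunks, imply_html4=True).Parse(xml_bytes, True)
    return "".join(chunks)
```

For example, `character_data(b"<p>a&nbsp;b</p>")` raises `ExpatError`, while the same call with `imply_html4=True` issues the warning and returns `"a\xa0b"`. The two-pass design leaves the strict path untouched, at the cost of parsing non-well-formed input twice.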
>
>
>
> This suggested erratum has the following characteristics:
>
> 1) It does not require any change to any XML processor
> 2) It does not change the basic XML characteristic that the
> only way to guarantee information is received at the other
> end is to use a UTF-* encoding, no entities and no attribute
> defaulting.
> 3) It maintains the current layering, so no re-architecting
> or change in design is needed
> 4) It keeps the XML specification as the place that defines how
> to go from characters to data+markup.
>
> 5) It does not make any existing valid XML document invalid
> 6) It does not make any existing invalid XML document valid
> 7) It does not make any existing WF document or entity non-WF
> 8) It does not make any existing non-WF document formally WF
>
> 9) It does allow the continued non-validating processing of
> documents which are non-WF only because they contain references
> to the standard entity sets
> 10) It limits this to user option
> 11) It does not allow other specifications to use this as
> their default
> 12) It can be withdrawn
>
> 13) I believe it is practical and would be simple to implement.
>
>
>
> I believe the beneficiaries of such an erratum include:
>
> * Users typing in editors with no adequate input methods
> for non-ASCII characters. I note that although Unicode
> editors can display many characters, not all operating
> systems have input methods to allow convenient data entry
> even of Latin1 characters. (I believe this is better
> provided by using decent XML markup editors, without prejudice.)
>
> * XHTML users who are used to named references without
> declarations in HTML.
>
> * Potential XInclude users, who may wish
> to treat a WF parsed entity from a document that uses
> standard character references as a microdocument
>
> * Potential XML Schemas, Schematron and RELAX NG users who
> may wish to upgrade from DTDs.
>
> * Potential XQuery users who are being hindered by the lack
> of XML Schemas.
>
> * XML pipeline systems which can pass XML without requiring
> tricky prologs
>
> * SOAP, RSS and RDF systems which must cope with embedded data
> fragments from externally generated documents
>
> * Programmers serializing data to XML, especially for internal
> systems, who may prefer to generate "&mdash;" or "&nbsp;"
> rather than the numeric or literal equivalents.
>
> * Vendors who make products for the above
>
> * Low-sight or motion-impaired users whose speech synthesizers
> or input methods only support ASCII characters. Aged, enraged
> or diminished capacity users who may be frustrated at having
> to look up the number for something they know the name for.
> (Though I do not want to suggest that "entity rage" is a hidden
> problem.)
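The bullet on serializing data contrasts named references with their numeric and literal equivalents. A small Python sketch of the three spellings follows, assuming the HTML 4 names from `html.entities.codepoint2name`; the `serialize` function and its `style` values are hypothetical. Only the literal (UTF-8) and numeric forms are well-formed without a declaration; the named form relies on a DTD, or on the user option proposed here.

```python
from html.entities import codepoint2name  # HTML 4 code points -> entity names
from xml.sax.saxutils import escape       # escapes &, < and >


def serialize(text, style):
    """Hypothetical serializer for character data in one of three spellings."""
    text = escape(text)
    if style == "literal":
        # Characters as themselves, encoded in UTF-8.
        return text.encode("utf-8")
    if style == "numeric":
        # Non-ASCII characters as decimal character references.
        return text.encode("ascii", "xmlcharrefreplace")
    if style == "named":
        # Non-ASCII characters as HTML 4 entity references where a name
        # exists, falling back to a decimal character reference otherwise.
        out = []
        for ch in text:
            cp = ord(ch)
            if cp < 128:
                out.append(ch)
            elif cp in codepoint2name:
                out.append(f"&{codepoint2name[cp]};")
            else:
                out.append(f"&#{cp};")
        return "".join(out).encode("ascii")
    raise ValueError(f"unknown style: {style!r}")
```

With the input `"A\u2014B\u00a0C"`, the `"named"` style yields `b"A&mdash;B&nbsp;C"` and the `"numeric"` style yields `b"A&#8212;B&#160;C"`.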
>
>
> I suggest its benefits over other suggested approaches include:
>
> * It does not require change to subsequent processes, as PSVI
> processing would, nor any changes or additions to schema
> specifications
>
> * It does not require pre-processing, as a macro processor would
>
> * It does not require the introduction and deployment of new
> transcoders, as would Tim Bray and John Cowan's recent thought
> experiment "UTF-8+Names"
>
> * It does not require interaction with other standards groups,
> notably XML Schemas EG or IANA or IETF.
>
> * By providing it at user option, it can succeed or fail: if
> it is popular and successful, that is good; if it is
> unpopular or unsafe, it can be withdrawn.
>
> * By limiting itself to the HTML and the MathML/ISO entities, it
> avoids issues of user-defined entities, and the need to enumerate
> the entities.
>
> * It does not define mappings for those characters, but defers to
> HTML and MathML/ISO, who may provide standard mappings.
>
> This gives a very wide constituency.
>
> I note that the Xerces SAX 2 implementation provides features by which
> a parser can continue processing after an error. This proposal could
> be seen as a very limited nod of recognition to that kind of practice.
>
>
> Cheers
> Rick Jelliffe
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/