Re: [xml-dev] nextml

From: Norman Gray <norman@astro.gla.ac.uk>
To: liam@w3.org
Date: Thu, 9 Dec 2010 17:19:12 +0000

Liam, hello.

On 2010 Dec 9, at 16:49, Liam R E Quin wrote:

> On Thu, 2010-12-09 at 12:13 +0000, Norman Gray wrote:
> [...]
>> ...and later, Liam Quinn wrote:
> 
> Actually it was me (Liam Quinn is someone else)

Ah, apologies. (several someone elses, by the look of it)

>>> The most frequent change request I hear is to remove the strict syntax
>>> requirements and make every XML implementation include some sort of
>>> HTML-like expert system to do the parsing, automatically "correcting"
>>> errors like missing quotes off attribute values.
> 
> Please note, I'm *not* advocating such a change, but rather saying that
> it's the request I hear most often.

It didn't sound to me like you _were_ advocating it; sorry for not making that clearer.

> [...]
>> If 'XML-bis' were defined using lexer events, with strings defined as
>> sequences of Unicode code points, then a JIS-encoded document with
>> missing quotes could be (required to be) handled by the lexer,
>> entirely transparently.  In other words, why is file/wire encoding
>> anything to do with XML?
> Because XML is about file interchange.
> 
> If your XML processor won't read my XML document, we've failed.

I think that separating out the lexing makes interchange easier, not harder.

I could imagine a standard declaring that an XML parser shall process a stream of Unicode code points.  The standard might note that this implies some sort of shim between the XML parser and the file I/O, but declare that the contents of this shim are none of its concern.

The obvious content of that shim would be nothing more than the platform's UTF-8 decoding support, but if someone wanted to be funky and support something else, in a context where all the necessary information was available (for example, from an HTTP header), then the XML standard wouldn't stop them.

I'm not _necessarily_ advocating this as a vital ingredient, but it would surely short-circuit a certain amount of agonising about which UTF-* variants to accommodate, and it would separate the parsing layers quite naturally.
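To make the layering concrete, here is a minimal Python sketch, under stated assumptions: `decode_shim` and `parse_code_points` are hypothetical names, the charset is assumed to arrive out of band (e.g. from an HTTP header), and Python's ElementTree merely stands in for a parser that is only ever handed a `str`, i.e. a sequence of code points. The sample documents deliberately omit an XML declaration, since in this scheme the encoding would be the shim's business. It is an illustration of the idea, not a proposal for any real API.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def decode_shim(raw: bytes, charset: Optional[str] = None) -> str:
    """Hypothetical shim: turn wire bytes into Unicode code points.

    Defaults to UTF-8; uses an out-of-band charset (e.g. from an HTTP
    header) when one is supplied. Raises LookupError for an unknown
    charset and UnicodeDecodeError for malformed input.
    """
    return raw.decode(charset or "utf-8")


def parse_code_points(text: str) -> ET.Element:
    """Stand-in for a parser that only ever sees code points, never bytes."""
    return ET.fromstring(text)


if __name__ == "__main__":
    source = "<greeting lang='ja'>こんにちは</greeting>"
    # The same document arriving in two different wire encodings.
    utf8_bytes = source.encode("utf-8")
    sjis_bytes = source.encode("shift_jis")

    a = parse_code_points(decode_shim(utf8_bytes))
    b = parse_code_points(decode_shim(sjis_bytes, charset="Shift_JIS"))

    # Whatever the wire encoding, the parser sees identical input.
    assert a.text == b.text == "こんにちは"
    print(a.tag, a.get("lang"), a.text)
```

The point of the design is that `parse_code_points` never mentions encodings at all; supporting another wire format means changing only the shim.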

[A more out-there position is to define XML in terms of a sequence of SAX events, or equivalent, but that would obviously no longer be a file-interchange standard.]
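For the out-there position, here is a rough Python illustration of what "XML as a sequence of SAX events" looks like, using the standard library's `xml.sax`. The `EventRecorder` class and `to_events` function are hypothetical helpers, and the tuple format is an arbitrary choice for illustration. Note that a SAX parser may deliver a single run of character data in several chunks, so an event-based definition would also have to say how adjacent text events are merged.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical helper: records SAX callbacks as a flat list of tuples."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # May be called more than once for a single run of text.
        self.events.append(("text", content))


def to_events(doc: bytes) -> list:
    """Parse a complete document and return its event sequence."""
    handler = EventRecorder()
    xml.sax.parseString(doc, handler)
    return handler.events


if __name__ == "__main__":
    for event in to_events(b"<a x='1'><b>hi</b></a>"):
        print(event)
```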

All the best,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
