Re: [xml-dev] nextml
- From: Norman Gray <norman@astro.gla.ac.uk>
- To: liam@w3.org
- Date: Thu, 9 Dec 2010 17:19:12 +0000
Liam, hello.
On 2010 Dec 9, at 16:49, Liam R E Quin wrote:
> On Thu, 2010-12-09 at 12:13 +0000, Norman Gray wrote:
> [...]
>> ...and later, Liam Quinn wrote:
>
> Actually it was me (Liam Quinn is someone else)
Ah, apologies (there are several someone elses, by the look of it).
>>> The most frequent change request I hear is to remove the strict syntax
>>> requirements and make every XML implementation include some sort of
>>> HTML-like expert system to do the parsing, automatically "correcting"
>>> errors like missing quotes off attribute values.
>
> Please note, I'm *not* advocating such a change, but rather saying that
> it's the request I hear most often.
It didn't sound to me like you _were_ advocating it; sorry for not making that clearer.
> [...]
>> If 'XML-bis' were defined using lexer events, with strings defined as
>> sequences of unicode code points, then a JIS-encoded document with
>> missing quotes could be (required to be) handled by the lexer,
>> entirely transparently. In other words, why is file/wire encoding
>> anything to do with XML?
> Because XML is about file interchange.
>
> If your XML processor won't read my XML document, we've failed.
I think that separating out the lexing makes this easier, not harder.
I could imagine a standard declaring that an XML parser shall process a stream of Unicode code points. The standard might note that this implies some sort of shim between the XML parser and the file I/O, while declaring that the contents of that shim are none of its concern.
The obvious content of that shim would of course be nothing more than the platform's UTF-8 reading support, but if someone wanted to be funky and support something else, in a context where all the necessary information was available (for example, from an HTTP header), then the XML standard isn't about to stop them.
I'm not _necessarily_ advocating this as a vital ingredient, but it would surely short-circuit a certain amount of agonising about which UTF-* variants to accommodate, and it would separate the parsing layers quite naturally.
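To make the layering concrete, here is a minimal sketch in Python, assuming the shim is nothing more than a byte-to-string decode driven by out-of-band information. The names `decode_shim` and `parse_code_points` are hypothetical, and the `charset` argument merely stands in for something like an HTTP Content-Type parameter; Shift_JIS is used as one example of a JIS-family encoding. The point is only that the "XML" function never sees bytes.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def decode_shim(raw: bytes, charset: Optional[str] = None) -> str:
    """Hypothetical shim: turn wire bytes into a str of Unicode code points.

    `charset` stands in for out-of-band information (e.g. an HTTP header);
    without it, the shim just assumes UTF-8.
    """
    return raw.decode(charset or "utf-8")


def parse_code_points(text: str) -> ET.Element:
    # The "XML" layer only ever receives code points, never bytes,
    # so it has no opinion about how the document travelled.
    return ET.fromstring(text)


if __name__ == "__main__":
    doc = '<note lang="ja">日本語</note>'
    from_utf8 = parse_code_points(decode_shim(doc.encode("utf-8")))
    from_sjis = parse_code_points(decode_shim(doc.encode("shift_jis"), "shift_jis"))
    print(from_utf8.text == from_sjis.text)  # True
```

Whatever funky encoding the shim chooses to support, the parser's input, and therefore its behaviour, is identical.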
[A more out-there position is to define XML in terms of a sequence of SAX events, or equivalent, but at that point it obviously ceases to be a file-interchange standard.]
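For illustration only, here is a rough Python sketch of what "a document as a sequence of SAX-like events" might look like, using the standard library's `xml.sax`. The `EventRecorder` class and the `(kind, payload)` tuple format are hypothetical choices made for this sketch, not part of any specification; adjacent character-data callbacks are merged, since SAX parsers may split text across several calls.

```python
import xml.sax
from typing import List, Tuple

Event = Tuple[str, object]


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler: records a document as a flat list of events."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Event] = []

    def startElement(self, name, attrs):
        self.events.append(("start", (name, dict(attrs.items()))))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Merge consecutive text chunks so the event list is deterministic.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def to_events(text: str) -> List[Event]:
    handler = EventRecorder()
    xml.sax.parseString(text, handler)  # accepts str on Python 3.7+
    return handler.events


if __name__ == "__main__":
    print(to_events('<a x="1">hi</a>'))
```

Defined this way, two documents would count as "the same" exactly when their event lists compare equal, regardless of how either was serialised, which is precisely why it stops being an interchange format.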
All the best,
Norman
--
Norman Gray : http://nxg.me.uk