
Re: [xml-dev] nextml

Liam, hello.

On 2010 Dec 9, at 16:49, Liam R E Quin wrote:

> On Thu, 2010-12-09 at 12:13 +0000, Norman Gray wrote:
> [...]
>> ...and later, Liam Quinn wrote:
> Actually it was me (Liam Quinn is someone else)

Ah, apologies. (several someone elses, by the look of it)

>>> The most frequent change request I hear is to remove the strict syntax
>>> requirements and make every XML implementation include some sort of
>>> HTML-like expert system to do the parsing, automatically "correcting"
>>> errors like missing quotes off attribute values.
> Please note, I'm *not* advocating such a change, but rather saying that
> it's the request I hear most often.

It didn't sound to me like you _were_ advocating it; sorry for not making that clearer.

> [...]
>> If 'XML-bis' were defined using lexer events, with strings defined as
>> sequences of unicode code points, then a JIS-encoded document with
>> missing quotes could be (required to be) handled by the lexer,
>> entirely transparently.  In other words, why is file/wire encoding
>> anything to do with XML?
> Because XML is about file interchange.
> If your XML processor won't read my XML document, we've failed.

I think that separating out the lexing makes this easier, not harder.

I could imagine a standard declaring that an XML parser shall process a stream of Unicode code points.  The standard might note that this implies some sort of shim between the XML parser and the file I/O, but declare that what's in this shim is none of its concern.

The obvious content of that shim would of course be nothing more than the platform's UTF-8 reading support, but if someone wanted to be funky and support something else, in a context where all the necessary information was available (for example, from an HTTP header), then the XML standard isn't about to stop them.

I'm not _necessarily_ advocating this as a vital ingredient, but it would surely short-circuit a certain amount of agonising about which UTF-* variants to accommodate, and it would separate the parsing layers quite naturally.
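
To make the layering concrete, here is a rough Python sketch of what I have in mind.  It is only an illustration: the names code_points and count_start_tags are made up for this message, the `encoding` argument stands in for whatever out-of-band information the shim might have (an HTTP charset, say), and the "parser" is a toy that merely counts start tags.  The point is just that the second function never sees bytes or encoding names.

```python
import codecs
from typing import Iterable, Iterator, Optional


def code_points(chunks: Iterable[bytes],
                encoding: Optional[str] = None) -> Iterator[int]:
    """The 'shim': turn raw bytes into a stream of Unicode code points.

    `encoding` is a hypothetical stand-in for out-of-band information
    (for example an HTTP charset); without it we assume UTF-8.
    """
    decoder = codecs.getincrementaldecoder(encoding or "utf-8")()
    for chunk in chunks:
        # The incremental decoder copes with characters split across chunks.
        for ch in decoder.decode(chunk):
            yield ord(ch)
    # Flush the decoder; raises if the input ended mid-character.
    for ch in decoder.decode(b"", final=True):
        yield ord(ch)


def count_start_tags(points: Iterable[int]) -> int:
    """Toy stand-in for the parser layer: consumes code points only."""
    count = 0
    prev = None
    for cp in points:
        # A '<' not followed by '/', '!' or '?' opens a start tag.
        if prev == ord("<") and cp not in (ord("/"), ord("!"), ord("?")):
            count += 1
        prev = cp
    return count


if __name__ == "__main__":
    doc = "<a><b>日本</b></a>"
    # Same document, two wire encodings, one parser.
    print(count_start_tags(code_points([doc.encode("utf-8")])))
    print(count_start_tags(code_points([doc.encode("shift_jis")], "shift_jis")))
```

Swapping the shim for something funkier changes nothing downstream, which is really all I'm claiming.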

[A more out-there position is to define XML in terms of a sequence of SAX events, or equivalent, but that obviously stops being a file-interchange standard]

All the best,


Norman Gray  :  http://nxg.me.uk
