XML.org

Re: [xml-dev] nextml


On 2010 Dec 9, at 17:19, Norman Gray wrote:

> I could imagine a standard declaring that an XML parser shall process a stream of unicode codepoints. The standard might note that this does imply that there's some sort of shim between the XML and the file I/O, but declare that what's in this shim is none of its concern.
> 
> The obvious content of that shim would of course be nothing more than the platform's UTF-8 reading support, but if someone wanted to be funky and support something else, in a context where all the necessary information was available (for example, from an HTTP header), then the XML standard isn't about to stop them.

I can put this more compactly, I think.

  * The XML spec is at present largely defined in terms of codepoints.

  * Even so, it effectively bakes a UTF-8 to codepoint shim into the standard, even though it doesn't _really_ seem to need to do so.

  * The main location in the current XML standard where UTF-8 is mentioned repeatedly is in the discussion of entities (including, obviously, the document entity).  That seems ripe for simplification along the lines above, especially if there's talk of entities being simplified away.
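
To make the separation concrete, here is a minimal sketch in Python of what such a shim might look like. Everything in it is an assumption for illustration: the names codepoint_shim and count_start_tags are hypothetical and come from no spec or library, and the "parser" is only a toy consumer that counts start tags, not a real XML parser. The point is just that the consumer sees codepoints and never learns which encoding produced them.

```python
import codecs
import io
from typing import BinaryIO, Iterable, Iterator


def codepoint_shim(raw: BinaryIO, encoding: str = "utf-8",
                   chunk_size: int = 4096) -> Iterator[int]:
    """Hypothetical shim: turn a byte stream into Unicode codepoints.

    This is the only place where an encoding is named. The caller is
    assumed to know it already (for example, from an HTTP header).
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    while True:
        chunk = raw.read(chunk_size)
        final = not chunk  # an empty read means end of stream
        for ch in decoder.decode(chunk, final=final):
            yield ord(ch)
        if final:
            break


def count_start_tags(codepoints: Iterable[int]) -> int:
    """Toy stand-in for a parser: consumes codepoints only, never bytes."""
    count = 0
    prev = None
    for cp in codepoints:
        # A '<' followed by anything other than '/', '!' or '?' is
        # treated as the start of a start tag (a deliberate simplification).
        if prev == ord("<") and cp not in (ord("/"), ord("!"), ord("?")):
            count += 1
        prev = cp
    return count


if __name__ == "__main__":
    doc = "<a><b/>\u00e9</a>"
    for enc in ("utf-8", "utf-16"):
        stream = io.BytesIO(doc.encode(enc))
        print(enc, count_start_tags(codepoint_shim(stream, enc)))
```

The same toy consumer gives the same count whether the bytes were UTF-8 or UTF-16, since the encoding is dealt with entirely inside the shim.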

Norman


-- 
Norman Gray  :  http://nxg.me.uk






Copyright 1993-2007 XML.org. This site is hosted by OASIS