Re: [xml-dev] xml over http - RFC 3023

From: "Andrew Welch" <andrew.j.welch@gmail.com>
To: "Rick Jelliffe" <rjelliffe@allette.com.au>
Date: Mon, 1 Dec 2008 12:12:19 +0000

> For application/xml you ignore the first step and go straight to the
> document. If your data is usually in UTF-8 or ASCII, you could perhaps read
> in the first block from bytes to characters and (if the transcoder has not
> generated an exception) confirm that there is no XML encoding declaration or
> BOM or that the string "utf-8" does not appear in the XML encoding
> declaration, in which case you don't need to do anything more complicated.
> If your data is text/xml, you are indeed in a sea of complication, which is
> why text/xml has been discouraged for so long.

ok, that makes sense, thanks.
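
To make sure I have understood the application/xml case, here is a minimal Python sketch of that first-block check. The function name `sniff_xml_encoding` and the regular expression are my own assumptions, not anything from the XML spec or RFC 3023, and it only covers BOMs plus ASCII-compatible encoding declarations (no EBCDIC, no BOM-less UTF-16):

```python
import codecs
import re

# BOMs to check, longest first so a UTF-32 BOM is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data. Assumes an ASCII-compatible encoding.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding from the first block of an XML document.

    Hypothetical helper: BOM first, then the encoding declaration,
    otherwise fall back to UTF-8 (the XML default without BOM or label).
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    match = _DECL.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

If this returns "utf-8" and decoding the first block raises no exception, there is nothing more complicated to do; anything else would need to be handed to a real parser.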


> Maybe, but the mechanism for this to occur, for Apache at least, is for
> someone to write it, contribute it, champion it and maintain it.

Champion a mechanism for a web server to serve XML?  Really?


> But the basic XML contract is that the encoding must be explicitly labelled
> by the sender (creator of the document) and the recipient should not guess
> but use the label. If this is too much for naive users, then XML is simply
> not the technology for them, and XML should not be blamed for not working in
> a situation it explicitly was designed to avoid. It is just like if someone
> does not know what + means they cannot use a calculator. It is not an
> indictment of mathematics if someone who does not know + cannot use a
> calculator. Character encoding is just as fundamental to computer
> programming as knowledge of the difference between floats and ints, for
> example: that Western computer science and IT courses have guaranteed the
> ignorance of their students in this is sad.

Er, ok.  You do realise there is a different expert somewhere else in
the world saying exactly the same thing about their specialist area?
(Not sure I agree with that analogy either.)



> In any case, I thought most people had written off RSS as unprocessable by
> generic XML tools, because so much RSS was not well-formed? I thought one
> reason for Atom was that the early RSS systems' creators messed up their XML
> and RSS never recovered.  With RSS, you are not experiencing the failure of
> XML on the web; you may be experiencing the failure of non-WF XML
> (and the potential complexity of figuring out text/xml).

The vast majority of the feeds are RSS and very few are Atom, so from
here it looks like Atom has had little impact so far.  Processing the
RSS feeds is a pain but manageable using XSLT 2.0 calling out to
TagSoup, JTidy etc, and using a LexicalHandler to intercept the entities.

From my naive perspective, I would've thought the web server would
serve the XML with the correct encoding in the Content-Type header so
I don't have to ignore it, and/or I could give the XML parser a URL and
it would take care of it.  I'm not sure why I should be reading
appendices of the spec and writing low-level code for something that
should be an everyday task.  In that respect, I think, you could argue
it hasn't succeeded yet.
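
For what it's worth, the header side of this is the easy part. Below is a rough Python sketch of how I read the RFC 3023 rules for the charset parameter; the function name `charset_from_content_type` is made up for illustration, and returning None is just my convention for "no label in the header, look inside the document instead":

```python
from email.message import Message


def charset_from_content_type(content_type: str):
    """Return the charset implied by a Content-Type header value.

    Hypothetical helper sketching the RFC 3023 rules as I understand them:
    an explicit charset parameter wins; text/xml without one defaults to
    us-ascii; application/xml without one returns None, meaning the
    document's own BOM or encoding declaration decides.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, tuple):
        # RFC 2231 encoded parameter: (charset, language, value)
        charset = charset[2]
    if charset:
        return charset.lower()
    if msg.get_content_type() == "text/xml":
        return "us-ascii"
    return None
```

The us-ascii default for unlabelled text/xml is exactly the case where the header overrides whatever the document says about itself, which is the part that bites.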



-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
