Re: [xml-dev] xml over http - RFC 3023
- From: "Andrew Welch" <andrew.j.welch@gmail.com>
- To: "Rick Jelliffe" <rjelliffe@allette.com.au>
- Date: Mon, 1 Dec 2008 12:12:19 +0000
> For application/xml you ignore the first step and go straight to the
> document. If your data is usually in UTF-8 or ASCII, you could perhaps read
> in the first block from bytes to characters and (if the transcoder has not
> generated an exception) confirm that there is no XML encoding declaration or
> BOM or that the string "utf-8" does not appear in the XML encoding
> declaration, in which case you don't need to do anything more complicated.
> If your data is text/xml, you are indeed in a sea of complication, which is
> why text/xml has been discouraged for so long.
ok, that makes sense, thanks.
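For the record, here is a minimal sketch in Python of how I understand that fast path. The function name utf8_fast_path_ok is made up for illustration, and I am reading the check as "either there is no encoding declaration, or the one present names UTF-8 or US-ASCII", which is my interpretation rather than anything stated above. It only says whether the simple route is safe; it does not replace a real parser.

```python
import codecs
import re

# Encoding pseudo-attribute of an XML declaration at the very start of the text.
_DECL = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def utf8_fast_path_ok(first_block: bytes) -> bool:
    """Return True if the first block can simply be treated as UTF-8.

    Hypothetical helper: it transcodes the block as UTF-8 and then checks
    that any encoding declaration agrees with that choice.
    """
    # Incremental decoding tolerates a multi-byte character that is cut
    # off at the end of the block.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        text = decoder.decode(first_block, final=False)
    except UnicodeDecodeError:
        # The transcoder raised an exception, so this is not UTF-8
        # (a UTF-16 BOM, for example, fails here).
        return False

    # A UTF-8 BOM is harmless; drop it before looking for the declaration.
    if text.startswith("\ufeff"):
        text = text[1:]

    match = _DECL.match(text)
    if match is None:
        # No encoding declaration: the XML default of UTF-8 applies.
        return True
    return match.group(1).lower() in ("utf-8", "us-ascii")
```

If it returns False, you are back to the full detection procedure, or to handing the raw bytes to the parser and letting it work things out.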
> Maybe, but the mechanism for this to occur, for Apache at least, is for
> someone to write it, contribute it, champion it and maintain it.
Champion a mechanism for a web server to serve XML? Really?
> But the basic XML contract is that the encoding must be explicitly labelled
> by the sender (creator of the document) and the recipient should not guess
> but use the label. If this is too much for naive users, then XML is simply
> not the technology for them, and XML should not be blamed for not working in
> a situation it explicitly was designed to avoid. It is just like if someone
> does not know what + means they cannot use a calculator. It is not an
> indictment of mathematics if someone who does not know + cannot use a
> calculator. Character encoding is just as fundamental to computer
> programming as knowledge of the difference between floats and ints, for
> example: that Western computer science and IT courses have guaranteed the
> ignorance of their students in this is sad.
Er, ok. You do realise there is a different expert somewhere else in
the world saying exactly the same thing about their specialist area.
(not sure I agree with that analogy either)
> In any case, I thought most people had written off RSS as unprocessable by
> generic XML tools, because so much RSS was not well-formed? I thought one
> reason for Atom was that the creators of the early RSS systems messed up
> their XML and RSS never recovered. With RSS, you are not experiencing the
> failure of XML on the web; you may be experiencing the failure of non-WF XML
> (and the potential complexity of figuring out text/xml).
The vast majority of the feeds are RSS and very few are Atom, so from here
it looks like Atom has had little impact so far. Processing the RSS
feeds is a pain but manageable using XSLT 2.0 calling out to TagSoup,
JTidy etc, and using a LexicalHandler to intercept the entities.
From my naive perspective, I would've thought the web server would
serve the XML with the correct encoding in the Content-Type so I don't
have to ignore it, and/or I could pass the XML parser a URL and it would
take care of it. I'm not sure why I should be reading appendices of
the spec and writing low-level code for something that should be an
everyday task. In that respect, I think, you could argue it hasn't
succeeded yet.
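
To make the complaint concrete, this is roughly the decision a client ends up coding by hand under RFC 3023, sketched in Python. The function name http_charset is hypothetical, and the sketch covers only text/xml and application/xml: an explicit charset parameter wins for both, text/xml without one is treated as us-ascii, and application/xml without one defers to the BOM or encoding declaration in the document itself.

```python
from email.message import Message
from typing import Optional


def http_charset(content_type: str) -> Optional[str]:
    """Return the encoding dictated by the Content-Type header, if any.

    Hypothetical helper. None means "the header does not decide; look at
    the BOM / XML encoding declaration in the document instead".
    """
    # Reuse the standard library's MIME header parsing.
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    charset = msg.get_param("charset")

    if media_type not in ("text/xml", "application/xml"):
        raise ValueError("outside the scope of this sketch: " + media_type)

    if isinstance(charset, str):
        # An explicit charset parameter is authoritative for both types.
        return charset.lower()
    if media_type == "text/xml":
        # text/xml with no charset parameter is treated as us-ascii,
        # whatever the document itself declares.
        return "us-ascii"
    # application/xml with no charset: the document's own labelling applies.
    return None
```

So http_charset("text/xml") gives "us-ascii" while http_charset("application/xml") gives None, and that difference is the "sea of complication" in one line.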
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/