10/10/2002 5:33:19 AM, "bryan" <email@example.com> wrote:
Referring generally to http://lists.xml.org/archives/xml-dev/200210/msg00583.html
but not quoting at length.
I am not an RSS weenie, but I am very interested in it as a case study
in "evolveable formats", i.e., how real users deal with the cruel reality
that application-level XML standards are produced slowly, by political
and interpersonal processes that seldom yield fully satisfactory
results, and which are obsolete the day they are cast in a schema
and/or given a namespace :-). XML doesn't fix human frailty; it just
reduces the technological overhead that hid human issues behind
syntax and interoperability problems (e.g., EDI standards AFAIK).
FWIW, I take away the following lessons from the RSS 0.9x/1.0/2.0/3.0/etc.
experience (which again I'm happy not to have lived through), and
would appreciate some responses from people who lived through it:
1 - Politics happens, Evolution is continuous, deal with it.
With technology, as best you can. Don't make technology choices
that are fragile in the face of human nature.
2 - Namespaces work best for mixing instances of well-defined
vocabularies/schemas together; they don't work so well to support
evolution or un-typed XML. Schema evolution using namespaces is
a Known to Be Hard, TAG-level problem.
If you want to leverage commonly deployed code that understands
a specific namespace (XHTML, SVG, etc.), the full-blown Namespaces
in XML is your friend, well Real Soon Now anyway. If you just
want to disambiguate tags, it has lots of little gotchas
(that "RSS 2.0" seems to have been gotten by!) that make it a
challenge for people who don't grok its subtleties. (MOST OF
THE REAL WORLD!!!)
If NS for XML is overkill for you and your users, steal its
great idea of leveraging DNS to disambiguate tags, but in
a more Desperate Perl Hacker or home-brew parser friendly
way. For example,
<rant class="xegesis.org">Why oh Why oh Why oh Why!!!! </rant>
is distinct from all other "rant" elements using this convention,
is easy to handle with a regexp or a bare-bones well-formed XML parser,
and won't fall afoul of "real" NS-aware software.
<em class="www.w3.org/1999/xhtml"> Stay away from all the messy verbosity
of a URI if you don't NEED it to be a URI</em>
This tells someone downstream that you mean the HTML "em" tag, not
something else. A real browser will quietly ignore it on display.
Someday Real Soon Now an AF-NG architecture map can be used to aid
its processing by generic software :-)
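
To make the "Desperate Perl Hacker friendly" claim a bit more concrete, here
is a rough sketch in Python of pulling out elements tagged with this class
convention using only a regexp. The names QUALIFIED and qualified_elements
are made up for illustration, and the pattern assumes the class attribute is
the only attribute and that same-named elements are not nested; it is not a
general XML parser.

```python
import re

# Hypothetical pattern: an element whose only attribute is class="<domain>".
# Assumes no nesting of same-named elements inside the body.
QUALIFIED = re.compile(
    r'<(?P<tag>[A-Za-z][\w.-]*)\s+class="(?P<domain>[^"]+)"\s*>'
    r'(?P<body>.*?)</(?P=tag)>',
    re.DOTALL,
)

def qualified_elements(text):
    """Return (domain, tag, body) for each class-qualified element found."""
    return [
        (m.group("domain"), m.group("tag"), m.group("body").strip())
        for m in QUALIFIED.finditer(text)
    ]

sample = '<rant class="xegesis.org">Why oh Why oh Why oh Why!!!! </rant>'
for domain, tag, body in qualified_elements(sample):
    print(domain, tag, body)
```

A consumer that only cares about its own vocabulary can filter on the
domain part and ignore everything else.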
3 - If you don't know exactly what you're dealing with, heuristics
beat logic. If the tag is <table> and it has
HTML table elements inside it, it's probably an HTML table! Don't
throw it away because it's in the wrong namespace.
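
As a sketch of what "heuristics beat logic" could look like in code, the
following uses Python's standard xml.etree.ElementTree and simply ignores
whatever namespace the element claims to be in. The function names and the
set of "table-ish" child names are my own assumptions for illustration, not
anything RSS tools are known to do.

```python
import xml.etree.ElementTree as ET

# Assumed list of element names that suggest an HTML table.
HTML_TABLE_PARTS = {"tr", "td", "th", "thead", "tbody", "tfoot", "caption"}

def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, if any, and lowercase."""
    return tag.rsplit("}", 1)[-1].lower()

def looks_like_html_table(elem):
    """Guess whether elem is an HTML table, whatever namespace it is in."""
    if local_name(elem.tag) != "table":
        return False
    return any(
        isinstance(child.tag, str) and local_name(child.tag) in HTML_TABLE_PARTS
        for child in elem.iter()
        if child is not elem
    )

wrong_ns = ET.fromstring(
    '<table xmlns="http://example.org/wrong"><tr><td>1</td></tr></table>'
)
furniture = ET.fromstring("<table><leg/><top/></table>")
print(looks_like_html_table(wrong_ns), looks_like_html_table(furniture))
```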
4 - If you're going to claim to use an XML technology, do it
properly. If you're using angle brackets to make regexp parsers
easier to write, don't use XML idioms (especially colons!!!) half-way.
5 - If you really don't need XML, don't fool with it. Well-formed
XML has the advantage of letting you deal with hierarchical, labelled
data in a way that leverages all sorts of other tools. If all you
have is name-value pairs (i.e., 99.99% of the RSS I've seen) something
like RSS 3 makes an awful lot of sense. If you're rolling your own
tools and not leveraging XML tools, ask yourself what value XML
offers you or your users. (Hint: WikiML is a LOT easier to author
than XML ... RSS 3.0 is a one-line parser in Python, Perl (?), etc.)
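
To illustrate the "one-line parser" remark, here is a rough Python sketch.
It assumes the format is nothing more than "name: value" lines with a blank
line between items; that layout, the function name parse_items, and the
sample feed are assumptions made for illustration, not a reference
implementation of any spec.

```python
def parse_items(text):
    """Parse blank-line-separated blocks of 'name: value' lines into dicts."""
    return [
        dict(line.split(": ", 1) for line in block.splitlines() if ": " in line)
        for block in text.strip().split("\n\n")
    ]

# Made-up sample feed: two items, two fields each.
feed = (
    "title: First post\n"
    "link: http://example.org/1\n"
    "\n"
    "title: Second post\n"
    "link: http://example.org/2\n"
)

for item in parse_items(feed):
    print(item["title"], item["link"])
```

The body of parse_items is a single expression, which is about as close to
"one line" as readable Python gets.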
I guess this is more of a question:
6 - Why on earth would one even THINK about using entity-encoded
non-well-formed HTML in a syndication format??? Use the HTML
tags, but close them! Use tidy to clean up the junk you get
from your users! Why fool with any alternative? Even if you're
taking the advice in point 5, just "escape it" with an HTML: line label
or whatever. Someone downstream will thank you.
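
A small sketch of why this matters downstream, again using Python's standard
xml.etree.ElementTree. The description element, the function name, and the
sample strings are made up for illustration. With entity-encoded HTML the
feed itself parses fine and the breakage only shows up when a consumer tries
to treat the payload as markup; with real, closed tags the XML parser has
already checked it.

```python
import xml.etree.ElementTree as ET

# Entity-encoded, non-well-formed HTML (the <b> is never closed).
escaped = "<description>&lt;p&gt;Hello &lt;b&gt;world&lt;/p&gt;</description>"
# The same content as real, properly closed tags.
embedded = "<description><p>Hello <b>world</b></p></description>"

def payload_is_well_formed(item_xml):
    """Check whether the element's HTML payload can be parsed as XML."""
    elem = ET.fromstring(item_xml)
    if len(elem):
        # Real child elements: the parser already accepted them.
        return True
    try:
        # Text payload: the parser has unescaped it, so re-parse as markup.
        ET.fromstring("<wrapper>" + (elem.text or "") + "</wrapper>")
    except ET.ParseError:
        return False
    return True

print(payload_is_well_formed(escaped), payload_is_well_formed(embedded))
```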
Again, these are starting points for discussion by an outsider,
not advice from one claiming these are design patterns or best practices.