OASIS Mailing List Archives

   Re: [xml-dev] RE: evolvable formats


10/10/2002 5:33:19 AM, "bryan" <bry@itnisk.com> wrote:

Referring generally to http://lists.xml.org/archives/xml-dev/200210/msg00583.html
but not quoting at length.

I am not an RSS weenie, but I am very interested in it as a case study
in "evolvable formats", i.e., how real users deal with the cruel reality
that application-level XML standards are produced slowly, by political
and interpersonal processes that seldom yield fully satisfactory
results, and which are obsolete the day they are cast in a schema 
and/or given a namespace :-).  XML doesn't fix human frailty, it just
reduces the technological overhead that hid human issues behind 
syntax and interoperability problems (e.g., EDI standards AFAIK).

FWIW, I take away the following lessons from the RSS 0.9x/1.0/2.0/3.0/etc.
experience (which again I'm happy not to have lived through), and
would appreciate some responses from people who lived through it
firsthand.

1 - Politics happens, Evolution is continuous, deal with it.  
  With technology, as best you can.  Don't make technology choices
  that are fragile in the face of human nature.

2 - Namespaces - work best for mixing instances of well-defined
  vocabularies/schemas together, they don't work so well to support
  evolution or un-typed XML. Schema evolution using namespaces is
  a Known to Be Hard, TAG-level problem.

  If you want to leverage commonly deployed code that understands
  a specific namespace (XHTML, SVG, etc.), the full-blown Namespaces
  in XML is your friend, well Real Soon Now anyway.  If you just
  want to disambiguate tags, it has lots of little gotchas
  (that "RSS 2.0" seems to have been gotten by!) that make it a 
  challenge for people who don't grok its subtleties. (MOST OF
  THE REAL WORLD!!!)
  
  If NS for XML is overkill for you and your users, steal its
  great idea of leveraging DNS to disambiguate tags, but in
  a more Desperate Perl Hacker or home-brew parser friendly
  way.  For example, 

    <rant class="xegesis.org">Why oh Why oh Why oh Why!!!! </rant>
  
  is distinct from all other "rant" elements using this convention,
  is easy to handle with regexp or bare-bones well-formed XML, but
  won't fall afoul of "real" NS-aware software.

  or
    <em class="www.w3.org/1999/xhtml"> Stay away from all the messy verbosity 
   of a URI if you don't NEED  it to be a URI</em>
  
  This tells someone downstream that you mean the HTML "em" tag, not
  something else. A real browser will quietly ignore it on display.
  Someday Real Soon Now an AF-NG architecture map can be used to aid
  its processing by generic software :-)
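
  A minimal Python sketch of the regexp-friendly convention above, for
  illustration only. The function name `qualified_elements` and the pattern
  are assumptions, not part of any spec; it only handles flat, non-nested
  elements whose sole attribute is a double-quoted class.

```python
import re

# Matches <name class="qualifier">text</name> for flat, non-nested elements.
# Assumption: class is the only attribute and it uses double quotes.
PATTERN = re.compile(r'<(\w+)\s+class="([^"]+)"\s*>(.*?)</\1\s*>', re.DOTALL)

def qualified_elements(text):
    """Return (qualifier, tag, content) tuples for each matching element."""
    return [(m.group(2), m.group(1), m.group(3).strip())
            for m in PATTERN.finditer(text)]

sample = '''
<rant class="xegesis.org">Why oh Why oh Why oh Why!!!! </rant>
<rant class="example.com">A different kind of rant</rant>
'''

for qualifier, tag, content in qualified_elements(sample):
    print(f"{qualifier}/{tag}: {content}")
```

  The two "rant" elements come back with different qualifiers, so a
  downstream consumer can tell them apart without any namespace machinery.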

3 - If you don't know exactly what you're dealing with, heuristics
  beat logic.  If the tag is  <table>  and it has
  HTML table elements inside it, it's probably an HTML table!  Don't
  throw it away because it's in the wrong namespace.
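
  One way the heuristic might look in Python, as a sketch under stated
  assumptions: the helper names and the set of "HTML table parts" are
  invented for this example, and the namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET

# Assumed list of element names that suggest an HTML table.
HTML_TABLE_PARTS = {"tr", "td", "th", "thead", "tbody", "tfoot", "caption"}

def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]

def looks_like_html_table(elem):
    """Heuristic: a 'table' element containing HTML table parts, in any namespace."""
    if local_name(elem.tag).lower() != "table":
        return False
    return any(local_name(child.tag).lower() in HTML_TABLE_PARTS
               for child in elem.iter() if child is not elem)

doc = ET.fromstring(
    '<item xmlns:x="http://example.org/not-xhtml">'
    '<x:table><x:tr><x:td>cell</x:td></x:tr></x:table>'
    '<table><row/></table>'
    '</item>'
)
for child in doc:
    print(local_name(child.tag), looks_like_html_table(child))
```

  The first table is in the "wrong" namespace but is still recognized; the
  second has the right name but nothing table-like inside it, so it is not.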

4 - If you're going to claim to use an XML technology, do it 
properly.  If you're using angle brackets to make regexp parsers
easier to write, don't use XML idioms (especially colons!!!) half-way.  

5 - If you really don't need XML, don't fool with it.  Well-formed
  XML has the advantage of letting you deal with hierarchical, labelled
  data in a way that leverages all sorts of other tools.  If all you
  have is name-value pairs (i.e., 99.99% of the RSS I've seen) something
  like RSS 3 makes an awful lot of sense.  If you're rolling your own
  tools and not leveraging XML tools, ask yourself what value XML 
  offers you or your users.  (Hint: WikiML is a LOT easier to author
  than XML ... RSS 3.0 is a one-line parser in Python, Perl (?), etc.)
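
  To make the "one-line parser" remark concrete, here is a rough Python
  sketch. It assumes a simplified line-oriented format of `name: value`
  lines with blank lines between items; it is not an implementation of any
  published RSS 3.0 specification, and `parse_items` is a made-up name.

```python
def parse_items(text):
    """Split blank-line-separated blocks into dicts of name: value pairs."""
    return [dict(line.split(": ", 1) for line in block.splitlines() if ": " in line)
            for block in text.strip().split("\n\n")]

# Hypothetical feed content, for illustration only.
feed = """title: Evolvable formats
link: http://example.org/1

title: Heuristics beat logic
link: http://example.org/2
"""

items = parse_items(feed)
print(items[1]["title"])
```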


I guess this is more of a question:

6 - Why on earth would one even THINK about using entity-encoded
non-well-formed HTML in a syndication format???  Use the HTML
tags, but close them!  Use tidy to clean up the junk you get
from your users!  Why fool with any alternative?  Even if you're
taking the advice in point 5, just "escape it" with an HTML: line label
or whatever.  Someone downstream will thank you.
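
A small Python sketch of why the unclosed, entity-encoded form is painful
downstream. The sample strings and the helper `is_well_formed_fragment` are
assumptions for illustration; a real cleanup step would use something like
tidy, which is not shown here.

```python
import html
import xml.etree.ElementTree as ET

def is_well_formed_fragment(markup):
    """Return True if the markup parses as XML once wrapped in a single root."""
    try:
        ET.fromstring(f"<div>{markup}</div>")
        return True
    except ET.ParseError:
        return False

# Entity-encoded HTML with unclosed tags, as often seen in feed descriptions.
escaped = "&lt;p&gt;First paragraph&lt;br&gt;still going"
decoded = html.unescape(escaped)   # '<p>First paragraph<br>still going'

# The same content with its tags closed.
closed = "<p>First paragraph<br/>still going</p>"

print(is_well_formed_fragment(decoded))   # False
print(is_well_formed_fragment(closed))    # True
```

Once the tags are closed, the content can be handed to any XML tool as-is;
the decoded-but-unclosed version needs an HTML-specific parser or a cleanup
pass first.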


Again, these are starting points for discussion by an outsider, 
not advice from one claiming these are design patterns or best practices.

Thoughts? 








 
