OASIS Mailing List Archives

   Re: [xml-dev] RE: Why is there little usage of XML on the "visibleWeb"?


I think there is not a lot of XML on the visible web because:
  • HTML is a good guard against mass copyright infringement: it makes it harder for third parties to automatically extract and repurpose data or content without the publisher's authorization (e.g., indexing it so that a search engine could offer rich queries). Because of this, a publisher today keeps an internal XML format that is published to the public as HTML (an "encrypted" format) and to distributors as XML, or as a mix of XML (RSS) focusing on metadata plus HTML.
  • Writing XML+XSL requires skills that are not as widely available as writing HTML.

Basically, it's an all-or-nothing approach to me.

But why are the two approaches you give incompatible? Why can't we have what I would call "Semantic Tagging" of HTML:


where "ia" is the prefix of an industry association namespace and html is the prefix of the html namespace.

I think this would not be too steep a learning curve for people who know only HTML: they could choose which data to "tag" for others to extract or process, and which data to leave untagged. To me it would also be simpler, and a better reuse of current XML technologies, than the <div class="fruit">Orange</div> approach I have also seen somewhere. I'm not an XML parsing expert, and I believe this may require a couple of changes to how some XML technologies work, but I imagine it would not be a revolution either.

That way, an HTML browser can keep only the content enclosed in the namespace it understands and display it, while upon selection by the user an "ia"-compatible browser plugin could extract the data and import it, or a completely separate application could aggregate all the grocery lists in the world into one searchable database...
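
To make the extraction idea concrete, here is a minimal sketch of what an "ia"-aware tool might do with such a page. Everything specific in it is an assumption for illustration only: the namespace URI http://example.org/industry-association, the element names ia:fruit and ia:meat, the sample markup, and the helper extract_tagged are all hypothetical, and the page is assumed to be well-formed XML (XHTML-style) so that a standard XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "ia" industry-association vocabulary.
IA_NS = "http://example.org/industry-association"

# Illustrative page: HTML elements carry the layout, "ia" elements mark the
# data the publisher chose to expose. "Corn" is deliberately left untagged.
PAGE = """\
<html:ul xmlns:html="http://www.w3.org/1999/xhtml"
         xmlns:ia="http://example.org/industry-association">
  <html:li><ia:fruit>Orange</ia:fruit></html:li>
  <html:li><ia:meat>Chicken</ia:meat></html:li>
  <html:li>Corn</html:li>
</html:ul>
"""


def extract_tagged(xml_text, namespace=IA_NS):
    """Return (local_name, text) pairs for every element in `namespace`."""
    root = ET.fromstring(xml_text)
    prefix = "{" + namespace + "}"  # ElementTree spells names as {uri}local
    return [
        (el.tag[len(prefix):], (el.text or "").strip())
        for el in root.iter()
        if el.tag.startswith(prefix)
    ]


print(extract_tagged(PAGE))
```

Only the two items the publisher tagged are returned; the untagged "Corn" stays visible to an HTML renderer but is invisible to the extractor, which is the opt-in behaviour described above.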

Well, OK, a grocery list may not be the best example, but think about applying this to a postal address, a collection of online articles, an invoice, an insurance policy, etc.

My 2 cents


Costello, Roger L. wrote:

Hi Folks,


There didn’t seem to be any objection to the assertion that HTML is the primary markup language for the visible Web, and that XML is not appropriate for the visible Web.  This is very interesting, as some rather revealing assertions may be derived:




“The more a piece of knowledge becomes available, the more valuable it potentially becomes, because of the wider array of possible uses for it.”


The quote is from: The Wisdom of Crowds by James Surowiecki, p. 166-167.




Information on the hidden Web has limited availability.




Information on the visible Web has wide availability.  (There are a billion Internet users.  Potentially each of them could use information that is on the visible Web.)




Your information is most valuable when available on the visible Web.




Web services are part of the hidden Web, and thus obsessing over their nature (e.g., SOAP versus REST) is not valuable.




For maximum impact, focus your main efforts on making information available on the visible Web.






From: Costello, Roger L.
Sent: Sunday, July 16, 2006 9:07 AM
To: xml-dev@lists.xml.org
Subject: Why is there little usage of XML on the "visible Web"?


Hi Folks,




By “visible Web” I mean the portion of the Web that produces information intended for human consumption.  In particular, in this message I will focus on the portion of the Web that produces information to be consumed by humans via a browser. 




The “hidden Web”, on the other hand, is the portion of the Web that produces the information intended to be consumed by machines (i.e., machine-to-machine interaction).




Below are two examples to demonstrate what I mean by using XML on the visible Web versus not using XML on the visible Web. 


Note: I realize that XHTML is XML, but for this discussion when I refer to “XML” I am not referring to XHTML. 




Suppose that you have a Web site where you make available your grocery list to anyone with a browser.  “Using XML on the visible Web” means that you create an XML document that contains the raw grocery list, and a separate document (e.g., XSLT) which transforms the raw grocery list into a visually appealing form.  Here is grocery.xml:


<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="grocery.xsl"?>







Here is the URL to your grocery list resource:




A browser client that requests this URL will receive grocery.xml and will then dynamically transform the XML into HTML using grocery.xsl.


Let’s imagine that grocery.xsl displays the grocery items as an unordered bulleted list, and so the XML is rendered by the browser like this:


  • Orange
  • Chicken
  • Corn
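
As a rough illustration of the transformation step, the sketch below does in Python what grocery.xsl is imagined to do in the browser: it turns each item of the grocery list into a bullet of an HTML unordered list. It is a stand-in, not XSLT, and the document body in GROCERY_XML is an assumed reconstruction based on the tag names and items mentioned in this message, since the archived copy does not preserve the body of grocery.xml. The function name render_as_html_list is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction: the archived message lost the body of grocery.xml.
GROCERY_XML = """\
<grocery-list>
  <fruit>Orange</fruit>
  <meat>Chicken</meat>
  <vegetable>Corn</vegetable>
</grocery-list>
"""


def render_as_html_list(xml_text):
    """Turn each child of the root element into an <li> inside a <ul>."""
    source = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for item in source:
        li = ET.SubElement(ul, "li")
        li.text = (item.text or "").strip()
    # encoding="unicode" yields a str without an XML declaration.
    return ET.tostring(ul, encoding="unicode")


print(render_as_html_list(GROCERY_XML))
```

Note that the element names (fruit, meat, vegetable) are discarded by the transformation; only the text survives into the rendered list.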


Your Web site is employing XML on the visible Web!




Now let’s contrast the above example with using HTML instead of XML on the visible Web.  A browser client that requests the above URL will receive this HTML from your Web site:












The browser immediately renders the HTML.  The same bulleted list shown above is displayed.


Your Web site is not employing XML on the visible Web!




1. The tags <grocery-list>, <fruit>, <meat>, <vegetable> are more “semantically rich” than the tags <ul> and <li>.  Thus, it seems plausible that a search tool could recognize that the above XML document is not relevant to a query for, say, orange cars.  But the search tool would not be able to recognize that the above HTML document is not relevant.  So, using XML on the visible Web has the potential to facilitate more accurate searches.


2. The job of styling the information is offloaded to the clients.  The Web server is relieved of the transformation burden and thus can potentially process more requests.
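
The first point can be sketched in a few lines. The example below is only an illustration under stated assumptions: both documents are assumed reconstructions (the archive lost the originals), the helper categories_for is hypothetical, and a real search tool would still need to know what a name such as "fruit" means before it could use it.

```python
import xml.etree.ElementTree as ET

# Assumed reconstructions of the two documents discussed in this message.
GROCERY_XML = (
    "<grocery-list><fruit>Orange</fruit><meat>Chicken</meat>"
    "<vegetable>Corn</vegetable></grocery-list>"
)
GROCERY_HTML = "<ul><li>Orange</li><li>Chicken</li><li>Corn</li></ul>"


def categories_for(markup, term):
    """Return the names of elements whose text equals `term`, ignoring case."""
    root = ET.fromstring(markup)
    wanted = term.lower()
    return {
        el.tag
        for el in root.iter()
        if (el.text or "").strip().lower() == wanted
    }


# In the XML, "Orange" sits inside <fruit>, which hints it is not about cars.
print(categories_for(GROCERY_XML, "orange"))
# In the HTML, the only available label is the generic <li>.
print(categories_for(GROCERY_HTML, "orange"))
```

The XML version exposes a label a tool could use to rule the page out for a query about orange cars, while the HTML version offers nothing beyond "list item".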




There is little usage of XML on the visible Web.


Below are several assertions which attempt to explain why there is little usage of XML on the visible Web.




There is not the necessary “critical mass” of browsers which support the styling of XML using either CSS or XSLT.




Advantage 1 listed above is a myth, i.e., <grocery-list>, <fruit>, <meat>, <vegetable> is not more “semantically rich” than <ul> and <li>.  In fact, the opposite is the case.  The tags <ul> and <li> have clear semantics (i.e., an unordered list of items) that are understood by every browser on the planet.  Conversely, <grocery-list>, <fruit>, <meat>, <vegetable> have vague semantics, are understood only by English-speaking people, and probably zero applications on the visible Web would be able to do anything useful with the tags or the data within the tags.




Advantage 2 listed above is also an advantage of HTML, i.e., the data is contained in an HTML document and the presentation instructions are contained in a separate CSS document.




XML is not appropriate for the visible Web.  XML will continue to have limited usage on the visible Web.  As Len Bullard says, “XML is plumbing”. 




On the visible Web, HTML will continue to be the primary markup language for the foreseeable future.




Which assertions do you accept, and which do you reject?  For those assertions you reject, why?




Suppose that you are in charge of a Web (you control the funding of all the Web sites).  Would you issue this mandate to all the Web site developers: “All information on the visible Web must be in XML”?  If you would issue this mandate, why?




Do you think that XML should have a more prominent role on the visible Web?  If so, how would you stimulate greater usage of XML on the visible Web?





Copyright 2001 XML.org. This site is hosted by OASIS