There are four Web language vocabularies: (X)HTML, SVG, MathML and ARIA. XBL2 is on track to become a 5th. XSLT is kinda there on the side, but you don't put XSLT elements into a document tree that gets rendered in a browsing context. (XUL and XBL1 never got implemented by more than one browser engine and will no longer be available for use by random Web sites in Firefox 4.)
SVG and MathML only made it into the web because there has been a fairly active constituency within the W3C, as well as browser vendors on mobile platforms (read: Opera), who were willing to push them through - over ACTIVE opposition by other browser vendors. XForms is supported as an extension on one platform (Firefox), and via ARIA it has been implemented client-side on every engine except Konqueror, again over active opposition by the desktop browser vendors. XBL2 has the potential to be a critical technology, but currently there is only one (semi-maintained) Google Projects implementation (not supported by Google); ironically, the biggest users of XBL2 are XForms implementers.
XSLT is, from a usage standpoint, a very powerful engine capable of doing major rich-web-application work; it's not used in this way largely because the browser vendors have chosen the most minimal implementations, again despite a push for this functionality from people - especially at the enterprise level - who actively work with XML in our pipelines.
Prior to the HTML5 parsing algorithm, only HTML+ARIA worked in text/html, so you got more vocabulary by using XML. Now that SVG and MathML work in text/html, you don't get more vocabulary by using XML. Using text/html buys you the ability to hack SVG and MathML output into your pre-existing text/html-oriented software (taking an existing CMS and bullet-proofing it to always emit well-formed XML is *hard*) and more graceful degradation in IE < 9. Using XML buys you the ability to use a generic XML serializer instead of taking an HTML serializer off the shelf, at the cost of the risk of the YSoD (the Yellow Screen of Death shown on well-formedness errors).
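The serializer point can be made concrete. Here is a minimal Python sketch (not from the thread; the function names and sample data are invented for illustration) of why markup built by string concatenation is fragile, while a real XML serializer keeps output well-formed:

```python
# Sketch: why "bullet-proofing a CMS to always emit well-formed XML is hard"
# when it builds markup by string concatenation, and how a real serializer
# avoids the problem by escaping markup-significant characters.
import xml.etree.ElementTree as ET

def naive_fragment(title):
    # Typical templating: raw interpolation. An unescaped "&" or "<" in
    # user data yields ill-formed XML -> YSoD in an XML browsing context.
    return "<h1>%s</h1>" % title

def serialized_fragment(title):
    # A serializer escapes the text content for us.
    h1 = ET.Element("h1")
    h1.text = title
    return ET.tostring(h1, encoding="unicode")

title = "Fish & Chips <new>"
bad = naive_fragment(title)
good = serialized_fragment(title)

try:
    ET.fromstring(bad)
    bad_parses = True
except ET.ParseError:
    bad_parses = False   # ill-formed: the raw "&" and "<" break parsing

assert not bad_parses
assert ET.fromstring(good).text == title  # round-trips intact
```

The same escaping discipline, applied at every point where markup is produced, is what "bullet-proofing" an existing CMS for XML output amounts to.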
Huh? Lots of issues with this one. First, I see the Grandmother Principle at work here again - currently the VAST majority of web sites involve pre-generated content (I'd say 99.9%+). Most of that pre-generated content is written by developers who know at least one programming language, if not several, and most of those developers know full well that if they have syntax errors in their code, that code will not magically continue working - nor should it. This means that most developers who create such code generators - in PHP or Ruby or whatever - are creating well-formed HTML because that is what was taught to them, and are not encouraged to create well-formed XML (which doesn't explicitly need namespaces unless you are crossing domain boundaries) because, well, HTML can be SLOPPY. Now you're deliberately integrating such sloppiness into a new generation of HTML. I'm not going to win this one, I know, but it is still an issue.
That rant aside, a bigger issue is the assumption that the browser vendors alone should in fact dictate the sub-languages that can be understood by the browser. In 1995, maybe even in 2000, that was a semi-valid assumption. In 2010, with ARIA and XBL and binding languages galore, that idea is silly for all but the most processor-intensive of languages. Take TEI - a lovely language, HTMLesque but with a much richer domain-specific vocabulary, that would do wonderfully within a web browser; or DITA, or DocBook articles, or (admittedly, this one does make me shudder a bit) XSL-FO. XSLT within the browser makes any or all of these possible by mapping them into HTML (in an insertion mode as well as a pre-processing mode), reduces the overall data burden on the server (which is one of the benefits that AJAX-centric development brings), and promotes interoperability. We're increasingly moving to a stage where the user can determine the capabilities of the browser's interpretive ability, and that comes only with XML.
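To make the mapping idea concrete, here is an illustrative XSLT 1.0 stylesheet of the kind described, turning a TEI-like vocabulary into HTML. The element names (doc, head, p) are simplified stand-ins for the example, not actual TEI:

```xml
<!-- Illustrative only: maps a TEI-like document into HTML in the browser. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/doc">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="head">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

A richer vocabulary would add more templates, but the shape stays the same: declarative rules from domain elements to HTML, evaluated client-side so the server ships the domain document, not the presentation.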
You still get XSLT through <?xml-stylesheet?> if you use XML, but not if you use text/html. (Note that you can pass an HTML parser-generated DOM to XSLTProcessor.) However, even if, from the point of view of this mailing list, XSLT is a central feature, on the Web scale it is a fringe feature (but still used just enough that Opera and WebKit were dragged into implementing it - not because they particularly wanted to, but in order to be able to render sites that IE and Firefox are able to render). The masses of Web authors will care more about being able to do SVG in text/html than about not being able to do <?xml-stylesheet?> in text/html.
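For reference, the processing instruction in question looks like this in an XML document (the stylesheet URL and element names are placeholders for the example); served as text/html, the same instruction is simply ignored:

```xml
<?xml-stylesheet type="text/xsl" href="to-html.xsl"?>
<doc>
  <head>Title</head>
  <p>Body text.</p>
</doc>
```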
Again, some big assumptions. First, as an author of one of the first books on SVG, I think I can reasonably say that if you have the skills necessary to write SVG by hand, then you'll almost certainly be interested in xml-stylesheet capability. Most people will end up preferring to say <img src="myImage.svg"/> with no more knowledge of what an SVG file is than they have of any other graphic format - especially since the bulk of non-programmatic SVG is generated by tools such as Inkscape (which DOES utilize namespaces).
Second assumption - I worked as an architect on the AOL Netscape browser, which was built primarily on the Firefox framework, and I know from experience that any browser vendor faces trade-offs: finding or training skilled programmers familiar with your API, download size (not as big a factor as it once was, but still one), optimization of performance benchmarks, and adding capabilities that are likely to be worth the time invested. This last point is especially worth examining - what makes XML (or JSON) a better exchange format for the type of content that your secondary audience - the web developers - are likely to be using? When you have poor tools for working with XML and better tools for working with JSON, people will use JSON. Provide a decent set of tools that both the XML and AJAX communities can utilize, find ways to build bridge technologies (such as E4X), and you might actually find that a lot more people would end up using XML on a regular basis.
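The tooling gap is easy to see in miniature. A sketch in Python (the record and field names are invented for the example), parsing the same data from JSON and from XML using only the standard library:

```python
# Sketch of the tooling-ergonomics point: the same record from JSON and XML.
import json
import xml.etree.ElementTree as ET

json_src = '{"user": {"name": "Ada", "id": 7}}'
xml_src = '<user><name>Ada</name><id>7</id></user>'

# JSON maps straight onto native dicts/lists/numbers...
name_from_json = json.loads(json_src)["user"]["name"]

# ...while XML needs explicit navigation and type conversion, which is
# exactly where better tooling (XPath, E4X-style literals) would help.
root = ET.fromstring(xml_src)
name_from_xml = root.findtext("name")
id_from_xml = int(root.findtext("id"))

assert name_from_json == name_from_xml == "Ada"
assert id_from_xml == 7
```

Nothing here is hard in XML, but every step is one the JSON side gets for free - which is the "provide decent tools" argument in a nutshell.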
You get more proprietary extensibility with XML (which lets enterprise vendors say they do XML to appear to use a standard while they lock the customer in on the vocabulary level). However, people really shouldn't be sending content using proprietary vocabularies on the Web. Even if you style it with CSS to present it, the browser won't know its semantics and can't expose the content properly to assistive technologies for example.
Again, a major assumption that reflects more a political bias and opinion than real utility. HTML5 is a lock-in of a customer to a vocabulary - it just happens to be the one that you're working on. DITA, DocBook, ePub, TEI - these are all document formats used widely by millions of people; they're just not used on the web because that capability has been made difficult. For that matter, if you DO have an XForms implementation, then you can in fact take ANY XML content - proprietary or not - and provide a usable interface not just for viewing the content, but for editing it as well. HTML becomes the de facto browser language - that's fine, as long as it's sufficiently expressive; you need something as a core - but the idea that people shouldn't be sending "proprietary" (read: non-HTML) content over the web is just absurd.
Final point - assistive technologies. I'd argue that you could do more with XSLT in making content useful for assistive technologies than anything you can build into HTML5. An HTML5-to-VoiceXML transformer, for instance, can readily take the HTML5 domain set and convert it into a readable form. HTML5-to-Braille could do the same thing for visually impaired readers, and could strip out a lot of the superfluous content in the process. These are SOLVED problems, by people who have been thinking about them intensively for the last two decades or more. I can assure you that very few people build CSS stylesheets with assistive technologies in mind for their web pages, which means that most web content requires a HUGE amount of heuristics in order to be intelligible to most such devices.
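As a rough illustration of the kind of content-stripping transform described - sketched here in Python rather than XSLT, with the tag choices assumed for the example:

```python
# Sketch: reduce an HTML5-ish tree to a linear, reader-friendly text stream
# by keeping headings and paragraphs and dropping presentational chrome.
# In practice an XSLT stylesheet would express this mapping declaratively.
import xml.etree.ElementTree as ET

SKIP = {"nav", "aside", "script", "style"}   # superfluous for aural output
KEEP = {"h1", "h2", "p"}

def linearize(elem, out):
    if elem.tag in SKIP:
        return                      # prune whole subtrees of chrome
    if elem.tag in KEEP and elem.text:
        out.append(elem.text.strip())
    for child in elem:
        linearize(child, out)

page = ET.fromstring(
    "<body><nav><p>Home | About</p></nav>"
    "<h1>Report</h1><p>Main findings.</p>"
    "<aside><p>Ad</p></aside></body>"
)
lines = []
linearize(page, lines)
assert lines == ["Report", "Main findings."]
```

A real HTML5-to-VoiceXML or HTML5-to-Braille pipeline would of course map into the target vocabulary rather than plain strings, but the pruning-and-mapping structure is the same.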
Home-grown vocabularies are used over XHR, but for these kinds of cases, JSON is being used more and more.
Yup - but not even at 50%, last time I checked the stats with Gartner ... and any vocabulary produced by most governmental entities, businesses, or organizations that's more complex than, say, a couple dozen elements will most likely be expressed in XML. So again, it's not "what is most widely used?" but "what's coolest among my friends?" that seems to be driving this decision.
So when the vocabularies that browsers have built-in awareness for (in the sense of using them in document trees that are displayed in a browsing context) now work in text/html, a big reason to use XML is removed. This leaves less reason to use XML on the Web, which makes it less worthwhile to make XML more suitable for use on the Web.
Don't get me wrong - I think the HTML5 spec as it stands introduces a number of useful features that are long overdue. But it's the unstated subtext - that HTML5/ARIA is a REPLACEMENT for XML - that makes me upset about how this language is developing.