Re: [xml-dev] Never mind the browser, let's do MicroXML

HTML5 has some problems, but ambiguity isn't really one of them: the HTML5 spec specifies in excruciating detail how to construct a parse tree from any stream of Unicode characters. Unlike XML, there are no states equivalent to "not well formed"; every input has a defined parse.

David,

Hmm .. I guess what I'm saying is this - suppose that you have an input sequence that looks like this:

<html>

<body>

Text

<ul>

<li>Line 1

<li>Line 2

<li>Line 3

which you're implying could conceivably be valid input.

Because we know the underlying semantics, the processor would be able to parse that as:

<html>

  <body>Text

    <ul>

      <li>Line 1</li>

      <li>Line 2</li>

      <li>Line 3</li>

    </ul>

  </body>

</html>
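
To make "known semantics" concrete, here is a minimal sketch in Python. It is not the HTML5 tree-construction algorithm: it uses the standard library's html.parser purely as a tokenizer, and the `implied_close` rule table, `Node`, `SketchBuilder` and `parse_sketch` are hypothetical names made up for this illustration. The only rule it knows is "a new <li> closes an open <li>".

```python
from html.parser import HTMLParser

SOURCE = "<html>\n<body>\nText\n<ul>\n<li>Line 1\n<li>Line 2\n<li>Line 3\n"


class Node:
    """One element in the sketch tree; children are Nodes or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class SketchBuilder(HTMLParser):
    """Toy tree builder driven by a single, assumed implied-end-tag rule."""

    def __init__(self, implied_close):
        super().__init__()
        self.implied_close = implied_close  # tags that close an open twin
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # The "known semantics": <li> directly inside <li> ends the first one.
        if tag in self.implied_close and self.current.tag == tag:
            self.current = self.current.parent
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def serialize(node):
    """Write the tree back out with every end tag made explicit."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        else:
            parts.append(f"<{child.tag}>{serialize(child)}</{child.tag}>")
    return "".join(parts)


def parse_sketch(source, implied_close=frozenset({"li"})):
    builder = SketchBuilder(implied_close)
    builder.feed(source)
    builder.close()  # anything still open is closed implicitly by serialize()
    return serialize(builder.root)


if __name__ == "__main__":
    print(parse_sketch(SOURCE))
    print(parse_sketch(SOURCE, implied_close=frozenset()))
```

With the rule in place, the three list items come out as siblings, as in the tree above. Pass an empty rule set and the same input comes out with each <li> nested inside the previous one, so the structure depends entirely on what the processor already knows about the vocabulary.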

However, without those known semantics, there are ambiguities in the input - it could be interpreted as

<garfle>

  <fleeblock>Text</fleeblock>

  <agbar/>

  <lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>

</garfle>

or

<garfle>

  <fleeblock>Text

    <agbar>

      <lukvi>Line 1</lukvi>
      <lukvi>Line 2</lukvi>
      <lukvi>Line 3</lukvi>

    </agbar>

  </fleeblock>

</garfle>

which may have very different interpretations based upon structure (I've deliberately scrambled the words to highlight the issue). That is the ambiguity I'm referring to: it only goes away if the instance belongs to a known schema. There may be specific parsing rules in HTML5, but I daresay that anyone writing the initial instance I gave above probably wouldn't be well versed in the specification.
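
The structural difference between the two readings can be checked mechanically. This is only a sketch, assuming Python's standard-library xml.etree.ElementTree; the two strings are the scrambled interpretations written out as well-formed XML (root element opened, end tags matched), and `max_depth` is a helper name invented here.

```python
import xml.etree.ElementTree as ET

# Reading 1: the <lukvi> elements nest inside one another.
NESTED = """<garfle>
  <fleeblock>Text</fleeblock>
  <agbar/>
  <lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
</garfle>"""

# Reading 2: the <lukvi> elements are siblings inside <agbar>.
FLAT = """<garfle>
  <fleeblock>Text
    <agbar>
      <lukvi>Line 1</lukvi>
      <lukvi>Line 2</lukvi>
      <lukvi>Line 3</lukvi>
    </agbar>
  </fleeblock>
</garfle>"""


def max_depth(elem, tag, depth=0):
    """Deepest level at which `tag` elements are nested inside each other."""
    here = depth + 1 if elem.tag == tag else depth
    return max([here] + [max_depth(child, tag, here) for child in elem])


if __name__ == "__main__":
    for name, text in (("nested", NESTED), ("flat", FLAT)):
        root = ET.fromstring(text)
        print(name, max_depth(root, "lukvi"))
```

Both documents are well formed and carry the same text, but the first nests <lukvi> three levels deep while the second never nests it at all, so any processing keyed on structure will treat them differently.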

I think the difference in interpretation here is that the HTML5 focus is on tolerating ambiguity (which is what supporting multiple rules for parsing is) and treating precision as a fault, while the XML focus is on being willing to deal with the extra precision if it reduces ambiguity. That's one of the reasons I get antsy when I hear people claim that HTML can replace XML. HTML+ARIA might have that additional precision, but it comes at the cost of requiring two languages plus coding to accomplish what can be done in one with XML.
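
For what the XML side of that trade looks like in practice, here is a small sketch, again assuming Python's standard-library xml.etree.ElementTree as a stand-in for any conforming XML parser (`is_well_formed` is a name made up for the example). The initial instance, with its unclosed elements, is rejected outright; the fully tagged version is accepted.

```python
import xml.etree.ElementTree as ET

# The initial instance from this message: no end tags at all.
TAG_SOUP = "<html>\n<body>\nText\n<ul>\n<li>Line 1\n<li>Line 2\n<li>Line 3\n"

# The same content with every element closed explicitly.
EXPLICIT = (
    "<html><body>Text<ul>"
    "<li>Line 1</li><li>Line 2</li><li>Line 3</li>"
    "</ul></body></html>"
)


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(TAG_SOUP))
    print(is_well_formed(EXPLICIT))
```

The parser makes no attempt to guess where the missing end tags belong; the author pays for the extra markup, and in return there is only one tree a reader or a processor can get out of it.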

Kurt Cagle
XML Architect
Lockheed / US National Archives ERA Project

David