OASIS Mailing List Archives


Re: [xml-dev] Never mind the browser, let's do MicroXML

On 17/12/2010 23:31, Kurt Cagle wrote:
>     HTML5 has some problems but ambiguity isn't really one of them, the
>     html5 spec specifies in excruciating detail how to construct a parse
>     tree from any stream of Unicode characters. Unlike XML there are no
>     states equivalent to "not well formed", every input has a defined parse.
> David,
> Hmm .. I guess what I'm saying is this - suppose that you have an input
> sequence that looks like this:
> <html>
> <body>
> Text
> <ul>
> <li>Line 1
> <li>Line 2
> <li>Line 3
> which you're implying could conceivably be valid input.

Well, actually it's invalid; the smallest changes I could make to make it
valid would result in

<!DOCTYPE html>
<li>Line 1
<li>Line 2
<li>Line 3

> Because we know the underlying semantics, the processor would be able to
> parse that as:

I'm not sure that semantics are required. The HTML5 spec says how to
parse any input string; it's a purely mechanical process with hardly any
optional or customisable behaviour. (It's a bit scary describing the HTML5
parser on a thread in which Henri is likely to pop up :-)
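
To make "purely mechanical" concrete, here is a minimal sketch in Python. It is not the HTML5 tree-construction algorithm, only a toy with one rule in the same spirit: a new <li> closes an <li> still open in the same list, and everything left open is closed at end of input. It uses the standard-library html.parser module as a tokenizer; the class name ImpliedEndTagSketch and the function normalise are made up for this illustration, and attributes are dropped to keep it short.

```python
from html.parser import HTMLParser


class ImpliedEndTagSketch(HTMLParser):
    """Toy serialiser that makes implied end tags explicit.

    An illustration only: real HTML5 tree construction has many more rules.
    """

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.out = []            # pieces of the serialised result

    def _close_top(self):
        self.out.append(f"</{self.open_elements.pop()}>")

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Look down the stack for an open <li> in the *same* list.
            for name in reversed(self.open_elements):
                if name in ("ul", "ol"):
                    break  # nested list: the outer <li> stays open
                if name == "li":
                    while self.open_elements[-1] != "li":
                        self._close_top()
                    self._close_top()
                    break
        self.open_elements.append(tag)
        self.out.append(f"<{tag}>")  # attributes are ignored in this sketch

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag: ignored rather than treated as fatal
        while self.open_elements[-1] != tag:
            self._close_top()
        self._close_top()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(text)


def normalise(markup):
    """Return markup with every element explicitly closed."""
    parser = ImpliedEndTagSketch()
    parser.feed(markup)
    parser.close()
    while parser.open_elements:  # end of input closes whatever is still open
        parser._close_top()
    return "".join(parser.out)


print(normalise("<html>\n<body>\nText\n<ul>\n<li>Line 1\n<li>Line 2\n<li>Line 3\n"))
# <html><body>Text<ul><li>Line 1</li><li>Line 2</li><li>Line 3</li></ul></body></html>
```

No knowledge of what a list item means is needed here, only a fixed rule keyed on the element name. An element name with no such rule simply nests until something closes it.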

> <html>
> <body>Text
> <ul>
> <li>Line 1</li>
> <li>Line 2</li>
> <li>Line 3</li>
> </ul>
> </body>
> </html>
> However, without those known semantics, there are ambiguities in the
> input - it could be interpreted as

Well, any input, whether XML or HTML or Fortran, might be incorrect; not
much you can do about that.
> <garfle>
> <fleeblock>Text</fleeblock>
> <agbar/>
> <lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
> </garfle>

According to HTML5 that is non-conforming (undefined element names) but
has a defined parse tree of

<lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
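
The distinction between "non-conforming" and "has no parse" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a conformance checker: KNOWN_ELEMENTS is a hypothetical and deliberately tiny stand-in for the element names HTML defines, and the names NameAudit and audit are invented here. The standard-library html.parser module is used only as a tokenizer.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stand-in for the element names HTML defines.
KNOWN_ELEMENTS = {"html", "head", "title", "body", "p", "ul", "ol", "li"}


class NameAudit(HTMLParser):
    """Record every start tag and, separately, those with undefined names."""

    def __init__(self):
        super().__init__()
        self.seen = []     # every start tag name, in document order
        self.unknown = []  # start tag names not in KNOWN_ELEMENTS

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)
        if tag not in KNOWN_ELEMENTS:
            self.unknown.append(tag)


def audit(markup):
    """Tokenize markup; return (all start tags, undefined start tags)."""
    parser = NameAudit()
    parser.feed(markup)
    parser.close()
    return parser.seen, parser.unknown


seen, unknown = audit(
    "<lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>"
)
print(seen)     # ['lukvi', 'lukvi', 'lukvi']
print(unknown)  # ['lukvi', 'lukvi', 'lukvi']
```

The input is read all the way through and yields a structure either way; whether the names are defined is a separate report made on top of that result.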

> or
> <garfle>
> <fleeblock>Text
> <agbar>
> <lukvi>Line 1</lukvi>
> <lukvi>Line 2</lukvi>
> <lukvi>Line 3</lukvi>
> </agbar>
> </bleeblock>
> </garfle>

which again is non-conforming but has a defined parse tree equivalent to

<lukvi>Line 1</lukvi>
<lukvi>Line 2</lukvi>
<lukvi>Line 3</lukvi>

> which may have very different interpretations based upon structure (I've
> deliberately scrambled the words to highlight the issue). If that was a
> known schema instance, it's that which I'm referring to in terms of
> ambiguity. There may be specific parsing rules in HTML5, but I daresay
> that anyone writing the initial instance I gave above probably wouldn't
> be well versed on the specification.

If you write in any language without knowing the rules of that language, 
then confusion may result, but I don't think that can be called 
ambiguity in the language.
> I think the difference in interpretation here is that the HTML5 focus is
> on tolerating ambiguity (which is what supporting multiple rules for
> parsing is)

I'm not sure what you mean by multiple rules. As you may have noticed, 
when James Clark and I suggested they could have some variation in the 
rules for newer documents the suggestion got a resounding no.

> and treating precision as a fault, while the XML focus is on
> being willing to deal with the extra precision if it reduces ambiguity.
> That's one of the reasons I get antsy when I hear people make statements
> like the idea that HTML can replace XML. HTML+ARIA might have that
> additional precision, but it comes at the cost of requiring two
> languages plus coding to accomplish what can be done in one with XML.



Copyright 1993-2007 XML.org. This site is hosted by OASIS