Re: [xml-dev] Never mind the browser, let's do MicroXML
- From: David Carlisle <davidc@nag.co.uk>
- To: Kurt Cagle <kurt.cagle@gmail.com>
- Date: Sat, 18 Dec 2010 00:10:05 +0000
On 17/12/2010 23:31, Kurt Cagle wrote:
> HTML5 has some problems, but ambiguity isn't really one of them: the
> HTML5 spec specifies in excruciating detail how to construct a parse
> tree from any stream of Unicode characters. Unlike XML, there are no
> states equivalent to "not well formed"; every input has a defined parse.
>
> David,
>
> Hmm .. I guess what I'm saying is this - suppose that you have an input
> sequence that looks like this:
>
> <html>
> <body>
> Text
> <ul>
> <li>Line 1
> <li>Line 2
> <li>Line 3
>
> which you're implying could conceivably be valid input.
Well, actually it's invalid. The smallest changes I could make to make it
valid would result in:
<!DOCTYPE html>
<html>
<title></title>
<body>
Text
<ul>
<li>Line 1
<li>Line 2
<li>Line 3
</ul>
>
> Because we know the underlying semantics, the processor would be able to
> parse that as:
I'm not sure that semantics are required. The HTML5 spec says how to
parse any input string; it's a purely mechanical process with hardly any
optional or customisable behaviour. (It's a bit scary describing the HTML5
parser on a thread in which Henri is likely to pop up :-)
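
To make "purely mechanical" concrete, here is a minimal sketch in Python
of just one such rule: a new <li> closes any <li> still open in the
current list. It is a toy and not the HTML5 tree construction algorithm.
The names ListItemSketch, serialize and parse are made up for
illustration, it relies on the standard library's html.parser tokenizer
(which is not an HTML5 tokenizer), it trims whitespace, and it does not
insert the implied <head>.

```python
from html.parser import HTMLParser


class ListItemSketch(HTMLParser):
    """Toy tree builder: one illustrative rule, NOT the HTML5 algorithm."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])   # each node is (name, children)
        self.stack = [self.root]    # stack of currently open elements

    def _close_open_li(self):
        # Walk up the open elements; stop at the enclosing list so that
        # an <li> belonging to an outer list is left alone.
        for i in range(len(self.stack) - 1, 0, -1):
            name = self.stack[i][0]
            if name == "li":
                del self.stack[i:]
                return
            if name in ("ul", "ol"):
                return

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._close_open_li()
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element;
        # a stray end tag with no match is simply ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        text = data.strip()         # simplification: drop surrounding whitespace
        if text:
            self.stack[-1][1].append(text)


def serialize(node):
    """Write the tree back out with every element explicitly closed."""
    if isinstance(node, str):
        return node
    name, children = node
    inner = "".join(serialize(child) for child in children)
    return inner if name == "#root" else f"<{name}>{inner}</{name}>"


def parse(source):
    builder = ListItemSketch()
    builder.feed(source)
    builder.close()
    return serialize(builder.root)
```

Feeding it the unclosed list from the example gives three sibling <li>
elements inside the <ul>, with no knowledge of what a list "means": the
rule is keyed on tag names only.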
>
> <html>
> <body>Text
> <ul>
> <li>Line 1</li>
> <li>Line 2</li>
> <li>Line 3</li>
> </ul>
> </body>
> </html>
>
> However, without those known semantics, there are ambiguities in the
> input - it could be interpreted as
Well, any input, whether XML or HTML or Fortran, might be incorrect;
there's not much you can do about that.
>
> <garfle>
> <fleeblock>Text</fleeblock>
> <agbar/>
> <lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
> </garfle>
According to HTML5 that is non-conforming (undefined element names) but
has a defined parse tree of
<html><head></head><body><garfle>
<fleeblock>Text</fleeblock>
<agbar>
<lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
</agbar></garfle>
</body></html>
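
The reason <lukvi> ends up inside <agbar> in that tree is that the "/>"
on an element that is not a void element is ignored, so <agbar> stays
open. A rough Python sketch of that one point follows. It is an
illustration under assumptions, not the HTML5 parser: SlashIgnoringSketch
and first_parent_of are hypothetical names, the tokenizer is the standard
library's html.parser, and void elements such as <br> are not handled.

```python
from html.parser import HTMLParser


class SlashIgnoringSketch(HTMLParser):
    """Toy nesting tracker: covers only the non-void "/>" case."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # names of the elements currently open
        self.parents = {}         # name -> parent of its first occurrence

    def handle_starttag(self, tag, attrs):
        parent = self.open_elements[-1] if self.open_elements else None
        self.parents.setdefault(tag, parent)
        self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Treat <x/> exactly like <x>: the element is left open.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        if tag in self.open_elements:
            last = len(self.open_elements) - 1 - self.open_elements[::-1].index(tag)
            del self.open_elements[last:]


def first_parent_of(source, name):
    """Return the parent of the first element called `name`, or None."""
    tracker = SlashIgnoringSketch()
    tracker.feed(source)
    tracker.close()
    return tracker.parents.get(name)
```

With the <garfle> input above, first_parent_of(source, "lukvi") reports
"agbar", whereas an XML parser would make the first <lukvi> a child of
<garfle>.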
> or
>
> <garfle>
> <fleeblock>Text
> <agbar>
> <lukvi>Line 1</lukvi>
> <lukvi>Line 2</lukvi>
> <lukvi>Line 3</lukvi>
> </agbar>
> </bleeblock>
> </garfle>
which again is non-conforming but has a defined parse tree equivalent to
parsing
<html><head></head><body><garfle>
<fleeblock>Text
<agbar>
<lukvi>Line 1</lukvi>
<lukvi>Line 2</lukvi>
<lukvi>Line 3</lukvi>
</agbar>
</fleeblock></garfle>
</body></html>
>
> which may have very different interpretations based upon structure (I've
> deliberately scrambled the words to highlight the issue). If that was a
> known schema instance, it's that which I'm referring to in terms of
> ambiguity. There may be specific parsing rules in HTML5, but I daresay
> that anyone writing the initial instance I gave above probably wouldn't
> be well versed on the specification.
If you write in any language without knowing the rules of that language,
then confusion may result, but I don't think that can be called
ambiguity in the language.
>
> I think the difference in interpretation here is that the HTML5 focus is
> on tolerating ambiguity (which is what supporting multiple rules for
> parsing is)
I'm not sure what you mean by multiple rules. As you may have noticed,
when James Clark and I suggested they could have some variation in the
rules for newer documents the suggestion got a resounding no.
> and treating precision as a fault, while the XML focus is on
> being willing to deal with the extra precision if it reduces ambiguity.
> That's one of the reasons I get antsy when I hear people make statements
> like the idea that HTML can replace XML. HTML+ARIA might have that
> additional precision, but it comes at the cost of requiring two
> languages plus coding to accomplish what can be done in one with XML.
>
David