XMLisms and HTML parsing and modes (was: Re: XML5)
- From: Henri Sivonen <hsivonen@iki.fi>
- To: "xml-dev@lists.xml.org List" <xml-dev@lists.xml.org>
- Date: Thu, 16 Dec 2010 11:44:30 -0800
On Dec 16, 2010, at 02:25, rjelliffe wrote:
> On Thu, 16 Dec 2010 09:30:58 +0000, David Carlisle <davidc@nag.co.uk> wrote:
>
>> Yes I nearly mentioned those cases as an exception:-) But you give
>> the example that's almost reasonable (html/head/body/tbody
>> implication) while not responding to the cases that actually cause the
>> problems as they affect the parsing of arbitrarily small fragments,
>> namely /> and the different handling of end tags for individual void
>> elements.
>
> I have a strong memory of being told in the 90s (and trying it out) that "/>"
> would work in HTML parsers for XHTML if there was a space before it, e.g. "<br />".
It "works" in the sense that the slash does nothing, so writing <br /> doesn't change anything compared to writing <br>.
> I had thought it applied to XML tags in general, but it seems not.
It's a problem that people have that misconception. It's *very* annoying that XML came up with <foo/> and then people started thinking that <div/> was an empty element in HTML, too, and keep bugging developers of HTML consuming software about it. HTML legacy started accumulating before XML existed, so it's not OK to change stuff just because XML added something new and XML and HTML look alike.
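To make the distinction concrete, here is a minimal Python sketch of the rule for elements in the HTML namespace. It is not code from any browser or from the HTML5 spec, and the names VOID_ELEMENTS, START_TAG and classify_start_tag are hypothetical. The trailing slash is recognized, but whether an element stays open depends only on whether its name is a void element. The regex is a deliberate simplification (it ignores unquoted attribute values ending in "/" and other tokenizer details), and foreign content (SVG, MathML), where the slash does mean self-closing, is left out.

```python
import re

# Void elements as the HTML parser treats them (illustrative list,
# approximately the current WHATWG one). These never stay open.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})

# Simplified start-tag pattern: a tag name, anything up to '>', and an
# optional slash right before '>'. Not a real HTML tokenizer.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")


def classify_start_tag(tag_text):
    """Return (name, has_slash, stays_open) for an HTML-namespace start tag.

    has_slash records whether the tag was written XML-style, e.g. <br />
    or <div/>. stays_open says whether the element remains open for
    content; it depends only on the name, never on has_slash.
    """
    m = START_TAG.fullmatch(tag_text)
    if m is None:
        raise ValueError(f"not a start tag: {tag_text!r}")
    name = m.group(1).lower()
    has_slash = m.group(2) == "/"
    return name, has_slash, name not in VOID_ELEMENTS
```

For example, classify_start_tag("<div/>") reports the slash but still says the div stays open, which is exactly why <div/> is not an empty element in HTML.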
On Dec 15, 2010, at 23:38, James Clark wrote:
> I want it to be valid HTML (in standards mode) to supply an end-tag for any element (including <br>) ie if <x/> is valid, then <x></x> is valid.
I see. For speccing HTML5, doing things the way they were done before in order to avoid breaking existing content outweighs wishes to add syntactic sugar. As for putting it only into one mode, having more modes adds complexity in terms of implementation, quality assurance *and* author comprehension. Now that there's a decade of experience with having multiple modes, vendors other than Microsoft are generally reluctant to add more modes and are trying to make the existing modes converge rather than diverge. (The main exception is that TC39 came up with "use strict" in ECMAScript 5, but it's too early to tell how that one will fare in the market.)
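As an illustration of what "existing content" pins down here: in the HTML5 tree builder, a stray </br> end tag is a parse error that is handled as if it were a <br> start tag (a compatibility rule), while end tags for other void elements, such as </img>, are simply ignored. So <br></br> produces two line breaks, not one. The Python sketch below models only that rule for a flat list of void-element tag tokens; the function name and the (kind, name) token format are hypothetical.

```python
# Void elements as the HTML parser treats them (illustrative list).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})


def elements_from_void_tags(tokens):
    """Return the element names created from (kind, name) tag tokens.

    Only void-element tags are modeled; kind is "start" or "end".
    """
    created = []
    for kind, name in tokens:
        name = name.lower()
        if name not in VOID_ELEMENTS:
            continue  # non-void elements are outside this sketch
        if kind == "start":
            created.append(name)  # inserted and immediately closed
        elif kind == "end" and name == "br":
            # Parse error, but treated as a <br> start tag: a new element.
            created.append("br")
        # Any other void end tag (e.g. </img>) is a parse error and ignored.
    return created
```

For instance, elements_from_void_tags([("start", "br"), ("end", "br")]) returns ["br", "br"], which is why making <br></br> mean a single element would change the rendering of existing pages.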
Usually people are sufficiently scared by this flowchart for IE8's mode selection: http://hsivonen.iki.fi/doctype/ie8-mode.png
(I haven't yet gotten around to doing the community service of making a similar chart for IE9.)
On Dec 15, 2010, at 23:50, James Clark wrote:
> Is it now the case that in a modern HTML5-compliant browser the only effect of putting <!DOCTYPE html> at the beginning is the last quirk you mention (<table> ending a <p>)?
No, it's just the only remaining effect on HTML parsing. There are a bunch of effects on how CSS behaves (both on parsing and on behavior after parsing), and no spec for those behaviors has been written yet. Even though the mode switch has always been in HTML, it has always been mostly about CSS. There are also a couple of DOM APIs that behave differently (mostly because tokens in the class attribute are matched case-sensitively in standards mode and ASCII-case-insensitively in quirks mode).
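The class-attribute difference can be sketched as a small matching function. This is a minimal Python model of the comparison rule only (the kind used for class selectors and getElementsByClassName), not browser code; the helper names are hypothetical. The point of "ASCII" case-insensitivity is that only A-Z are folded, so a non-ASCII letter such as É still does not match é even in quirks mode.

```python
import re

# HTML's "ASCII whitespace": space, tab, LF, FF, CR.
_ASCII_WS = re.compile(r"[ \t\n\f\r]+")


def ascii_lowercase(s):
    """Fold only A-Z to a-z; all other characters are left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def has_class(class_attr, token, quirks_mode):
    """Report whether `token` is one of the tokens in a class attribute.

    Standards mode: exact, case-sensitive comparison.
    Quirks mode: ASCII case-insensitive comparison.
    """
    tokens = [t for t in _ASCII_WS.split(class_attr) if t]
    if quirks_mode:
        wanted = ascii_lowercase(token)
        return any(ascii_lowercase(t) == wanted for t in tokens)
    return token in tokens
```

So an element with class="Foo bar" matches a lookup for "foo" only when the document is in quirks mode.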
On Dec 16, 2010, at 01:30, David Carlisle wrote:
> It would have been possible to also stop implying html start tags if you had been prepared to have a "more standards mode" implied by (say)
> <!doctype html>
> there were reasons for not doing that, but it's a choice made, not an absolute rule that it would have been impossible to have a sensible grammar for html.
Right, but we aren't prepared to have more modes. "Sensible grammar" falls under "theoretical purity" which is the bottom priority in the design principles: http://www.w3.org/TR/html-design-principles/#priority-of-constituencies
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/