Re: [xml-dev] Pragmatic namespaces

From: Henri Sivonen <hsivonen@iki.fi>
To: Micah Dubinko <Micah.Dubinko@marklogic.com>
Date: Mon, 24 Aug 2009 14:03:31 +0300
On Aug 24, 2009, at 01:05, Micah Dubinko wrote:

> On Aug 13, 2009, at 11:47 AM, Henri Sivonen wrote:
>
>>> Example:
>>> <head>
>>> <title>Document title</title>
>>> <com.example.project>
>>> <com.example.id>123521123</com.example.id>
>>> </com.example.project>
>>> </head>
>>>
>>> In this example document.getElementsByTagName("id") would return the
>>> innermost element.
>>> So would document.getElementsByTagNameNS("com.example", "id")
>>
>> I think here your proposal goes into the weeds.
>>
>> The #1 flaw with Namespaces & DOM Level 2 is that the identifiers that
>> are fundamental to the operation of software were different from the
>> identifiers in plain XML 1.0 or DOM Level 1. Your proposal repeats
>> this mistake by making the platform behave radically differently if
>> you have a JS program running on a browser that doesn't implement your
>> proposal and if you have the same JS program running on a browser that
>> implements your proposal.
>
> It's already the case that older browsers will interpret things  
> differently. Old browsers won't treat <svg> as something in the SVG  
> namespace, but newer ones will.

I thought the point of "extensions" was that they are mainly hooks for  
scripts. As such, they could work on existing browsers, too, if they  
were a mere naming convention. In contrast, SVG requires quite  
distinctive native 2D rendering support that can't be achieved with  
mere scripting and styling macros on top of the HTML4+CSS functionality.

As for unilateralist browser-sensitive extensions like <blink> and  
<marquee>, it would probably have been better to make them less  
attractive than the elements minted through a peer review. Thus, it  
would have been better to make them <com.netscape.blink> and  
<com.microsoft.marquee> without a mechanism to hide the prefixes.

> Since there are (presumably) far fewer HTML documents with multiple  
> dots in element names than with <svg> elements, one could argue that  
> this doesn't cause significant backwards compatibility problems  
> either.

Actually, SVG-in-text/html parsing takes special steps to deal with
legacy content that contains SVG bits due to cargo cult copying and
pasting. Specifically, the parser breaks out of SVG at the slightest
hint of cargo cult copying and pasting:
http://www.whatwg.org/specs/web-apps/current-work/#parsing-main-inforeign
(The cases that say: 'A start tag whose tag name is one of: "b",
"big", "blockquote", "body", "br", "center", "code", "dd", "div",
"dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head",
"hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p",
"pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
"table", "tt", "u", "ul", "var"' and 'A start tag whose tag name is
"font", if the token has any attributes named "color", "face", or
"size"'.)
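
The break-out rule quoted above can be sketched in a few lines of Python. This only illustrates the two quoted cases, not the full "in foreign content" handling from the spec; the function name breaks_out_of_foreign_content and the dict-based attribute representation are assumptions made for the example.

```python
# Start tags that, per the list quoted above, make the parser leave SVG.
BREAKOUT_TAGS = frozenset({
    "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div",
    "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head",
    "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p",
    "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
    "table", "tt", "u", "ul", "var",
})

# <font> only breaks out when it carries one of these attributes.
FONT_BREAKOUT_ATTRS = frozenset({"color", "face", "size"})


def breaks_out_of_foreign_content(tag_name, attrs=None):
    """Return True if a start tag matches one of the two quoted cases.

    Assumption: tag_name and the attribute names are already lowercased,
    and attrs is a mapping of attribute names to values (or None).
    """
    if tag_name in BREAKOUT_TAGS:
        return True
    if tag_name == "font":
        return any(name in FONT_BREAKOUT_ATTRS for name in (attrs or {}))
    return False
```

Under these assumptions, a <div> start tag or a <font size="3"> start tag would end the SVG subtree, while a bare <font> or a <circle> would not.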

> By its very nature, recent HTML standardization work has been about  
> getting browsers to change how their parsers operate. I can flip a  
> switch in my Firefox and enable HTML5 mode, which has slight  
> differences in how my browser would otherwise work.

Except for the SVG and MathML stuff, the changed behaviors (compared  
to the old HTML parser in Gecko) fall mainly into one of two buckets:
  1) Changes from old Gecko behavior to align with IE or WebKit behavior
  2) Changes from the old behavior of any browser so that the parser
     never reads back from the DOM, which allows the DOM and the HTML
     parser to live in different threads

The SVG and MathML changes are sometimes generalized to mean that  
browsers are now doing HTML extensions or namespaces. That's not  
what's happening. The SVG and MathML support is about taking the  
browser-native functionality investment that has already been made but  
that has been tied to XML parsing and enabling it in the text/html  
world.

It is incorrect to extrapolate that it's now OK to use Namespaces in  
text/html for purposes other than salvaging investments in  
functionality previously implemented but tied to XML.

> The other issue that came up in discussions is what the proper scope
> of a proposal like this should be. Does it affect only the HTML
> syntax rules, or could something be dreamt up that would
> supplant/replace xmlns in XML documents as well? That's a wide open
> issue still.

I think it's a problem in terms of the DOM Consistency design  
principle if the same syntax doesn't produce the same DOM on both  
sides of the fence. The simplest way to make progress is to stick to  
things that don't require any changes to the XML 1.0 4th ed. +  
Namespaces 1.0 layers on the XML side and that don't trigger any  
Namespaces layer processing (i.e. don't use the colon).

Using reverse DNS identifiers as a naming convention without affecting  
how the DOM/Infoset is constructed meets these requirements.
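
As a rough illustration of that point on the XML side, here is a Python sketch using the standard library's xml.dom.minidom (a namespace-aware DOM parser, not a browser). The sample document reuses the element names from the example earlier in the thread; the default XHTML namespace declaration and the helper name describe are additions made for the illustration.

```python
from xml.dom.minidom import parseString

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The dotted names are ordinary XML 1.0 names: there is no colon, so the
# only Namespaces processing is the default namespace declared on <head>.
SAMPLE = (
    '<head xmlns="http://www.w3.org/1999/xhtml">'
    "<title>Document title</title>"
    "<com.example.project>"
    "<com.example.id>123521123</com.example.id>"
    "</com.example.project>"
    "</head>"
)


def describe(doc, tag_name):
    """Return (namespaceURI, localName) pairs for elements matching tag_name."""
    return [(el.namespaceURI, el.localName)
            for el in doc.getElementsByTagName(tag_name)]


doc = parseString(SAMPLE)

# The full dotted name is the local name; "id" on its own matches nothing.
print(describe(doc, "com.example.id"))
print(describe(doc, "id"))
print(len(doc.getElementsByTagNameNS(XHTML_NS, "com.example.id")))
print(len(doc.getElementsByTagNameNS("com.example", "id")))
```

In this parser the dotted prefix is simply part of the name, so code that looks the element up by its full name keeps working regardless of whether anyone assigns meaning to the dots.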

>> In your example, the local name of the innermost element MUST be
>> "com.example.id" for compatibility with existing behavior.
>
> Based on your following comment, it's not clear if you mean existing  
> behavior of parsers or of the DOM API...

Parsers and DOM Level 2 are rather heavily coupled. I meant in terms  
of the DOM API, in terms of the Infoset, in terms of the XPath data  
model, in terms of Selectors and in terms of the browser-internal APIs.

>> Changing
>> what document.getElementsByTagName() returns here is not something
>> that's open for discussion. (As in, the probability of a browser
>> vendor shipping with the API behavior change is virtually zero.)
>
> Right, I wouldn't expect any DOM functions to change w.r.t.  
> returning already-parsed DOM information.

OK. Your proposal specifically suggested changes to what specific DOM  
methods return, though.

>> The namespace of the innermost element as reported by the DOM isn't
>> really open for discussion, either. In an HTML5-compliant UA it is
>> "http://www.w3.org/1999/xhtml", because this unifies the DOM with the
>> XHTML5 side, where the namespace is constrained by the XHTML legacy to
>> be "http://www.w3.org/1999/xhtml". In legacy UAs, the namespace is null.
>
> There are already special cases for SVG, MathML, etc., and already  
> differences with legacy browsers. Is one more class of special cases  
> beyond consideration? If so, why?

SVG and MathML are the only vocabularies that have already been
implemented in multiple browsers and that need bringing into the
text/html world. New things can simply be added to the
http://www.w3.org/1999/xhtml namespace. That's what was done with
<video>, for example.

I think it is a bug that the organization of the W3C into Working  
Groups leaks to Web authors in the form of multiple namespaces.  
(Consider Conway's Law.) Changing the namespace of SVG and MathML or  
withdrawing Namespaces from the XML stack would be too late at this  
point, so the namespaces for SVG and MathML need to be grandfathered  
in, but there's no reason to keep minting more namespaces.

>>> Requirement: widely-known namespaces must be parsed to an equivalent
>>> DOM as xmlns
>
> Think of this: it's entirely possible that the arrival of a  
> distributed extensibility mechanism in HTML (not just XHTML) might  
> forever change some respects of how HTML gets written, and how XML  
> vocabularies are defined.
>
> For example, say Tim's proposal mentioned earlier takes off. Then 1)  
> many more HTML documents will be around using one-off <x.y.z>  
> element names, and 2) future XML vocabularies might use element  
> names like <foo.bar.baz> (possibly without namespaces) so that they  
> could be readily used in HTML.

This is possible. (The XML vocabularies that wish to use dotted names
and be mixed readily into text/html should probably use the
http://www.w3.org/1999/xhtml namespace, though.)

> But even if this happens, there still exists *some* list of older  
> namespaces/vocabularies that people will want to use in HTML. I  
> don't have a strong opinion of what that list might be, so I put  
> together some typical examples in the initial proposal in an attempt  
> to smooth over the inevitable transition process.

I think it's pretty simple to form the list of privileged namespaces:  
Take the set of namespaces supported by at least two browsers out of  
IE, Firefox, Safari and Opera in subtrees of application/xhtml+xml  
documents.
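
That selection rule is simple enough to write down. A minimal Python sketch follows; the function name privileged_namespaces and the shape of the input mapping are assumptions for illustration, and no real browser support data is implied.

```python
from collections import Counter

BROWSERS = ("IE", "Firefox", "Safari", "Opera")


def privileged_namespaces(support, minimum=2):
    """Return namespaces supported by at least `minimum` of the four browsers.

    `support` maps a browser name to the set of namespace URIs it supports
    in subtrees of application/xhtml+xml documents (hypothetical input
    format). Browsers not listed in BROWSERS are ignored.
    """
    counts = Counter()
    for browser in BROWSERS:
        # set() guards against a namespace being listed twice for one browser.
        counts.update(set(support.get(browser, ())))
    return {ns for ns, n in counts.items() if n >= minimum}
```

For instance, with the made-up input {"Firefox": {"urn:example:a", "urn:example:b"}, "Opera": {"urn:example:a"}}, only the placeholder urn:example:a would qualify.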

>>> Example:
>>>
>>> <html using.math="math">...
>>> <p>
>>> E.g. <math><msqrt><mi>π</mi></msqrt></math>
>>> </p>
>>> ...</html>
>>
>> This already works in HTML5 without even having to use the using.math
>> stuff. I invite you to try it in a trunk nightly build of Firefox
>> after you've set the preference html5.enable to true in about:config.
>
> What if these namespace assignments could happen in a less magical  
> fashion?

Why shouldn't SVG and MathML work just as easily as HTML? What benefit
is there for Web authors in having to use incantations like xmlns or
using.math when it has now been shown by implementation that this stuff
can work without such incantations? I think the HTML5 way of
incorporating SVG and MathML is less magic in the sense that there are
no spells for the author to cast.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/