OASIS Mailing List Archives



Re: [xml-dev] Polyglot Markup - serializer questions

On 7/8/10 12:37 PM, David wrote:
> Thanks to Twitter (and @xquery) I stumbled on this
> http://www.w3.org/TR/html-polyglot/
> I think the goals are excellent, but I do have some questions for 'the
> experts'.

I'm not an expert, but I am a sort of close observer.

> 1) Why is this useful instead of sticking to XHTML? The Abstract says
> "Polyglot markup that meets these constraints is interpreted as
> compatible, regardless of whether they are processed as HTML or as
> XHTML, per the HTML5 specification"
> But I don't quite get why this is necessary. I'm sure I'm missing the
> obvious; people don't (usually) write specs just for the fun of it.

Well, the HTML5 folks did write their own parsing specs, and so far as
I can tell they did it more or less "for the fun of it."

I'm taking Polyglot Markup to be pretty much an updated equivalent of the
HTML Compatibility Guidelines (Appendix C) from XHTML 1.0.

Although... the DOCTYPE legacy string and the obsolete permitted DOCTYPE
strings bring new levels of weird to these stories, something the
Polyglot document seems to gloss over.  (If you thought the XML rules for
DOCTYPE were a bit difficult to read...)  There are still some large
issues with XML processing of HTML5 documents containing entities (such
as &nbsp;) that HTML5 seems to assume magically go away.  For my own
purposes, I'll be creating an HTML5 DTD once the spec is actually cooked.
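
For reference, HTML5's short DOCTYPE is <!DOCTYPE html>, and the "DOCTYPE legacy string" form, <!DOCTYPE html SYSTEM "about:legacy-compat">, exists for generators (typically XML tools) that cannot emit the short form. The small Python sketch below classifies a DOCTYPE string as one or the other. The matching rules are an assumed reading of the spec (case-insensitive keywords, ASCII whitespace, either quote character, the about:legacy-compat literal matched exactly) and should be checked against the version you target; classify_doctype is a hypothetical helper, and the obsolete permitted DOCTYPE strings are deliberately not recognized.

```python
import re

# ASCII whitespace only; Python's \s would also match Unicode spaces.
WS = r"[\t\n\f\r ]"

# Keywords sit in scoped case-insensitive groups (?i:...); the
# about:legacy-compat literal is matched case-sensitively (an assumption).
SHORT_DOCTYPE = re.compile(rf"<!(?i:DOCTYPE){WS}+(?i:html){WS}*>")
LEGACY_DOCTYPE = re.compile(
    rf"<!(?i:DOCTYPE){WS}+(?i:html){WS}+(?i:SYSTEM){WS}+"
    rf"([\"'])about:legacy-compat\1{WS}*>"  # \1 forces matching quotes
)


def classify_doctype(text: str) -> str:
    """Return 'short', 'legacy-compat', or 'other' for a DOCTYPE string."""
    if SHORT_DOCTYPE.fullmatch(text):
        return "short"
    if LEGACY_DOCTYPE.fullmatch(text):
        return "legacy-compat"
    return "other"  # includes the XHTML 1.0 PUBLIC doctypes


print(classify_doctype('<!DOCTYPE html SYSTEM "about:legacy-compat">'))  # legacy-compat
```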
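
The entity problem is easy to reproduce with any XML parser that has no DTD to consult. In the Python sketch below (stdlib only), a named reference such as &nbsp; makes ElementTree reject the document as an undefined entity, while the numeric reference &#160; parses fine. The numeric_refs helper is hypothetical and deliberately simple: it only knows the HTML 4 names in html.entities.name2codepoint and only rewrites semicolon-terminated references, so it is not a full HTML5 character-reference decoder.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser knows without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def parses_as_xml(markup: str) -> bool:
    """True if the markup is well-formed XML with no DTD available."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def numeric_refs(markup: str) -> str:
    """Hypothetical helper: rewrite HTML 4 named references as numeric ones."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML built-ins and unknown names alone
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


html5_fragment = "<p>fish&nbsp;&amp;&nbsp;chips</p>"
print(parses_as_xml(html5_fragment))                # False: &nbsp; is undefined in XML
print(parses_as_xml(numeric_refs(html5_fragment)))  # True
```

A DTD, as proposed above, is the other way out: it declares the names so the original references can stay in the document.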

For the genuinely intrepid, there is also a RELAX NG + Schematron
definition of the HTML5 vocabulary.

> 2) New XML serializer implementations?
> The doc discusses the difference between empty tags for elements that
> are EMPTY vs. not, e.g. it says to use <br/> but not <p/> (use <p></p>
> instead).
> This would imply (?) that an XML serializer would need to know when it's
> OK to compress empty tags and when it's not.
> Serializers such as Saxon with the html output method do this, but they
> do it differently ... e.g. a <br/> in XML becomes <br> in html mode.
> Does following this standard imply that we need new output methods for
> serializers? Or do we have to force serializers to not do any empty
> element optimization and leave it up to the input code generation/source?
> I think this might be difficult in something like XQuery or XSLT using
> dynamic element construction, where it's not explicit which empty
> element form is used.
> e.g. in XQuery ... how does the serializer know to expand <p></p> but not
> <br/>?
> element { "p" } {}
> element { "br" } {}

This part I've never really understood, but it seems that some browsers 
have had problems dealing with empty tags (<br />) in certain circumstances.

I do think it's a long-term challenge for serializers, but it hasn't 
stopped me from using ancient tools to create and reprocess HTML5 documents.
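
One way to answer the serializer question is to make the choice at serialization time from the element name, so it no longer matters how the tree was built (computed constructors like element { "p" } {} included). Python's own ElementTree shows the same split the question describes: method="html" writes void elements as <br> with no slash, while method="xml" collapses every empty element, <p> included. The sketch below is a hypothetical polyglot_serialize helper for un-namespaced trees, not an existing serializer option; the void-element list is an assumption taken from the current HTML standard and should be checked against the HTML version you target.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed void-element list; only these may use the self-closing form.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


def _attr(value: str) -> str:
    # Escape &, <, > and the double quote used as the attribute delimiter.
    return escape(value, {'"': "&quot;"})


def polyglot_serialize(elem: ET.Element) -> str:
    """Hypothetical helper: serialize an un-namespaced tree polyglot-style.

    Void elements are written as <br/>; every other element gets an
    explicit end tag even when empty, e.g. <p></p>.
    """
    attrs = "".join(f' {name}="{_attr(value)}"' for name, value in elem.attrib.items())
    if elem.tag in VOID_ELEMENTS:
        if elem.text or len(elem):
            raise ValueError(f"void element <{elem.tag}> cannot have content")
        return f"<{elem.tag}{attrs}/>"
    parts = [f"<{elem.tag}{attrs}>", escape(elem.text or "")]
    for child in elem:
        parts.append(polyglot_serialize(child))
        parts.append(escape(child.tail or ""))  # text after the child belongs to this parent
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


print(polyglot_serialize(ET.Element("p")))   # <p></p>
print(polyglot_serialize(ET.Element("br")))  # <br/>
```

Because the void/non-void decision keys only on the tag name, the element { "p" } {} and element { "br" } {} results would come out as <p></p> and <br/> respectively, which is roughly what a polyglot-aware output method would have to do.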

I would strongly encourage XML folk to pay attention to HTML5.  We're 
not precisely welcome in the conversation, if HTML5 rhetoric is to be 
taken seriously, but I suspect we might yet have some useful role to 
play in improving this.

Simon St.Laurent



Copyright 1993-2007 XML.org. This site is hosted by OASIS