Re: [xml-dev] Well formed HTML5

Harnessing something like "HTML Tidy for HTML5" seems to me a more immediately practical route to a solution than expecting a revision of what makes XML XML ("HTML Tidy for HTML5" is documented at http://w3c.github.io/tidy-html5/ and its code is at https://github.com/w3c/tidy-html5 ).

(Of course, I'd love to see the day that something like this, or more specifically a "Tidy for HTML5 to polyglot XHTML5," is developed in XSLT ("Polyglot Markup: A robust profile of the HTML5 vocabulary," W3C Editor's Draft 07 May 2014, at http://dev.w3.org/html5/html-polyglot/html-polyglot ). But I suppose expecting that to happen any time soon is also not particularly immediate or practical.)

Hope you find "HTML Tidy for HTML5" useful.



On 2014/07/15 L2L 2L wrote:

On Jul 15, 2014, at 6:49 PM, "William Velasquez" <wvelasquez@visiontecnologica.com> wrote:

Hi XML Geeks,


The rise of HTML5 is raising practical problems for XML developers who want to embrace it. On the one hand, they need to deliver “nice” user interfaces and take advantage of convenient features like Web Components; on the other hand, they want to handle HTML5 as trees, for example to store it in XML databases.


This is not a proposal, just a re-post of a dilemma that arose on the eXist-db mailing list: how to parse non-well-formed HTML5 created by developers who are not XML-aware (for example, authors of third-party libraries). My aim is to hear your opinions about a possible solution (could it be <?xml version="1.2"?>).


The main issues of contention are these three:


1.       Empty attributes (widely used in HTML5; for example, disabled becomes disabled="disabled" in XML)

2.       Script-friendly handling of ampersands and angle brackets (programmers don’t like to escape them when writing code).

3.       Empty elements that are not closed (like <link > instead of <link />)
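To make issues 1 and 3 concrete, here is a minimal JavaScript sketch of the kind of rewriting a converter has to do on start tags. The names `normalizeStartTags`, `fixAttributes`, `START_TAG`, `ATTRIBUTE`, and `VOID_ELEMENTS` are hypothetical and not taken from any library. It is a naive, regex-based illustration under the assumption that the input contains only simple start tags; it does not understand comments, script bodies, or unescaped characters inside attribute values, so a real HTML5 parser (or a tool such as Tidy) remains the practical choice.

```javascript
"use strict";

// HTML5 void elements: they never take an end tag, so in XML they must be
// written as empty-element tags (<link />).
var VOID_ELEMENTS = ["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr"];

// A start tag: name, zero or more attributes, optional "/" before ">".
var START_TAG = /<([A-Za-z][A-Za-z0-9-]*)((?:\s+[^\s"'<>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>=`]+))?)*)\s*(\/?)>/g;

// One attribute: a name with an optional quoted or unquoted value.
var ATTRIBUTE = /([^\s"'<>\/=]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>=`]+))?/g;

// Issue 1: give every attribute a quoted value.
function fixAttributes(attrs) {
  return attrs.replace(ATTRIBUTE, function (match, name, value) {
    if (value === undefined) {
      // Empty attribute: disabled -> disabled="disabled"
      return name + '="' + name + '"';
    }
    if (value.charAt(0) !== '"' && value.charAt(0) !== "'") {
      // Unquoted value: type=checkbox -> type="checkbox"
      return name + '="' + value + '"';
    }
    return name + "=" + value;
  });
}

// Issue 3: close void elements with " />".
function normalizeStartTags(html) {
  return html.replace(START_TAG, function (match, name, attrs, slash) {
    var isVoid = VOID_ELEMENTS.indexOf(name.toLowerCase()) !== -1;
    var close = (slash || isVoid) ? " />" : ">";
    return "<" + name + fixAttributes(attrs) + close;
  });
}

console.log(normalizeStartTags('<input type=checkbox disabled><link rel="stylesheet" href="a.css" >'));
```

Attribute values that already contain a bare & or < are passed through unchanged, so the output is only well-formed when the values themselves are clean.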


I want to hear your solutions, but I’ll also suggest these in my ignorance:

1.       Allow empty attributes in well-formed XML

2.       Allow non-escaped ampersands and angle brackets when enclosed in a parenthesis expression.

3.       Propose a gentlemen's agreement to xml-haters and convince them of the desirability of a single / before closing an empty element.
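For comparison with suggestion 2, this is roughly what XML 1.0 asks of script authors today: either escape the markup characters in the script text, or wrap the text in a CDATA section. The sketch below shows both options in JavaScript; `escapeXmlText` and `wrapInCdata` are hypothetical helper names, not functions from any library.

```javascript
"use strict";

// Option A: escape the characters that XML treats as markup in text content.
// The ampersand must be replaced first so the other entities are not
// double-escaped.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Option B: leave the script readable and wrap it in a CDATA section.
// A CDATA section cannot contain "]]>", so any occurrence is split across
// two adjacent sections.
function wrapInCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

var source = "if (a < b && c > d) { run(); }";
console.log(escapeXmlText(source));
console.log(wrapInCdata(source));
```

Option B keeps the code legible for programmers, which is the concern behind issue 2, but it still requires the author or a tool to add the wrapper.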


As usual, we will be happy to read the enlightened opinions shared here,




William David Velásquez

Director de Investigación y Desarrollo

Visión Tecnológica S.A.S.


Tel (57 4) 444 7292

Movil (57) 311 709 8421

Follow me @williamda


1.       Allow empty attributes in well-formed XML

2.       Allow non-escaped ampersands and angle brackets when enclosed in a parenthesis expression.

3.       Propose a gentlemen's agreement to xml-haters and convince them of the desirability of a single / before closing an empty element.

I'd rather not change the XML developer's way, but instead have JavaScript (or any other code) handle these issues for <?xml version="1.2" encoding="UTF-8"?>

"use strict"; 
var j = 0, baz = "", k = "", bar = 

    if (bar.charAt(j) !== ",") {
        k += bar.charAt(j);
    } else {
        baz += String.fromCharCode(k) + " \v";
        k = "";
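The fragment above stops short in this copy, and the value assigned to `bar` is not preserved. As a minimal sketch of what the visible lines appear to do, assuming `bar` held a comma-separated list of character codes, the loop might be completed as follows. The function name `decodeCharCodes` and the sample input are hypothetical, and this version leaves out the " \v" separator that the original appends after each decoded character.

```javascript
"use strict";

// Decode a comma-separated list of character codes into a string.
// Assumption: this is the intent of the truncated snippet; the original
// input string is not available.
function decodeCharCodes(bar) {
  var baz = "", k = "", j;
  for (j = 0; j < bar.length; j += 1) {
    if (bar.charAt(j) !== ",") {
      // Accumulate the digits of the current code.
      k += bar.charAt(j);
    } else {
      // A comma ends the current code: convert it and start over.
      baz += String.fromCharCode(k);
      k = "";
    }
  }
  // Convert the last code, which has no trailing comma.
  if (k !== "") {
    baz += String.fromCharCode(k);
  }
  return baz;
}

// Hypothetical input: the codes for "<", "&" and ">".
console.log(decodeCharCodes("60,38,62"));
```

Storing markup characters as numeric codes like this would let a script carry them through an XML document without escaping, at the cost of readability.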
