OASIS Mailing List Archives

Re: [xml-dev] XHTML 5 and validation

A BOM in UTF-8 seems to cause problems with some XML parsers (including
Xerces 2.9.1), which appear to treat it as white space in the prolog. To
deal with this, we have had to insert a preprocessing step ahead of our
parser that checks for a BOM and strips it out.
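A minimal sketch of such a pre-parser filter, written in Python purely for illustration. The post does not say what language or parser API the actual processor uses; the function names `strip_utf8_bom` and `parse_without_bom` are hypothetical. It assumes the input is raw bytes, so the BOM appears as the three bytes EF BB BF rather than as a decoded U+FEFF character.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    # codecs.BOM_UTF8 is the byte sequence b'\xef\xbb\xbf'
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_without_bom(data: bytes) -> ET.Element:
    """Strip any UTF-8 BOM, then hand the remaining bytes to the parser."""
    return ET.fromstring(strip_utf8_bom(data))


if __name__ == "__main__":
    doc = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><html/>'
    root = parse_without_bom(doc)
    print(root.tag)  # html
```

Only the leading three bytes are checked, so a document without a BOM passes through unchanged and the parser still sees the XML declaration as the very first thing in the stream.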


On 05/20/2011 12:05 PM, Jesper Tverskov wrote:
> Good news!
> All the issues I have so aggressively raised have been solved.
> The solution is to use the wonderful polyglot version of XHTML5,
> http://www.w3.org/TR/html-polyglot/, that is an XML document that
> validates as HTML5 if served with mimetype "text/html" and as XHTML5
> if served with mimetype "application/xhtml+xml".
> I have made a test document, and both the W3C Markup Validator and
> Validator.nu work right away. Without any settings, the document
> validates as HTML5 or as XHTML5 depending only on the mimetype
> used!
> The W3C Markup Validator could be better. It doesn't say which mimetype
> was detected, and it reports "valid HTML5" in both cases.
> *** There is another problem with the W3C Validator that I would like
> to ask the list about.
> The W3C guidelines, "Polyglot Markup: HTML-Compatible XHTML Documents"
> (see link above), recommend using one of three methods, separately or
> in combination, to get the encoding right:
> Within the document
> 1.	Byte Order Mark (BOM) character (preferred).
> 2.	<meta charset="UTF-8"/>.
> Outside the document
> 3.	When setting the mimetype.
> I use all three methods.
> The W3C Markup Validator validates the document but gives the following warning:
> "Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark
> (BOM) in UTF-8 encoded files is known to cause problems for some text
> editors and older browsers. You may want to consider avoiding its use
> until it is better supported."
> Is this still relevant? Should I drop the BOM (the polyglot guidelines
> call it the preferred method) and only use the two other methods?
> By the way, some other good news.
> *** It is easy to create polyglot XHTML5 with XSLT 2.0 (I used
> Saxon). The following serialization attributes should be used:
> method="xhtml"
> omit-xml-declaration="yes"
> include-content-type="no"
> byte-order-mark="yes" (optional)
> encoding="UTF-8" (optional)
> indent="yes" (optional)
> And you should remember to place <meta charset="UTF-8"/> in the head
> section of the XHTML document.
> That is it. Thanks for the help.
> Cheers,
> Jesper Tverskov
> http://www.xmlplease.com
> _______________________________________________________________________
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.

Copyright 1993-2007 XML.org. This site is hosted by OASIS