Re: [xml-dev] XHTML 5 and validation

John Cowan <cowan@ccil.org> wrote on 05/20/2011 06:59:04 PM:

> Mike Sokolov scripsit:
> > BOM in UTF-8 seems to cause problems with some XML parsers
> > (incl. Xerces 2.9.1).  They seem to believe it is white space in the
> > prolog.  To deal with this, we have had to insert a processor prior to
> > our parser which checks for BOM and strips it out.
> Support for the 8-BOM was not explicitly required until the XML 1.0
> Third Edition of 2004.  Xerces 2.9.1 may be out of date.

What doesn't work? Xerces has known how to handle the UTF-8 BOM for much longer than that. All releases since 2003 [1] have supported it.

Note that you need to let the parser use its own encoding support for the InputStream.
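As a rough illustration of handing the parser a raw byte stream, here is a minimal sketch. It assumes the JAXP parser bundled with the JDK (which is derived from Xerces) rather than a standalone Xerces 2.9.1 build; the class and method names (BomStreamParse, withUtf8Bom, parseRootName) are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: not part of Xerces or the JDK.
public class BomStreamParse {

    // Prefix the UTF-8 encoding of the given XML with the UTF-8 BOM (EF BB BF).
    static byte[] withUtf8Bom(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Pass the bytes to the parser as an InputStream, so the parser
    // (not a JDK Reader) detects the encoding and consumes the BOM.
    static String parseRootName(byte[] bytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getNodeName();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = withUtf8Bom("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>");
        System.out.println(parseRootName(bytes));
    }
}
```

The point of the sketch is only that no Reader sits between the bytes and the parser; the parser sees the BOM itself and treats it as an encoding signature rather than as content in the prolog.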

Don't pass in a UTF-8 Reader from the JDK. The JDK UTF-8 InputStreamReader [2] apparently doesn't recognize the BOM and perhaps never will.
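If a Reader really is unavoidable, something along these lines may help. It is a sketch only: firstChar shows the JDK decoder handing U+FEFF through as an ordinary character, and skipBom is one possible way to drop it before the parser sees it, similar in spirit to the pre-processor Mike describes. BomReaderDemo, firstChar and skipBom are hypothetical names, not JDK or Xerces API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: not part of Xerces or the JDK.
public class BomReaderDemo {

    // Decode the bytes with the JDK's UTF-8 InputStreamReader and
    // return the first character it produces (or -1 if empty).
    static int firstChar(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return r.read();
        }
    }

    // Wrap a Reader so that a leading U+FEFF, if present, is discarded.
    // Any other first character is pushed back and read normally.
    static Reader skipBom(Reader reader) throws IOException {
        PushbackReader pr = new PushbackReader(reader, 1);
        int c = pr.read();
        if (c != -1 && c != 0xFEFF) {
            pr.unread(c);
        }
        return pr;
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 BOM followed by "<a/>"
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(Integer.toHexString(firstChar(bytes)));
        Reader cleaned = skipBom(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
        System.out.println((char) cleaned.read());
    }
}
```

Passing the InputStream directly is still the simpler route, since the parser then also gets to check the BOM against the encoding declaration.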

> --
> XQuery Blueberry DOM                            John Cowan
> Entity parser dot-com                           cowan@ccil.org
>     Abstract schemata                           http://www.ccil.org/~cowan
>     XPointer errata
> Infoset Unicode BOM                                 --Richard Tobin

[1] http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/XMLEntityManager.java?r1=318934&r2=318940&diff_format=h
[2] http://bugs.sun.com/view_bug.do?bug_id=4508058

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com

E-mail: mrglavas@apache.org
