XML.org
Re: [xml-dev] XHTML 5 and validation

Thanks for clearing that up - I should have asked around when I had the 
problem originally, I guess!  You have correctly inferred the source of 
our problem - using a JDK InputStreamReader in front of the parser.

cheers

-Mike

On 5/20/2011 9:28 PM, Michael Glavassevich wrote:
>
> John Cowan <cowan@ccil.org> wrote on 05/20/2011 06:59:04 PM:
>
> > Mike Sokolov scripsit:
> >
> > > BOM in UTF-8 seems to cause problems with some XML parsers
> > > (incl. Xerces 2.9.1).  They seem to believe it is white space in the
> > > prolog.  To deal with this, we have had to insert a processor prior to
> > > our parser which checks for BOM and strips it out.
> >
> > Support for the 8-BOM was not explicitly required until the XML 1.0
> > Third Edition of 2004.  Xerces 2.9.1 may be out of date.
>
> What doesn't work? Xerces has known how to handle the UTF-8 BOM for 
> much longer than that. All releases since 2003 [1] have supported it.
>
> Note that you need to let the parser use its own encoding support for 
> the InputStream.
>
> Don't pass in a UTF-8 Reader from the JDK. The JDK UTF-8 
> InputStreamReader [2] apparently doesn't recognize the BOM and perhaps 
> never will.
>
> > --
> > XQuery Blueberry DOM                            John Cowan
> > Entity parser dot-com                           cowan@ccil.org
> >     Abstract schemata http://www.ccil.org/~cowan
> >     XPointer errata
> > Infoset Unicode BOM                                 --Richard Tobin
> >
> > _______________________________________________________________________
> >
> > XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> > to support XML implementation and development. To minimize
> > spam in the archives, you must subscribe before posting.
> >
> > [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> > Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> > subscribe: xml-dev-subscribe@lists.xml.org
> > List archive: http://lists.xml.org/archives/xml-dev/
> > List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
> [1] http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/XMLEntityManager.java?r1=318934&r2=318940&diff_format=h
> [2] http://bugs.sun.com/view_bug.do?bug_id=4508058
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
>
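
For anyone finding this thread later, here is a minimal Java sketch of the two approaches discussed above. It is an illustration, not code from either poster: the class name BomDemo and all of its method names are hypothetical, and it uses the JAXP parser bundled with the JDK rather than a specific Xerces release. It assumes the advice in the thread holds for that parser: handing over the raw InputStream lets the parser detect the UTF-8 BOM itself, while a JDK InputStreamReader passes U+FEFF through as an ordinary character, so a Reader-based pipeline has to drop it first.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; none of these names come from the thread.
final class BomDemo {
    private static final int BOM = 0xFEFF;

    /** A tiny UTF-8 document preceded by the three-byte BOM EF BB BF. */
    static byte[] sampleWithBom() {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[xml.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, out, 3, xml.length);
        return out;
    }

    /** Shows what the JDK decoder does: the BOM arrives as the first char. */
    static int firstCharFromJdkReader(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Preferred route: give the parser raw bytes so it detects the encoding. */
    static String rootNameFromStream(byte[] bytes) {
        return parseRootName(new InputSource(new ByteArrayInputStream(bytes)));
    }

    /** Workaround when a Reader is unavoidable: drop a leading U+FEFF. */
    static Reader skipBom(Reader reader) throws IOException {
        PushbackReader pushback = new PushbackReader(reader, 1);
        int first = pushback.read();
        if (first != -1 && first != BOM) {
            pushback.unread(first); // not a BOM, so put the character back
        }
        return pushback;
    }

    /** Reader route with the BOM stripped before the parser sees the text. */
    static String rootNameFromStrippedReader(byte[] bytes) {
        try (Reader r = skipBom(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return parseRootName(new InputSource(r));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String parseRootName(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(source).getDocumentElement().getNodeName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = sampleWithBom();
        System.out.println(Integer.toHexString(firstCharFromJdkReader(doc)));
        System.out.println(rootNameFromStream(doc));
        System.out.println(rootNameFromStrippedReader(doc));
    }
}
```

The stream route is the simpler of the two, since it leaves encoding detection entirely to the parser. The skipBom helper is only worth having when some other layer already insists on supplying a Reader.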



Copyright 1993-2007 XML.org. This site is hosted by OASIS