Re: UTF-8 BOM
- From: Rob Lugt <roblugt@elcel.com>
- To: Richard Tobin <richard@cogsci.ed.ac.uk>, xml-dev@lists.xml.org
- Date: Thu, 14 Jun 2001 13:08:32 +0100
Richard,
Here is the output from the ElCel Technology XML Validator [1]:
http://www.cogsci.ed.ac.uk/~richard/bomtest/doc.xml is well formed
http://www.cogsci.ed.ac.uk/~richard/bomtest/docx.xml is well formed
http://www.cogsci.ed.ac.uk/~richard/bomtest/ent.xml is well formed
http://www.cogsci.ed.ac.uk/~richard/bomtest/entx.xml is well formed
http://www.cogsci.ed.ac.uk/~richard/bomtest/ent2.xml is well formed
ent2x.ent [1:7] : Fatal error: 'xml' is not a valid processing instruction target
The ElCel Technology parser examines every external entity for a BOM. It
recognizes the 7 BOMs listed in XML 1.0 Appendix F.1, i.e. UCS-4 (x 4),
UTF-16BE, UTF-16LE and UTF-8. When a BOM is recognized, it is removed from
the input stream before the stream is decoded and passed on to the parser.
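
As a rough illustration of that detect-and-strip step (a sketch only, not the ElCel code), the check can be written as a prefix match against the seven byte patterns from Appendix F.1. The names `BOMS` and `strip_bom` and the encoding labels are assumptions made for this example. The four-byte UCS-4 patterns are tested first because FF FE 00 00 and FE FF 00 00 would otherwise match the two-byte UTF-16 patterns.

```python
# Byte patterns as listed in XML 1.0 Appendix F.1. The labels are
# illustrative names for this sketch, not identifiers from any parser.
# Four-byte patterns come first so that FF FE 00 00 (UCS-4) is not
# mistaken for the shorter UTF-16LE pattern FF FE.
BOMS = [
    (b"\x00\x00\xfe\xff", "UCS-4 (1234 order)"),
    (b"\xff\xfe\x00\x00", "UCS-4 (4321 order)"),
    (b"\x00\x00\xff\xfe", "UCS-4 (2143 order)"),
    (b"\xfe\xff\x00\x00", "UCS-4 (3412 order)"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def strip_bom(data: bytes):
    """Return (label, remaining_bytes); label is None if no BOM is found."""
    for bom, label in BOMS:
        if data.startswith(bom):
            # Drop the BOM so the decoder never sees it as document text.
            return label, data[len(bom):]
    return None, data
```

For example, `strip_bom(b"\xef\xbb\xbf<doc/>")` yields the UTF-8 label and the bytes `b"<doc/>"`, while input without a BOM is returned unchanged.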
Regards
Rob Lugt
ElCel Technology
[1] http://www.elcel.com/products/xmlvalid.html
----- Original Message -----
From: "Richard Tobin" <richard@cogsci.ed.ac.uk>
To: <xml-dev@lists.xml.org>
Sent: Thursday, June 14, 2001 12:23 PM
Subject: UTF-8 BOM
> The W3C XML Core WG is considering the question of whether a UTF-8
> byte-order mark (BOM) is allowed at the start of an XML entity. This
> question was raised a few weeks ago in a thread on comp.text.xml
> starting at article
>
> <180520011620538217%andreas.prilop@altavista.net>
>
> We would like to determine how existing parsers handle the byte
> sequence #xEF #xBB #xBF when it appears at the start of an XML
> document or other entity. Is it treated as a BOM (and not part
> of the text of the entity) or as a zero-width non-breaking space
> character?
>
> We have placed a number of test cases at
>
> http://www.cogsci.ed.ac.uk/~richard/bomtest/
>
> and would be grateful for feedback on how parsers handle them. Please
> post results here in xml-dev to avoid unnecessary duplication.
>
> We would also like to know of any editors (or similar tools) that
> generate XML documents starting with a UTF-8 BOM.
>
> -- Richard (on behalf of the XML Core WG)
>
- References:
- UTF-8 BOM
- From: Richard Tobin <richard@cogsci.ed.ac.uk>