OASIS Mailing List Archives



Re: [xml-dev] Unicode BOM as document separator [was: RE:[xml-dev] "Introducing MicroXML, Part 1: Explore the basic principles of...]

Jim DeLaHunt scripsit:

> I'm not sure how important this is to your usage, but The Unicode
> Standard already defines the meaning of a Byte Order Mark (BOM) code
> point in the midst of data. Up until Unicode 3.2, the BOM code point
> U+FEFF had the Byte Order Mark semantics at the start of a text
> stream, and the Zero-Width Non-Breaking Space (ZWNBS) semantics
> within a text stream. As such, your "<data>" element could validly
> include a U+FEFF code point.

That's true, but a U+FEFF cannot appear outside the root element, where
only PIs, comments, and whitespace are valid, never character content.
However, using a control character is easier on the recipient, who can
split the documents before parsing them.
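
A minimal sketch of the recipient's side, assuming the sender joins the
documents with U+001E (RECORD SEPARATOR). That particular character and the
helper names below are illustrative assumptions, not something fixed by this
thread. The point is only that a control character which XML 1.0 forbids
inside a document lets the recipient split the stream with a plain string
operation before any XML parser is involved:

```python
import xml.etree.ElementTree as ET

# Assumption: the sender separates documents with U+001E (RECORD SEPARATOR).
# XML 1.0 does not allow this character inside a document, so it cannot
# collide with document content.
SEPARATOR = "\x1e"


def split_documents(stream_text, separator=SEPARATOR):
    """Split a concatenated stream into individual document strings.

    Chunks that are empty or whitespace-only (for example after a trailing
    separator) are dropped.
    """
    return [chunk for chunk in stream_text.split(separator) if chunk.strip()]


def parse_documents(stream_text, separator=SEPARATOR):
    """Split first, then hand each chunk to the XML parser on its own."""
    return [ET.fromstring(chunk) for chunk in split_documents(stream_text, separator)]


# Usage: two small documents sent back to back in one stream.
stream = "<data>one</data>\x1e<data>two</data>\x1e"
roots = parse_documents(stream)
texts = [root.text for root in roots]
```

Because the split happens before parsing, a malformed document in the
stream affects only its own chunk and the others can still be parsed.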

Being understandable rather than obscurantist poses certain
risks, in that one's opinions are clear and therefore     | John Cowan
falsifiable in the light of new data, but it has the      | cowan@ccil.org
advantage of encouraging feedback from others.  --James A. Matisoff



Copyright 1993-2007 XML.org. This site is hosted by OASIS