   RE: expat whitespace weirdness?

  • From: Steve Rowe <sarowe@textwise.com>
  • To: Tim Crook <tcrook@JetForm.com>, xml-dev@lists.xml.org
  • Date: Fri, 14 Jul 2000 15:12:05 -0400

Tim Crook wrote:
> I was looking around to see if there might have been a
> particular reason why expat was implemented such that no leading
> white space is allowed before the standard <?xml version="1.0" ?>
> line. You get the error XML_ERROR_MISPLACED_XML_PI if there are any
> leading carriage returns, line feeds, spaces or tabs.

From the XML Rec [1]:

  [22] prolog  ::= XMLDecl? Misc* (doctypedecl Misc*)?
  [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

I.e., no production allows whitespace (or anything else) ahead of
XMLDecl, so if the XML declaration is in the stream, it must occupy
the first characters of the stream passed to the parser.
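
A minimal sketch of that behavior, using Python's xml.parsers.expat
binding purely for illustration (it is not taken from Tim's setup, and
the helper name parse_error_code is made up for this example). It feeds
the same document to expat with and without leading whitespace and
reports what the parser says:

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors


def parse_error_code(data: bytes):
    """Parse data with a fresh expat parser.

    Returns the numeric expat error code, or None if parsing succeeded.
    (Hypothetical helper, named only for this illustration.)
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk
    except expat.ExpatError as exc:
        return exc.code
    return None


DECL = b'<?xml version="1.0" ?><doc/>'

# errors.codes maps an error message string to its numeric code.
MISPLACED = errors.codes[errors.XML_ERROR_MISPLACED_XML_PI]

if __name__ == "__main__":
    for prefix in (b"", b" ", b"\t", b"\r\n"):
        code = parse_error_code(prefix + DECL)
        status = "ok" if code is None else errors.messages[code]
        print(repr(prefix), "->", status)
```

Only the empty prefix should parse cleanly; each whitespace prefix
should come back with the code bound to MISPLACED above.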

> From my understanding of things, the Byte Order Mark is
> what allows an XML parser to determine which character set is in use.
> (see Appendix F, Autodetection of Character Encodings in
> http://www.w3.org/TR/REC-xml) If the Byte Order Mark is not found,
> shouldn't the starting content of the data stream be discarded
> until the Byte Order Mark is located?

Yes, but that is the job of the application (or whatever else is
calling the parser), not of the parser itself.  Note also that
Appendix F is NON-normative -- conforming parsers are not required to
produce results consistent with it.
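
As a rough sketch of what "by the application" could look like, again
in Python and again only as an illustration: the function below trims
leading whitespace before handing the bytes to expat. The name
parse_after_trimming is hypothetical, and the approach assumes an
ASCII-compatible encoding such as UTF-8; it would mangle UTF-16 input,
where whitespace is not a single byte, and it does not try to locate
or interpret a Byte Order Mark.

```python
import xml.parsers.expat as expat

# The four characters matched by production S in the XML Rec,
# as single bytes (assumption: ASCII-compatible encoding).
XML_WHITESPACE = b" \t\r\n"


def parse_after_trimming(data: bytes) -> str:
    """Drop leading whitespace, parse, and return the root element name.

    Hypothetical helper for illustration; the trimming happens in the
    application, before the parser ever sees the stream.
    """
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data.lstrip(XML_WHITESPACE), True)
    return names[0]


if __name__ == "__main__":
    padded = b'\r\n \t<?xml version="1.0" ?><doc/>'
    print(parse_after_trimming(padded))
```

Whether silently trimming is appropriate depends on where the stream
comes from; an application might prefer to reject such input instead.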

Steve Rowe
MNIS-TextWise Labs

[1] http://www.w3.org/TR/REC-xml
