- From: Steve Rowe <sarowe@textwise.com>
- To: Tim Crook <tcrook@JetForm.com>, xml-dev@lists.xml.org
- Date: Fri, 14 Jul 2000 15:12:05 -0400
Tim Crook wrote:
> I was looking around to see if there might have been a
> particular reason why expat was implemented such that no leading
> white space is allowed before the standard <?xml version="1.0" ?>
> line. You get the error XML_ERROR_MISPLACED_XML_PI if there are any
> leading carriage returns, line feeds, spaces or tabs.
From the XML Rec [1]:
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
I.e., the prolog allows nothing, not even white space, before the
XML declaration. If the declaration is in the stream, it must occupy
the first characters of the stream passed to the parser.
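
To see the behaviour Tim describes, here is a minimal sketch using the
expat binding in Python's standard library (xml.parsers.expat). The
helper name parse_error is hypothetical and exists only for this
illustration; it returns expat's error message, or None if the
document parses cleanly.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors


def parse_error(data: bytes):
    """Parse data with expat; return the error message, or None on success.

    Hypothetical helper for illustration only.
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError as exc:
        # errors.messages maps expat's numeric error code to its message
        return errors.messages[exc.code]
    return None


ok = parse_error(b'<?xml version="1.0"?><doc/>')
bad = parse_error(b'\n<?xml version="1.0"?><doc/>')

# A leading line feed is enough to trigger XML_ERROR_MISPLACED_XML_PI.
print(ok is None)
print(bad == errors.XML_ERROR_MISPLACED_XML_PI)
```

In the Python binding, errors.XML_ERROR_MISPLACED_XML_PI holds the
message text for that error, which is why it is compared against the
returned message rather than against a numeric code.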
> From my understanding of things, the Byte Order Mark is
> what allows an XML parser to determine which character set is in use.
> (see Appendix F, Autodetection of Character Encodings in
> http://www.w3.org/TR/REC-xml) If the Byte Order Mark is not found,
> shouldn't the starting content of the data stream be discarded
> until the Byte Order Mark is located?
Yes, but by the application (or other user of the parser), not by the
parser itself.
Note also that Appendix F is NON-normative -- compliant parsers are
not required to produce results consistent with it.
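
One way an application might do that discarding, as a rough sketch
only: it assumes the input is in an ASCII-compatible encoding such as
UTF-8 (it would not be correct for UTF-16), and the names
strip_leading_whitespace and root_name are hypothetical helpers, not
part of expat.

```python
import xml.parsers.expat as expat

# The four XML white space characters, as single bytes.
# Assumption: the stream uses an ASCII-compatible encoding (e.g. UTF-8).
XML_WHITESPACE = b" \t\r\n"


def strip_leading_whitespace(data: bytes) -> bytes:
    """Drop leading XML white space before the data reaches the parser."""
    return data.lstrip(XML_WHITESPACE)


def root_name(data: bytes) -> str:
    """Parse cleaned-up data with expat and return the root element name."""
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(strip_leading_whitespace(data), True)
    return names[0]


# Leading CR, LF, space and tab are removed by the caller, so the
# XML declaration is again the first thing the parser sees.
print(root_name(b'\r\n \t<?xml version="1.0"?><doc/>'))
```

The cleanup lives in the calling code, so the parser itself still
enforces productions [22] and [23] as written.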
Steve Rowe
MNIS-TextWise Labs
[1] http://www.w3.org/TR/REC-xml