Strict-Mode and Lax-Mode MicroXML
- From: "Pete Cordell" <petexmldev@codalogic.com>
- To: <xml-dev@lists.xml.org>
- Date: Sun, 3 Jun 2012 10:00:08 +0100
This will set the cat among the pigeons...
Did we discuss the option of having Strict-Mode and Lax-Mode for MicroXML?
Strict-Mode would be what John has documented. Lax-Mode could be some
defined set of error recoveries, for example an & that is not followed by
amp;, lt; etc. is treated as a literal &.
Note that this wouldn't be Postel lax processing because the laxness
wouldn't be left up to the parser implementers to define. It would be part
of the spec. (Or is that too HTML5-ish?) For example, you might have:
2.4.1 Character data - Lax-Mode
If a < character is not followed by a nameStartChar or a ? character
or a ! character then it should be treated as a < character that has no
special meaning.
If a & character is not followed by one of the character sequences
gt; or lt; or amp; or quot; or apos; then it should be treated as a &
character that has no special meaning.
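
As a rough illustration of the two rules above, here is a minimal Python
sketch that rewrites lax character data into its strict equivalent. The
function names are hypothetical, the nameStartChar test is simplified to
"a letter or _", and treating / as a markup start (so that end tags survive)
is an extra assumption that the rule text above does not state. It works
character by character and ignores context such as comments and attribute
values.

```python
import re
from typing import Tuple

# The five entity references named in the rule; any other "&" is literal.
_ENTITY = re.compile(r"&(?:gt|lt|amp|quot|apos);")


def _starts_markup(nxt: str) -> bool:
    # Simplified stand-in for nameStartChar (assumption): a letter or "_".
    # "/" is an added assumption so that end tags such as </a> stay markup.
    return nxt.isalpha() or (nxt != "" and nxt in "_?!/")


def lax_to_strict(text: str) -> Tuple[str, bool]:
    """Return (strict_text, repaired); repaired is True if any recovery was applied."""
    out = []
    repaired = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "&":
            m = _ENTITY.match(text, i)
            if m:
                # A recognised entity reference is copied through unchanged.
                out.append(m.group(0))
                i = m.end()
                continue
            out.append("&amp;")  # bare "&" has no special meaning
            repaired = True
        elif ch == "<" and not _starts_markup(text[i + 1:i + 2]):
            out.append("&lt;")  # bare "<" has no special meaning
            repaired = True
        else:
            out.append(ch)
        i += 1
    return "".join(out), repaired
```

For instance, lax_to_strict("AT&T") gives ("AT&amp;T", True), while input
that is already strict comes back unchanged with repaired set to False.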
A MicroXML parser would then return a status of either Strict-Well-Formed,
Lax-Well-Formed, or Not-Well-Formed. Perhaps when you call the parser you
would also be able to specify that you want strict-mode.
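
To make the three-way status concrete, here is a small Python sketch of how
a parser might report it. The names Status and report_status are
hypothetical, and it assumes that requesting strict-mode simply turns off
the recoveries, so a document that only parses with recovery is then
reported as Not-Well-Formed.

```python
from enum import Enum


class Status(Enum):
    STRICT_WELL_FORMED = "Strict-Well-Formed"
    LAX_WELL_FORMED = "Lax-Well-Formed"
    NOT_WELL_FORMED = "Not-Well-Formed"


def report_status(parses_strictly: bool,
                  parses_with_recovery: bool,
                  strict_mode: bool = False) -> Status:
    """Map the outcome of a parse attempt to one of the three statuses."""
    if parses_strictly:
        return Status.STRICT_WELL_FORMED
    # Assumption: in strict mode the lax recoveries are not applied at all.
    if parses_with_recovery and not strict_mode:
        return Status.LAX_WELL_FORMED
    return Status.NOT_WELL_FORMED
```

A caller that only wants to accept strict documents would pass
strict_mode=True and treat anything other than STRICT_WELL_FORMED as an
error.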
Developers would be encouraged to generate documents that are
Strict-Well-Formed, but to accept documents that are Lax-Well-Formed when
parsing.
I see this as a migration strategy to get away from some of the SGML baggage
that is no longer relevant, and maybe in 10 years' time we can safely adopt
lax-mode for 99% of what developers want to do, and have -- in comments etc.
Pete Cordell
Codalogic Ltd
Interface XML to C++ the easy way using C++ XML
data binding to convert XSD schemas to C++ classes.
Visit http://codalogic.com/lmx/ or http://www.xml2cpp.com
for more info