choosing sides
- From: Michael Sokolov <sokolov@ifactory.com>
- To: xml-dev@lists.xml.org
- Date: Sun, 12 Dec 2010 07:09:58 -0500
Hearkening back to Elliotte's proposal about forming a group to discuss
details off-list - did that already happen? If so, and its focus aligns
more or less with the ideas generated over the last week or two, I'd be
interested in participating. If there's no cabal yet, I think it's time
to form up (at least one) side, like Amy said.
Here's my summary of outstanding ideas, possibly filtered by my
selective memory, with my own perspective.
I think this list may have some similarity to Pete's blend:
http://codalogic.com/xmllite/xmllite.html, but perhaps with a bit more
concern for XML 1.0 compatibility? I guess this would be "Mike's mix" -
mostly not my ideas, but my wish list.
For the moment I'll call the new XML "SXML" ("simpler" XML? "super" XML?):
1) Define a stance on compatibility
XML 1.0 guarantee - every well-formed XML 1.0 document encoded in
UTF-8/16 is a well-formed SXML document. I think that's doable,
even with the proposed changes?
However, the converse wouldn't be true. SXML is looser; it includes
more documents.
Can we support a statement like: every SXML document can be
represented by an "equivalent" XML 1.0 document, because the data model
is essentially the same? This wouldn't be a perfect round-trip guarantee:
you might lose prefix mappings, duck-typing and other new features. It
would just be some kind of translatability guarantee - details to be
worked out to see if there is a meaningful guarantee that can be had :)
I suppose another stance that could work is: parsers can support all
of XML 1.0, XMLNS 1.1, AND SXML - we'd have to design SXML so there
aren't any outright conflicts. But it could be a reasonable thing to
create an SXML parser that lacks support for some XML 1.0 features.
Maybe there's a "profile" defined in the document itself, as has been
suggested.
2) New Features
- new (like Kay-style hierarchic) namespaces - I'm sure there will
be all kinds of interesting discussion about how this could work out :)
- looser handling of prolog (allow whitespace)
- Ignore DOCTYPE (internal DTD subset is parsed and preserved (?) for
re-serialization purposes only) - does SAX have an event for this?
ignorableWhitespace maybe? Not sure how this would play out
elsewhere.
- treat the XML declaration as a PI, but also:
warn about an incompatible character set?; assume UTF-8 and
raise an error when an invalid byte sequence is encountered
- duck-typing (provide additional event w/typed values) - this seems
do-able to me for int, date, dateTime and float. Possibly even ID for
anything else that's unquoted?
- built-in entity set (ISO, right?)
- allow nested comments; no requirement for well-formedness inside
(use existing syntax) - <xml:comment> is another option.
- I think CDATA needs to stay for compatibility, but maybe there's an
SXML-only mode that ignores this?
- multiple root elements or documents in a single file
- UTF-8 and UTF-16 autodetected based on BOM (no BOM -> UTF-8)
- looser handling of ampersand - does it really need to be an error
to have <a>&foo & &bar;</a>?
- also: undefined entities could be allowed and left unprocessed
- all whitespace preserved by default (even CRLF, though parsers can be
configured to normalize it)
- end-tag minimization; using </>? possibly only on leaves? <//> for
close-all-elements? I don't actually like this last one much, but
someone did mention Lisp's close-all bracket: ], and this syntax just
sprang into my head...
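To make three of the items above a bit more concrete (BOM-based
autodetection, looser ampersands, and duck-typing), here is a rough
Python sketch. None of this is settled: the function names
(detect_encoding, escape_loose_ampersands, duck_type) are hypothetical,
the patterns used to recognize references and typed values are my own
assumptions, and rewriting a stray "&" as "&amp;" is just one possible
reading of "looser handling".

```python
import codecs
import re
from datetime import date, datetime


def detect_encoding(raw: bytes) -> str:
    """Choose a codec from the BOM; no BOM is taken to mean UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM on decode
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    return "utf-8"


# Assumption: a "real" reference looks like &name; &#123; or &#x1F;
_LOOSE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_loose_ampersands(text: str) -> str:
    """Escape ampersands that do not start a reference.

    References such as &bar; are left alone even if the entity is
    undefined, so they pass through unprocessed.
    """
    return _LOOSE_AMP.sub("&amp;", text)


# Assumed lexical forms for the duck-typed values.
_INT = re.compile(r"[+-]?[0-9]+")
_FLOAT = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?")
_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_DATETIME = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}")


def duck_type(text: str):
    """Return an int, float, date or datetime if the text looks like one.

    Anything else, including look-alikes such as 2010-13-45, is
    returned unchanged as a string.
    """
    s = text.strip()
    try:
        if _INT.fullmatch(s):
            return int(s)
        if _FLOAT.fullmatch(s):
            return float(s)
        if _DATE.fullmatch(s):
            return date.fromisoformat(s)
        if _DATETIME.fullmatch(s):
            return datetime.fromisoformat(s)
    except ValueError:
        pass  # matched the shape but is not a valid value
    return text
```

A parser built this way would call raw.decode(detect_encoding(raw)) and
let the resulting UnicodeDecodeError stand in for the "error on invalid
sequence" behaviour, then report duck_type(value) in an additional event
alongside the plain string.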
Note: I don't really intend to start a whole new round of discussion on
this list, although that may be inevitable. I'm really hoping a few
folks want to work out a small manifesto, figure out the implications
for users and tools and documents, and build some proof-of-concept
software - I'm going to go away and work on my SXML parser now :)
-Mike