A note on compatibility (from one who is looking for more fundamental
change).
First - backward compatibility needs to be absolute, in that everything
already out there needs to remain usable.
Starting points for syntax:
* Already, XML prolog has options for the parser.
* Existing browsers parse existing XML and can be extended.
* Data for XML applications comes in a variety of formats, including XML
and databases.
* XML users, with a variety of needs, range from programmers to
subject experts and "tweakers".
And as a basis for more exciting capabilities:
* Existing browsers already allow some interaction between the user
interface and the specification - e.g., "View Source".
Conclusions:
XML needs to allow for more than one reader / parser; some standard,
some user written - to extract data from complex documents. One
particular need is for error handling specifications for both detailed
status and end user alerts.
Note also that parsing specifications can be internal, or can be provided
externally, when the data source is to be processed.
Note also that a basic diagram editor, with built-in parsing for
generating / validating specifications, can serve as a basis for
integrating comprehensive tools without limiting the consistent use of
any part of XML, including evolving capabilities. (I'm assuming here
that a new XML has basic facilities for presenting hierarchies and
networks, as well as lists and tables.)
Compatibility in semantics is a thornier issue, but needs to be addressed
similarly - such as through extended and alternative infosets.
Bill
In a message dated 12/12/2010 7:23:18 A.M. Eastern Standard Time,
sokolov@ifactory.com writes:
Hearkening back to Elliotte's proposal about forming a group to discuss
details off-list - did that already happen? If so, and its focus aligns
more-or-less with the ideas that got generated over the last week or
two, I'd be interested in participating. If there's no cabal yet, I
think it's time to form up (at least one) side, like Amy said.
Here's my summary of outstanding ideas, possibly filtered by my
selective memory, with my own perspective. I think this list may have
some similarity to Pete's blend:
http://codalogic.com/xmllite/xmllite.html, but perhaps a bit more
concern for XML 1.0 compatibility? I guess this would be "Mike's mix" -
not my ideas, mostly, but my wish list.

For the moment I'll call new xml SXML ("simpler" XML? "super" XML?):
1) Define a stance on compatibility

XML 1.0 guarantee - every well-formed XML 1.0 document encoded in
UTF-8/16 is a well-formed SXML document. I think that's do-able, even
with the proposed changes?

However the converse wouldn't be true. SXML is looser; it includes more
documents.

Can we support a statement like: every SXML document can be represented
by an "equivalent" XML 1.0 document - the data model is essentially the
same. This wouldn't be a perfect round-trip guarantee: you might lose
prefix mappings, duck-typing and other new features; just some kind of
translatability guarantee - details to be worked out to see if there is
a meaningful guarantee that can be had :)

I suppose another stance that could work is: parsers can support all of
XML 1.0, XMLNS 1.1, AND SXML - we'd have to design SXML so there aren't
any outright conflicts. But it could be a reasonable thing to create an
SXML parser that lacks support for some XML 1.0 features. Maybe there's
a "profile" defined in the document itself, as has been suggested.
2) New Features

- new (like Kay-style hierarchic) namespaces - I'm sure there will be
  all kinds of interesting discussion about how this could work out :)
- looser handling of prolog (allow whitespace)
- Ignore DOCTYPE (internal DTD subset is parsed and preserved (?) for
  re-serialization purposes only) - does SAX have an event for this??
  ignorableWhitespace maybe? Not sure how this would play out
  elsewhere?
- treat XML decl as a PI, but also:
    - warn about incompatible character set?
    - assume utf-8 and error when invalid utf sequence encountered
- duck-typing (provide additional event w/typed values) - this seems
  do-able to me for int, date, dateTime and float. Possibly even ID for
  anything else that's unquoted?
- built-in entity set (ISO right?)
- allow nested comments; no requirement for well-formedness inside (use
  existing syntax) - <xml:comment> is another option.
- I think CDATA needs to stay for compatibility, but maybe there's a
  SXML-only mode that ignores this?
- multiple root elements or documents in a single file
- UTF-8 and UTF-16 autodetected based on BOM (no BOM -> UTF-8)
- looser handling of ampersand - does it really need to be an error to
  have "&foo & &bar;"?
- also: undefined entities could be allowed and left unprocessed
- all whitespace preserved by default (even CRLF, but parsers can be
  configured to do this)
- end-tag minimization; using </>? possibly only on leaves? <//> for
  close-all-elements? I don't actually like this last one much, but
  someone did mention lisp's close-all bracket: ], and this syntax just
  sprung into my head...
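To make three of the more mechanical items above concrete (BOM-based
autodetection, looser ampersand handling, and duck-typing), here is a
rough Python sketch. Nothing in it comes from an SXML spec, since there
isn't one yet: the function names detect_encoding,
escape_stray_ampersands and duck_type are hypothetical, the name pattern
is a simplification of real XML names, and dateTime is left out. Treat
it as one possible reading of the wish list, not a definition.

```python
import codecs
import re
from datetime import date


def detect_encoding(data: bytes) -> str:
    """Choose a codec from the byte-order mark; no BOM means UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec uses the BOM to pick byte order
    return "utf-8"


# An "&" that does NOT begin a complete reference (&name; &#123; &#x1F;).
# Assumption: the name pattern is a simplified stand-in for XML names.
_STRAY_AMP = re.compile(r"&(?![A-Za-z_:][\w.:-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def escape_stray_ampersands(text: str) -> str:
    """Turn bare ampersands into &amp;; leave complete references alone,
    including references to entities nobody has defined."""
    return _STRAY_AMP.sub("&amp;", text)


_INT = re.compile(r"[+-]?[0-9]+")
_FLOAT = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def duck_type(text: str):
    """Return an int, float or date when the text looks like one;
    otherwise return the original text unchanged."""
    s = text.strip()
    if _INT.fullmatch(s):
        return int(s)
    if _FLOAT.fullmatch(s):
        return float(s)
    try:
        return date.fromisoformat(s)  # e.g. "2010-12-12"
    except ValueError:
        return text


if __name__ == "__main__":
    raw = "<a>&foo & &bar;</a>".encode("utf-16")
    text = raw.decode(detect_encoding(raw))
    print(escape_stray_ampersands(text))
    print([duck_type(v) for v in ("42", "3.5", "2010-12-12", "abc")])
```

In this reading, a lenient parser would run something like
escape_stray_ampersands before (or instead of) raising a
well-formedness error, and would report the duck_type result in an
additional event alongside the plain string, so XML 1.0 consumers still
see the text they expect.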
Note: I don't really intend to start a whole new round of discussion on
this list, although that may be inevitable, but I'm really hoping a few
folks want to work out a small manifesto, figure out the implications
for users and tools and documents, and build some proof-of-concept
software - I'm going to go away and work on my SXML parser now :)

-Mike
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php