Re: [xml-dev] XML Redux

> > I think this might be a good point to bring back discussion on E4X, which
> > did in fact solve most of the issues that have been brought up on the list
> > several times. It treated XML as a native data type, provided a JavaScript
> > syntax for working with the XML native type (and XMLList type), and could
> > easily have been incorporated into JSON.

> It's too powerful to fit JSON.  In particular, in E4X you can incorporate
> arbitrary JavaScript expressions within braces into character content and
> attribute values.  JSON is not about arbitrary JavaScript expressions.
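
To make that objection concrete: in full E4X, braces inside an XML literal evaluate arbitrary JavaScript when the literal is constructed, which is exactly what JSON rules out. A minimal illustration (the names are arbitrary, and it assumes an E4X host such as Rhino or SpiderMonkey):

var user = "Cowan";
// the brace expression below is evaluated at construction time
var greeting = <div>Hello, {user.toUpperCase()}!</div>;
print(greeting.toXMLString()); // <div>Hello, COWAN!</div>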

But what if you placed the same constraints upon an E4X representation as you do within JSON: no evaluative constructs? For instance:

// myJAXON.js
{"firstname":"John", "lastname":"Cowan", "signature":"xml(<div>It was the <i>best</i> of times.<br/>It was the <i>worst</i> of times.</div>)"}

Douglas Crockford has repeatedly made the point that JSON should in general not be eval'd directly as script but should be parsed, and I think that holds for XML representations as well. An E4X-aware parser would then handle the relevant parsing into an object holding both XML values and ordinary JavaScript name/value pairs.

var record = JAXON.parse(src); // src holds the text of myJAXON.js
echo(record);
// {firstname:"John", lastname:"Cowan", signature:new XML(<div>It was the <i>best</i> of times.<br/>It was the <i>worst</i> of times.</div>)}

which could then be accessed using an E4X engine:

echo(record.signature.i[0]);
// <i>best</i>

while a plain JSON parser without E4X support would at least recognize that it's a string, one that could be parsed later if xml() became a convention:

var node = JSON.parseXML(record.signature);

Any braces within the JAXON XML stream would be treated as text, not evaluated. All it would take is the establishment of a convention on the XML side.
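
Here is a sketch of what such a JAXON parser could look like. It assumes a host that provides both an ES5-style JSON.parse with a reviver callback and E4X's global XML constructor (Rhino, for instance); JAXON itself and the xml() wrapper are just the hypothetical convention described above:

var JAXON = {
  parse: function (text) {
    return JSON.parse(text, function (key, value) {
      // convention: strings wrapped in xml(...) carry markup payloads
      if (typeof value === "string" && value.indexOf("xml(") === 0 &&
          value.charAt(value.length - 1) === ")") {
        // hand the wrapped markup to the E4X engine; braces inside a
        // parsed string are plain text here, so nothing gets evaluated
        return new XML(value.slice(4, -1));
      }
      return value;
    });
  }
};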

> > There's a common profile of XML that most people use that covers 80% of
> > the features, and these work just fine, but you can expand out to the
> > remaining 20% of features now if you need them. The problem is that
> > the TOOLS for working with XML in the browsers suck. Long, complex
> > DOM name calls, XPath requiring six objects to do anything useful, an
> > eleven-year-old implementation of XSLT with incomplete support. Provide
> > better tools in the browser, either E4X, XQuery in the Browser or
> > some similar language, and people who hate XML for legitimate reasons
> > (it's a pain in the butt to work with) will not have that argument.

> Not just in browsers, but everywhere else too.  XML's data model is inherently
> much more complicated than anybody needs.  By streamlining the data model
> to elements with a name, an attribute map, and a child sequence you get
> way past the 80/20 point with much greater ease of use.
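
That streamlined model is easy to picture as a plain JavaScript object shape. A sketch (the names here are illustrative, not anyone's spec):

// an element is just a name, an attribute map, and a child sequence
function element(name, attributes, children) {
  return {
    name: name,
    attributes: attributes || {},  // attribute name -> string value
    children: children || []       // elements and text strings, in order
  };
}

var tree = element("div", {}, [
  "It was the ", element("i", {}, ["best"]), " of times."
]);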

Problem is that it's not always the same 80/20. For HTML, namespaces are awkward and irrelevant. For what I do (working with large governmental schemas such as XBRL, NIEM, PREMIS and so forth), they're essential. Any place where I need to provide validation of mixed schemas, that namespace separation becomes critical.

Unquoted attributes, or attribute labels with no associated value, are less of an issue, but they do require a schema that can identify a default value at some point, because a map is, by definition, a set of name/value pairs. Comments are useful for any number of reasons. As for entities: personally, I think that if you expanded the list of valid entities to the HTML set by fiat, not a lot of people would lose sleep over them.

On the other hand, I think there is value in expanding the notion of XML to include sequences and other features that are part of the XDM. So long as we have a consistent mechanism for identifying such a sequence, there's no real reason an XML parser couldn't create a corresponding array of nodes rather than just one (certainly XPath 2 recognizes this concept).
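
E4X itself already gestures at this: an XMLList literal holds an ordered sequence of nodes with no single document root, much like an XPath 2 sequence. A small illustration (element names are arbitrary; assumes an E4X host such as Rhino):

var seq = <>
  <entry>first</entry>
  <entry>second</entry>
</>;                         // an XMLList: a rootless sequence of nodes
print(seq.length());         // 2
print(seq[1].toXMLString()); // <entry>second</entry>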

As I see it, the above is roughly the XML-prime laundry list from the HTML side. I may have missed an edge case in there, but that seems to be the crux of the requirements after two months of discussion. It would not be backward compatible (XML-prime would not be parseable by XML 1.0 parsers). In addition to that, I think there are a few other nice-to-haves that might be worth thinking about:
  • create a non-XML notation that would indicate a sequence of atomic entries, among them XML objects, so that we can work with sequences of content, not just documents
  • find a better mechanism than entities for handling special characters (I personally hate &amp;)
  • find a better mechanism for handling white space
Given my druthers, I could live with the above; most of the changes involved reflect what is increasingly common, accepted usage. It would require that the schema models change somewhat to reflect the looser notation on elements and attributes, and it would take a while to propagate through the toolsets, but it's doable.

However, that would mean that the HTML working group would have to concede that HTML is in fact an instance of this new XML-prime, and that SOME mechanism would need to be provided so that the above could work in the web browser space. Maybe it's E4X, maybe it's better XPath support, maybe it's XQIB, but I think what the XML side is looking for is a way to consistently unify the two standards, so that server-side developers working within an XML context are not going to be tripping over HTML.

Kurt Cagle

