Re: [xml-dev] Reflecting on a decade of XML: Lesson Learned

(The following is a repeat posting, that somehow did not survive translation from Word to Yahoo mail to xml-dev for my Firefox, Chrome or Safari browsers. IE8 is okay somehow.)

In a message dated 12/27/2010 5:08:50 P.M. Eastern Standard Time, billclare3@yahoo.com writes:

In one sentence – an early flurry of unreconciled objectives has missed fundamental simplifications, and, perhaps, the process has become mired in details.

So here - a holiday grab bag of comments on objectives and simplifications, including base capabilities for:

· separating “what is” specifications from “how to” model implementations

· foundations for a uniform syntax separate from semantics

· separating application data from specification data

· separating markup from other data.

It would seem that attempts at a u-XML and at XML standards simplifications need clear statements of objectives in order to decide what features to include or exclude.

If, for u-XML, advantage to users is a primary objective, problems with u-XML include:

· First, if an application is concerned with just text, many would use PDF, Microsoft Word or some other capability.

· Secondly, most Web applications need to deal with data that exists in many forms and representations, of which text is a small subset.

· Thirdly, most uses of the u-XML data would still require the complexities of other XML based technologies, in particular XHTML and probably XSLT. Thus any objectives of simplicity for application developers are only partially achieved.

If, for u-XML, parsing simplicity is a primary objective, which seems to be mostly what it’s about, then defining a subset for special uses (e.g. limited devices, very large documents) might be useful, but it shouldn’t add complications. However, I’m not sure that parsing simplicity is a major seller.

If simplification of, and capabilities for, application development (which is what XML and related standards are actually used for) is a basic objective, then the following discussion seems relevant.

Simplification can occur basically in two ways – which are not incompatible.

· Removing awkward features and complications – which diminishes capability

· Generalization and building upon fundamental abstractions – which can greatly enhance capabilities.

Relative to u-XML:

· XML is designed to deal with text markup.

· For markup, u-XML is a basic simplification.

For specifications, JSON- and YAML-like approaches are quite different simplifications, but allow for data typing.

· From this perspective, “text markup” can be viewed as a basic type - for which angle brackets, elements and attributes are well suited. (Basic text does not need angle brackets to be escaped.)

This can allow the advantages of u-XML and a JSON-like language to be combined where useful.
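To make the idea of “text markup” as one basic type among others a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposal for actual syntax: the record and its field names (title, pages, body) are hypothetical, JSON stands in for the JSON-like language, and ordinary XML parsed by the standard library stands in for u-XML.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: the field names "title", "pages" and "body" are
# illustrative only. "body" is the one field treated as the markup type.
RECORD = (
    '{"title": "Report", "pages": 3, '
    '"body": "<p>Plain <em>marked-up</em> text</p>"}'
)


def load_record(text):
    """Parse the JSON-like data, then parse the markup-typed field into a tree."""
    data = json.loads(text)
    data["body"] = ET.fromstring(data["body"])
    return data


def plain_text(element):
    """Return the character data of a markup value with the tags removed."""
    return "".join(element.itertext())


record = load_record(RECORD)
print(record["pages"] + 1)         # typed data: an integer, not a string
print(plain_text(record["body"]))  # markup data: text recovered from the tree
```

The typed fields come back as numbers and strings, while the markup field comes back as an element tree, so each kind of data is handled by the notation that suits it.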

Relative to specifications, SGML heritage is a hindrance. If one accepts that a specification language has fundamental differences from a markup language, then it is probably useful even to start with a new vocabulary (still, with translations for compatibility in mind):

· “Elements” are either data or data types (with Infoset-like representations).

· “Attributes” are properties, either:

¨ specific properties for a data type, or

¨ general properties (presentation, storage, metadata, events) applicable to many data types.

· “Documents” are specification streams (e.g. files)

· Delimiters are used for nesting, grouping, lists, comments.

· Some use of “key words” is probably appropriate (e.g. inheritance).

Several simplifications (with admittedly some oversimplification here) seem obvious to begin with:

Basic structure and syntax structure (actual syntax details are another issue) should allow for:

· Specification statements consist of:

¨ name value | expression – for data content

¨ name type specification - for schema and other declarations

¨ name reference | reference_expression - for templates

(These statement types can be combined in a single specification file where useful.)

· Data types can include properties and behavior, and methods can be implemented in reference libraries.

· name(size) and name(index) can be used for arrays of data.

· name*, name+, name(minSize, maxSize) can be used in type specifications.

· Names can be prefixed with external name scope identifiers (including resource identifiers, such as URIs), which are possibly aliased.

· Names can be imported (and possibly aliased) from other name scopes with “using” specifications.

· Expressions can include:

¨ Data value expressions for arithmetic, comparison and logic expressions.

¨ Reference expressions including path names. Path steps can follow any reference value.

¨ Type expressions can include structures and also EBNF, along with constraints and defaults.

Choice would be indicated with a bracketed list with “|” separators.
Grouping would be indicated with a bracketed list with “;” separators.

Ordering, if needed, could be specified with a decorated bracket.

· Parameters are values that can be provided in specifications or derived from data content, and allow for tailoring and adaptation of data and specifications.

This includes conditional substitution and evaluation, and use in expansion of templates.

· Modules can be used for reference, substitution, inheritance (extension, with polymorphism, restriction, and overrides), and transformations.

· Module hierarchies, including:

¨ Conceptual models (typically, for standards)

¨ Abstract models (typically, for application frameworks)

¨ Concrete models (for implementations, typically tailorable).
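As one small illustration of the list above, the occurrence notation for type specifications (name*, name+, name(minSize, maxSize)) can be sketched in Python. This is a sketch under assumptions, not a definition: it assumes the conventional EBNF reading (“*” is zero or more, “+” is one or more), assumes that a bare name means exactly one occurrence, and the names Occurrence and parse_occurrence are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """How many times a named item may occur in a type specification."""
    name: str
    min_size: int
    max_size: Optional[int]  # None means no upper bound

    def allows(self, count: int) -> bool:
        """Return True if `count` occurrences satisfy the bounds."""
        if count < self.min_size:
            return False
        return self.max_size is None or count <= self.max_size


# Accepts: name   name*   name+   name(min, max)
_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z_]\w*)"
    r"(?:(?P<star>\*)|(?P<plus>\+)|\((?P<min>\d+)\s*,\s*(?P<max>\d+)\))?$"
)


def parse_occurrence(text: str) -> Occurrence:
    """Parse one occurrence expression into its name and bounds."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an occurrence expression: {text!r}")
    name = match.group("name")
    if match.group("star"):
        return Occurrence(name, 0, None)   # assumed: zero or more
    if match.group("plus"):
        return Occurrence(name, 1, None)   # assumed: one or more
    if match.group("min") is not None:
        return Occurrence(name, int(match.group("min")), int(match.group("max")))
    return Occurrence(name, 1, 1)          # assumed: bare name means exactly one


line_items = parse_occurrence("item(2, 5)")
print(line_items.allows(3), line_items.allows(6))
```

The same bounds could equally be carried as general properties of a data type; the point is only that the three notations reduce to one (min, max) pair.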

From this, powerful application development capabilities can be created, with models for:

· presentation (à la HTML, OpenDoc, XSLT, etc.),

· data structures (RDF, XQuery, etc.),

· communications (XForms, Apache), and

· control (events, actions, SCXML, workflow, etc.).