Re: [xml-dev] How to design XML to have broad utility and yet also enable

Knuth was wrong about avoiding premature optimization, because there is now a generation of coders who take him to mean that you should initially ignore performance, implementation heuristics and real-time aspects, and concentrate only on correctness. The result is systems that fall over in certification or integration testing.

In fact, team leaders need to be constantly vigilant that their developers don't suddenly stick in code so inefficient that it compromises the project. In XML systems, the case I see most is where the developer cannot figure out how the stream API works, so they save the document out to the file system (potentially getting the encoding wrong in the process) because that is easier. This has nothing to do with correctness: they are choosing an inefficient and error-prone method over one that may be less obvious but is safer and non-wrecking. The coder thinks they are avoiding premature optimization, but they are just being lazy.
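
To illustrate the stream-API point, here is a minimal Python sketch (not from the original post). The function name `read_values` and the sample payload are made up for illustration. It hands the bytes already in memory to the standard library's `xml.etree.ElementTree.iterparse`, so there is no temporary file, and the parser reads the encoding from the XML declaration rather than relying on whatever the code would have used when writing the file.

```python
import io
import xml.etree.ElementTree as ET


def read_values(xml_bytes, tag):
    """Return the text of every `tag` element, parsing straight from memory."""
    values = []
    # iterparse accepts any binary file-like object, so no temporary file is
    # needed, and the parser takes the encoding from the XML declaration.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            values.append(elem.text)
    return values


# Hypothetical payload, deliberately not UTF-8.
payload = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><r><v>caf\u00e9</v><v>x</v></r>'
).encode("latin-1")
print(read_values(payload, "v"))
```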

Indeed, many projects run late: the time where optimisation should have been done never occurs.

Of course, I don't actually disagree with Knuth (who is great, though not infallible: markup is not a fad), but most systems have real-time constraints, and you should start a design and implementation with a feasible solution, not an infeasible one. When you need a sort, for example, you don't start with a linear sort just because it is the simplest, on the grounds that anything else would be premature and you will do testing later. No: if you judge the sort to be significant, you start off with the most feasible sort that is low-hanging fruit in the libraries available.
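
As a small illustration of the "low-hanging fruit in the libraries" remark, a sketch in Python (the record data is invented): the built-in `sorted` is already an O(n log n), stable sort, so reaching for it costs less effort than writing a naive one by hand.

```python
from operator import itemgetter

# Hypothetical records: (file name, size in bytes).
records = [("c.xml", 300), ("a.xml", 100), ("b.xml", 200)]

# The built-in sort is O(n log n) and stable; no hand-written loop is needed.
by_size = sorted(records, key=itemgetter(1))
print(by_size)
```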

Some performance characteristics are so well known to experienced developers that they don't treat them as optimisations, IMHO. For example, mentally tracking the polynomial order of some algorithm, or knowing that if you have a 128 MB XML file of config values, you probably need to trim and cache it, not reload it. This is not optimisation, because the real system won't work otherwise.
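
A rough sketch of "trim and cache, not reload", in Python. Everything here is an assumption for illustration: the `<entry key="...">` layout, the `WANTED` key set, and the small in-memory `CONFIG_XML` standing in for a large file. The document is parsed once, only the needed values are kept, and later calls get the cached dictionary.

```python
import functools
import io
import xml.etree.ElementTree as ET

# Hypothetical config document; a real one would be a large file on disk.
CONFIG_XML = (
    b"<config>"
    b"<entry key='timeout'>30</entry>"
    b"<entry key='retries'>5</entry>"
    b"<entry key='unused'>x</entry>"
    b"</config>"
)
WANTED = frozenset({"timeout", "retries"})  # assumed: the keys this app needs


@functools.lru_cache(maxsize=1)
def load_config():
    """Parse once, keep only the wanted keys, and cache the small result."""
    values = {}
    for _event, elem in ET.iterparse(io.BytesIO(CONFIG_XML), events=("end",)):
        if elem.tag == "entry" and elem.get("key") in WANTED:
            values[elem.get("key")] = elem.text
        elem.clear()  # release each element once it has been inspected
    return values


print(load_config())
print(load_config.cache_info())
```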

An optimisation is something that helps you exceed your quality constraints; a design is what helps you approach those constraints in the first place.

What about optimising data models then? Are cardinality and order constraints a premature optimization sometimes?

On 21/11/2013 2:33 AM, "Peter Hunsberger" <peter.hunsberger@gmail.com> wrote:

Hi Rick,

I think if you have a data model in mind then you have at least one application that you are expecting will use that data model. So, yeah, I'd agree, but with the caveat that data models may have broader coverage than single applications, and the trade-off between generalization for use with many systems and optimization for use with a few systems isn't always as straightforward as one might hope. Again, most of my focus is at the "enterprise" level where reuse is emphasized, but my continuing guidance when one attempts to justify an application-specific interchange format is always "avoid premature optimization". In particular, the human factors of having to maintain many simple specs versus a few more complex specs may be a one-time cost or an ongoing tax on the organization. The one-time implementation for the generalized, but complex, spec might win or lose when it turns out it can't be reused after all... Ideally, we aim for broad scope and low complexity (at least relative to the problem domain), but generalized data specifications with low complexity (and the systems that handle them well) are rare beasts indeed. When they do show up they have quick uptake. That's sort of the war between XML and JSON, though at the syntax level as opposed to the interchange of specific sets of information...

Peter Hunsberger

On Tue, Nov 19, 2013 at 8:00 PM, Rick Jelliffe <rjelliffe@allette.com.au> wrote:

I'd go further than that: I'd say always optimize for at least one application (or source or platform etc.) where it is known. At least, you first look at this to see if it is appropriate.
The documents have to be good at something rather than good for nothing. If it can cope with one specific use, it may cope with others (as far as completeness etc).
The intent is that you only need glue code on one side of the exchange rather than on both. And your documents can then share the documentation of one side, rather than multiplying analyses.
People already do this a bit: no-one invents new table languages, for example.
I think this is sometimes a major flaw in the utility or theory of namespaces: that you want standard vocabularies made independently of input or output (i.e. application-specific information).

Cheers
Rick

On 20/11/2013 2:08 AM, "Peter Hunsberger" <peter.hunsberger@gmail.com> wrote:

David,

That's a good point. If the data interchange is between two points where you have knowledge of the requirements of both then, yeah, optimize for that interchange. And again, that may be multiple programs or applications....

I got distracted by Roger's use of the word "models", and being stuck on data modelling a lot these days, that's the only thing I focused on. Time for more coffee...

Peter Hunsberger

On Tue, Nov 19, 2013 at 8:52 AM, David Lee <dlee@calldei.com> wrote:

I argue that regardless of whether the XML is for "data" or "documents" (whose intersection is not the empty set, IMHO),

that in fact the XML model can be quite coupled to the application and may need transformations even if the data is generic or abstract.


A simple example.

Very frequently large datasets are stored in XML documents as horrendously large single documents with vast replication of one or more child elements like

<root>

   <data> ...

   <data> ...

   ... a million times over

</root>


This is very convenient for some applications but horrible for others.

For example, this may produce a file that is simply too big to read into some applications, or one that is non-ideal for an XML database.

But it may be ideal for streaming processing, file transfer and packaging.
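
A minimal Python sketch of the streaming case (not from the original post), assuming the flat `<root>`/`<data>` layout above with `<data>` only as direct children; `stream_data` is a made-up name. It uses the standard library's `iterparse` and clears the root after each record so the full tree is never held in memory.

```python
import io
import xml.etree.ElementTree as ET


def stream_data(source):
    """Yield the text of each <data> child without building the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of <root>
    for event, elem in context:
        if event == "end" and elem.tag == "data":
            text = elem.text
            root.clear()  # drop processed children so memory use stays flat
            yield text


# Tiny stand-in for the million-record file.
sample = io.BytesIO(b"<root><data>a</data><data>b</data><data>c</data></root>")
for value in stream_data(sample):
    print(value)
```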

Transforming this file into different formats (say for example splitting it into a million smaller docs) may be better for some applications.
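
One way the splitting could look, as a hedged Python sketch: `split_document` and `batch_size` are hypothetical names, and the code assumes `<data>` elements appear only as direct children of the outer element. Each yielded value is a small standalone document containing up to `batch_size` records.

```python
import io
import xml.etree.ElementTree as ET


def split_document(source, batch_size):
    """Yield serialized documents of up to batch_size <data> elements each."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # start event of the outer element
    batch = ET.Element(root.tag)
    for event, elem in context:
        if event == "end" and elem.tag == "data":
            batch.append(elem)
            root.clear()  # detach processed children from the big tree
            if len(batch) == batch_size:
                yield ET.tostring(batch)
                batch = ET.Element(root.tag)
    if len(batch):
        yield ET.tostring(batch)  # final, possibly smaller, batch


big = io.BytesIO(b"<root><data>1</data><data>2</data><data>3</data></root>")
for part in split_document(big, batch_size=2):
    print(part)
```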

Similarly, simple transformations may help with some applications, such as combining fields, or moving attributes to elements or vice versa ... due to peculiarities

of the application. You may even be able to translate "Proprietary Schema A" into "Universal Schema B" so that Application B can read it.
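
As an illustration of the attribute-to-element kind of transformation, a small Python sketch; the `<record>` sample and the function name `attributes_to_elements` are invented for the example, and real documents may need namespace and name-clash handling that this ignores.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Recursively turn each attribute into a child element of the same name."""
    for child in list(elem):
        attributes_to_elements(child)
    for name, value in list(elem.attrib.items()):
        ET.SubElement(elem, name).text = value
    elem.attrib.clear()


# Hypothetical input document.
doc = ET.fromstring('<record id="7" status="new"><name>bolt</name></record>')
attributes_to_elements(doc)
print(ET.tostring(doc, encoding="unicode"))
```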


So I conclude that it is not the case that universally "data XML" should be tuned for a single application and not translated.

Not all applications (in both senses of the word) of Data are alike even if the underlying data is alike.


----------------------------------------

David A. Lee

dlee@calldei.com

http://www.xmlsh.org


From: Peter Hunsberger [mailto:peter.hunsberger@gmail.com]
Sent: Tuesday, November 19, 2013 9:38 AM
To: Costello, Roger L.
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] How to design XML to have broad utility and yet also enable efficient application processing?


If XML is being used for document interchange then your XML design is predicated on the document formatting and content you wish to capture in your document. However, you seem to be aiming this at the data interchange world? If so, the question you seem to be circling around is: should XML data interchange formats directly reflect the data models they are transporting? Given that this is data and not, therefore, an end product intended for humans, I'd vote that the XML design should come after the data models are optimized for their various business purposes. The XML will then hopefully be as efficient a serialization of those models as possible. Note that, in my opinion, good data models are also not optimized for, or specific to, any one program. As I've noted before, _good_ data models span enterprises, let alone individual programs...
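
A sketch of the "model first, XML as its serialization" idea in Python. The `Shipment` model and the `to_xml` helper are hypothetical, not anything proposed in the thread: the data model is defined on its own terms, and the XML is derived mechanically from it.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET


@dataclass
class Shipment:  # hypothetical data model, designed independently of XML
    order_id: str
    weight_kg: float
    destination: str


def to_xml(obj):
    """Serialize a dataclass instance as one element per field."""
    root = ET.Element(type(obj).__name__)
    for field in fields(obj):
        ET.SubElement(root, field.name).text = str(getattr(obj, field.name))
    return root


print(ET.tostring(to_xml(Shipment("A1", 2.5, "Perth")), encoding="unicode"))
```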


As to your last question, I certainly don't think applications should transform XML into forms that make it inefficient to process (duh)!


Peter Hunsberger


On Tue, Nov 19, 2013 at 4:41 AM, Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,

Liam Quin wrote:

        XML frees your information from being
        optimized for, and specific to, any one
        program.

An obvious question arises:

        What is the right way to design XML?

I believe the answer is:

        Design XML so that it reflects (models)
        the real world.

But real-world designs (models) may not be well-suited to efficient application processing. A second question arises:

        How do we design XML to have broad
        utility while still enabling efficient
        application processing?

I believe the answer is:

        Each application should transform the
        XML into a form that enables the
        application to process it efficiently.

Do you agree with this?

Is there anything else you would add?

/Roger

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
