OASIS Mailing List Archives

 



Re: Data Model(s) for XML 1.0 / XML Devcon / DOM / XSL / Query



Thanks for your response, Tim.

[Tim Bray]

>... and so on.  Sean makes some good points, particularly
>that a lot of hair on the DOM is due to the requirement
>that it support authoring applications.

"Hair on the DOM" I love it:-)

As Tim (and others on this list) know, I'm a practicing,
fully paid up simpleton. Joe English talked about
ESIS in the context of SGML and yes, *I* was that
programmer. SGML was infuriatingly nebulous
to me until I saw ESIS and then the lights went
on.

Unlike some greater intellects,
my mental apparatus does not feel compelled
to go beyond this into the world of abstract
infosets and APIs for same.
Given that so many of us SGML heads
lived on the stuff, I admit to being surprised
that ESIS (or the XML-ised version of it, which
I dubbed PYX: http://www.xml.com/pub/a/2000/03/15/feature/)
is not more popular in the XML world.
Guess I'm out of touch...
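
For anyone who has not seen the line-oriented style, here is a minimal
sketch of a PYX-like dump written against Python's standard xml.sax module.
The prefixes used ("(" start tag, "A" attribute, "-" character data,
")" end tag, "?" processing instruction) follow the notation as I remember
it from the article above; the escaping rules and the names PyxHandler and
to_pyx are assumptions made for illustration, not a reference
implementation.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PyxHandler(ContentHandler):
    """Collects one PYX-style line per parse event (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # pending character data, coalesced before output

    def _flush_text(self):
        # Emit buffered character data as a single "-" line.
        if self._text:
            data = "".join(self._text)
            # Assumed escaping: backslash first, then newline.
            data = data.replace("\\", "\\\\").replace("\n", "\\n")
            self.lines.append("-" + data)
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.lines.append("(" + name)
        for key in attrs.getNames():
            self.lines.append("A%s %s" % (key, attrs.getValue(key)))

    def endElement(self, name):
        self._flush_text()
        self.lines.append(")" + name)

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.lines.append("?%s %s" % (target, data))


def to_pyx(xml_text):
    """Return the list of PYX-style lines for a well-formed XML string."""
    handler = PyxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines


if __name__ == "__main__":
    print("\n".join(to_pyx('<doc id="1"><p>Hi</p></doc>')))
```

Everything the notation does not carry (comments, entity boundaries,
attribute quoting and so on) is simply absent from the output, which is
the point being made above.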

I cannot count the number of high profile
SGML shops I consulted with who, when
you peeled the layers off, based their
work on ESIS.

For read-only work, the data model issue was
simple: if the concept was represented in ESIS, you
worked with it; if it wasn't, you didn't.
ESIS was the fodder that fed a generation
of tree builders and event APIs. The DOM
and SAXes of their day.
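
To make the "tree builder" remark concrete, here is a rough sketch of how
little code it takes to turn a line-oriented event stream into a tree.
It assumes PYX-style lines where the first character gives the event type
("(" start tag, "A" attribute as "name value", "-" character data,
")" end tag). The function name pyx_to_tree and the dict layout are
hypothetical choices for illustration, and other line types are ignored.

```python
def pyx_to_tree(lines):
    """Build a nested dict tree from PYX-style event lines (a sketch)."""
    root = None
    stack = []  # open elements, innermost last
    for line in lines:
        if not line:
            continue  # skip blank lines
        kind, rest = line[0], line[1:]
        if kind == "(":
            node = {"name": rest, "attrs": {}, "children": []}
            if stack:
                stack[-1]["children"].append(node)
            else:
                root = node
            stack.append(node)
        elif kind == "A":
            # Attribute lines apply to the most recently opened element.
            name, _, value = rest.partition(" ")
            stack[-1]["attrs"][name] = value
        elif kind == "-":
            stack[-1]["children"].append(rest)
        elif kind == ")":
            stack.pop()
        # Any other line type (e.g. "?") is ignored in this sketch.
    return root


if __name__ == "__main__":
    events = ["(doc", "Aid 1", "(p", "-Hi", ")p", ")doc"]
    print(pyx_to_tree(events))
```

An event API is the same loop with callbacks in place of the dict
building.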

[...]
>   It took me years to realise how deep and important
>   the divide is between wanting an SDK and wanting to know
>   the underlying protocol. Too much of our biz can only see
>   one of these realities. I grew up with networked
>   minicomputers and (mostly) Unix, and maybe that's why, in
>   the final analysis, I always want to see the bits on the
>   wire, because in the final analysis, given any programmable
>   device, I can  work with them.
>   XML is of course the ultimate expression of that philosophy;
>   it can do a reasonably good job of offering a bits-on-the-wire
>   view of just about anything.

Yes, but the syntax-only nature of it costs you the ability
to know the details of the DAG the programmer on
the far end of the wire will see. With SGML, I sent you
documents and we agreed to parse them with
NSGMLS. Straight away, we had a common shared
understanding of the data model we had to work with.

One of the strengths of XML often cited over
HTML is the consistent nature of the result of
the parsing process. This turns out never to be
true because of the dichotomy between validating
and non-validating parsers. Now with schemas and
the PSVI, it is even less true.

>   During the heydey of client-server I was repeatedly baffled
>   and frustrated by the mind-set, in particular evidence chez
>   Apple and Microsoft, that the only expression of computing
>   reality was a big hairy complicated API with an associated
>   big hairy complicated (and often expensive) SDK.

In violent agreement with this. This is why ESIS pushed
my buttons so well for SGML: it was *obvious* what
an API to the stuff would give you access to and what
it would let you ignore by omitting it.
[...]

>   These days I write big complicated software in Java, which
>   does a good job of giving you a tractable object model
>   overlaying insanely complex infrastructure. But in a
>   distributed int[ra,er]-net scale app with heterogeneous boxes,
>   there's still no substitute for the bits on the wire.

Again, violent agreement. All I want beyond what we currently
have, is the ability to know for sure what the software on
the far end of the wire will see in its DAG. A minimum
contract DAG between me as producer and him/her
as consumer (and vice versa). I would like to see
this minimum DAG named and used as a basis
on which post-parse validators, query engines,
database APIs and formatting engines can
be based.
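
As a purely illustrative sketch of what such a minimum contract might
look like in code, the following reduces a document to element names,
attributes and text only, using Python's xml.etree.ElementTree. The Node
type and the function minimal_model are hypothetical names; which items
belong in the minimum (here comments and processing instructions are
dropped by ElementTree's default parser, and attribute order is treated
as insignificant) is an assumption of the sketch, not a proposal for the
actual contents.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Node:
    """One element in the hypothetical minimum model."""
    name: str
    attrs: tuple      # sorted (name, value) pairs, so order is not significant
    children: tuple   # Node instances and text strings, in document order


def _build(elem):
    kids = []
    if elem.text:
        kids.append(elem.text)
    for child in elem:
        kids.append(_build(child))
        if child.tail:
            kids.append(child.tail)  # text following the child element
    return Node(elem.tag, tuple(sorted(elem.attrib.items())), tuple(kids))


def minimal_model(xml_text):
    """Parse XML text and return its minimum-model tree."""
    return _build(ET.fromstring(xml_text))


if __name__ == "__main__":
    producer = '<a y="2" x="1"><b/></a>'
    consumer = "<a x='1' y='2'><b></b></a>"
    # Different bits on the wire, same agreed post-parse view.
    print(minimal_model(producer) == minimal_model(consumer))
```

Two parties who agree on something like minimal_model can compare what
they each see after parsing, regardless of quoting style, attribute order
or empty-element syntax.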

[...]
>And another reason that Sean is wrong is that it's taken,
>in aggregate, in excess of 5 years to shake out the DOM and
>the infoset, and if we'd held off on XML until we had a
>consensus data model, we'd still be waiting.

OK. But time moves on, and the problems are starting to bite,
as evidenced by the irksome differences between the
data models of key XML technologies.

I would suggest that the right 80/20 point to move
forward with the phenomenon that is XML is to
tie down a read (logical) infoset and a read/write
(physical) infoset, to provide a firmer foundation for the
burgeoning family of technologies based on XML
plumbing.


>At the end of the day Henry is right, you really need the
>infoset for the same reason that SGML needed groves and
>property sets, so that you can define higher-level protocols
>like XPath XSL and XQuery and so on in a nice clean way.
>So, spec writers need this apparatus.

I dunno. <duck>groves and property sets
are a tremendous intellectual achievement in a
superstring-theory-of-everything kinda way, but
they did not set the SGML world alight and
I doubt they will set the XML world alight.</duck>

<Dig_Trench_And_Hide_In_It>
I think of it this way. There is an abstract model for
the inter-relationships between all the aspects of
Tolkien's Lord of the Rings. It is way cool to
grok it, and understand at a deep level how
it is all inter-related.

But it is still a fantasy world.

Groves, Property Sets etc. are a great abstraction
for understanding the man-made complexity
of SGML and XML. But we should remember
that this complexity is man-made - it is not
intrinsic to the problem being solved. It is certainly not
a pre-requisite for building compelling
XML applications.
</Dig_Trench_And_Hide_In_It>

>   Does the actual
>*programmer* need to think about it?  Usually not; the
>typical programmer's world-view is either that given by
>some SDK (with access to SAX and/or the DOM) or in a really
>heterogeneous system, bits-on-the-wire.

As a simpleton from the SGML days, my world view
is based on a post-parse data model. In de facto
SGML read-only processing, there was
one, as represented in ESIS; in XML there are
multiple (with more coming).

When teaching XML I find myself feeling like a
party-pooper when I point out all the gotchas
behind the statement "they send me XML, I want
to do X to it, and perhaps send XML on to the
next guy."

What keeps me going through all the standards
and all the data model mismatches and one-size-fits-all
XML APIs is the hope that it will all work itself out
in the real world.

Perhaps syntax-only XML is the right way
to do it. Goodness knows many key people
in the community believe it. Perhaps this
allows the market in data models to find its
own level by a process of natural selection. Perhaps.

On the other hand, I fear the
per-layer-infoset-is-king-cos-XML-is-just-syntax
approach will make the whole base family of XML standards
intractable without the property sets/groves abstractions.

The simpleton in me refuses to believe that
the world should be or needs to be this
complicated.

regards,
Sean