xml-dev - Re: Internal subset equivalent in new schema proposals?

From: Paul Prescod <paul@prescod.net>
To: xml-dev@ic.ac.uk
Date: Fri, 27 Nov 1998 09:30:33 -0600

Michael Kay wrote:
> 
> While Paul Prescod asserts:
> >SGML and XML are explicitly about organizing information for machine
> >processing. So according to your definition, SGML is about data, not
> >documents.
> 
> Well, I know that SGML folks have always seen SGML as the solution to
> everything, 

Your presence here is evidence that we were correct.

> but I don't think they've erased the evidence of where it came
> from, which was device-independent typographical markup.

SGML was always about non-typographical markup. It was always about
treating documents *as data*.

> If you were to design something for processing data (not just rendering it),

Rendering data *is* processing it. There is no distinction there.

> you'd have support for the kind of data models recognised in the database
> world; you'd have integrity constraints (including data types and primary
> keys); 

But SGML is not a data modelling language -- it is a language description
language. The world has many data modelling languages -- some of them very
good. It has only one language description language at XML's level. It is
highly debatable whether it makes sense to have a single schema for both
the language (the serialization) and the data model.

In fact, I'll go so far as to say that it does NOT make sense. SGML goes
*too far* in this direction: the ID/IDREF mechanism should be treated at a
separate level, like other integrity constraints.
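
As a rough illustration of treating reference integrity as its own layer,
here is a minimal Python sketch that parses a document with no DTD at all
and then checks references in a separate pass. The sample document, the
attribute names "id" and "ref", and the function name dangling_refs are all
assumptions made up for this example; none of them comes from SGML, XML, or
any schema proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
DOC = """<memo>
  <person id="p1">Paul</person>
  <person id="p2">Michael</person>
  <cc ref="p1"/>
  <cc ref="p3"/>
</memo>"""

def dangling_refs(root, id_attr="id", ref_attr="ref"):
    """Return reference values that point at no declared identifier.

    The check runs after parsing, so the serialization layer never
    needs to know which attributes act as keys or references.
    """
    # First pass: collect every declared identifier.
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    # Second pass: keep references whose target was never declared.
    return [
        el.get(ref_attr)
        for el in root.iter()
        if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
    ]

root = ET.fromstring(DOC)
print(dangling_refs(root))  # prints ['p3']
```

The parser only has to deliver well-formed structure; which attributes count
as keys is decided by the constraint layer, which could just as well be
swapped for a different set of integrity rules.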

> you'd have declarative query languages and report writers rather than
> navigational APIs and style sheets; 

Neither SGML nor XML says anything about navigational APIs or stylesheets.

> you'd have relationships rather than hyperlinks; 

"A hyperlink is a typed relationship among two or more objects, each of
which fulfills a unique role in the relationship."

"The term "hyperlink" is used in preference to the unqualified term "link"
to avoid confusion with the SGML processing link feature. However, the
term "link" can be used with more restrictive qualifying adjectives, as in
"hypertext link", or with no qualifiers when the context is clear."

http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-8.1.html

> and (to be trivial about it) you'd have examples in the domain
> of customers and orders, not books and poems.

Where? We have examples from both domains in the book I co-wrote. I focus
on document structures, however, because they form a superset of
relational data structures. I could show a database of "dates" but I can
get double mileage by putting the date in a letter or memo. Also, the
first rule of example writing is to try to do something concrete.
Databases are abstract to most people. Documents are concrete.

> It so happens that XML is better for doing application-to-application data
> interchange than things (like ASN.1) that were supposedly designed for the
> job, so I don't begrudge it. 

It doesn't "so happen." Charles Goldfarb set out to solve a bigger, harder
problem and (not surprisingly) came up with a more general, flexible
solution.

> But XML (including its friends and relations)
> is full of things that wouldn't be there if that were its original primary
> purpose, and I really don't see how one can claim otherwise.

There is no doubt that this is the case. I've tried to be explicit that
SGML and XML solve a superset of the relational/object database
interchange problem. If it had been designed for database serialization,
it wouldn't have a concept of "mixed content" (for example). My only point
in this discussion is that the specific flaws you have described cannot
be traced to document-orientedness. They are just flaws.
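
To make "mixed content" concrete, here is a small Python sketch of a date
embedded in running prose, the kind of thing a pure database serialization
would have no place for. The sample sentence, the element names "date" and
"topic", and the helper name flatten are assumptions invented for this
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo fragment: character data and elements interleave.
LETTER = (
    "<p>Dated <date>1998-11-27</date>, this memo concerns "
    "<topic>schemas</topic>.</p>"
)

def flatten(el):
    """List the direct content of el in document order.

    Character data is reported as ("text", ...) and each child element
    as (tag, its text content), showing how the two interleave.
    """
    parts = []
    if el.text:
        parts.append(("text", el.text))
    for child in el:
        parts.append((child.tag, "".join(child.itertext())))
        # ElementTree stores the text that follows a child in child.tail.
        if child.tail:
            parts.append(("text", child.tail))
    return parts

for kind, value in flatten(ET.fromstring(LETTER)):
    print(kind, repr(value))
```

The "date" element here is the same datum a database row would hold; the
document simply carries it together with the prose around it.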

Nobody has yet shown me a sort of data that does not appear in documents
and does not need complex processing in that context.

This is an important issue because it informs how we should go forward. In
my experience, any solution which solves the document world's
serialization problems will trivially solve the database world's
serialization problems. (Once again) That's why we are having this
discussion today. But some of the schema proposals (like DCD) involve
solutions that DO NOT SCALE to complex problems. They will
solve the database world's problems for a time, but they do not solve the
whole problem.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself.
 http://itrc.uwaterloo.ca/~papresco

Christmas shopping in a T-Shirt? Toto, I have a feeling we 
aren't in Canada anymore.
