OASIS Mailing List Archives

 



Re: Data design methods? (was Re: APIs, messaging)



Thomas B. Passin wrote:

> Several.  Remember that in the (relational, at least) world, we have logical
> and physical (and maybe conceptual too) data models.
>...
> What plays an analogous role in xml data model approaches?  In xml data
> modeling, people tend to dive right in with instances and physical schemas.
> Maybe this isn't the best approach.

We've had the exact same distinction between logical and physical design in
the XML world, and certainly in the SGML world: element type content models
and, to some extent, attributes describe logical structure, and entities (in
the XML sense of the word: a named unit of storage defined in a DTD by an
entity declaration) describe physical structure. For a large, complex
document, you don't keep it in one big file, but break it down into pieces
based on which pieces have to be shared by other documents, which pieces need
to be updated and how often, and of course, on operating system and
processing program efficiency considerations.
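
To make the split concrete, here is a minimal sketch in Python. The document,
the element names, and the "warranty" entity are all made up for
illustration. I've used an internal entity so the example is self-contained;
in real physical design the entity would usually be an external one (a SYSTEM
identifier pointing at a separate file), which the standard library's
ElementTree parser does not fetch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. The ELEMENT declarations describe logical structure;
# the ENTITY declaration names a unit of storage (physical structure) that is
# written once and reused in two places.
DOC = """<!DOCTYPE manual [
  <!ELEMENT manual (chapter+)>
  <!ELEMENT chapter (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
  <!ENTITY warranty "This product is provided as is.">
]>
<manual>
  <chapter><title>Setup</title><para>&warranty;</para></chapter>
  <chapter><title>Repair</title><para>&warranty;</para></chapter>
</manual>"""

# The parser expands each entity reference, so the application only ever
# sees the logical structure, never the storage units.
root = ET.fromstring(DOC)
paras = [p.text for p in root.iter("para")]
print(paras)
```

Both para elements come back with the same text, even though it is stored in
only one place.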

It seems like people don't do this as much anymore. I see two reasons for
this: first, when more information was distributed via CD-ROM, we were
dealing with bigger files, so the best way to break them down was a bigger
issue. Now that we're dealing with files that get sent over the Internet and
are typically much smaller than a megabyte, it's not as necessary.

The second reason is that a lot of people just didn't like entity
declarations. One of the complaints about DTDs that inspired some of the
schema proposals was that DTDs defined logical and physical structure in the
same place, which is not the cleanest way to describe a complex system.
Unfortunately, the most common solution to this problem was to just blow off
physical design issues when giving developers a way to design their document
type; fortunately, XInclude is now in Last Call status, so these physical
issues can still be addressed from a schema-based system.
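
As a rough sketch of what that looks like, the following Python uses the
standard library's xml.etree.ElementInclude module to assemble a master
document from separately stored pieces. The file names and contents are
hypothetical, and the loader reads from an in-memory dictionary rather than
from disk so that the example runs on its own. Note that ElementInclude
covers only a subset of XInclude.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical storage units, standing in for separate files on disk.
PIECES = {
    "chapter1.xml": "<chapter><title>Intro</title></chapter>",
    "chapter2.xml": "<chapter><title>Details</title></chapter>",
}

def loader(href, parse, encoding=None):
    """Resolve an xi:include href against the in-memory pieces."""
    if parse == "xml":
        return ET.fromstring(PIECES[href])
    return PIECES[href]

# The master document holds the logical skeleton and points at the pieces.
master = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter1.xml"/>'
    '<xi:include href="chapter2.xml"/>'
    "</book>"
)

# Replace each xi:include element in place with the piece it refers to.
ElementInclude.include(master, loader=loader)
titles = [t.text for t in master.iter("title")]
print(titles)
```

The schema for book never has to mention where the chapters are stored; that
decision lives entirely in the instance and the loader.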

> For relational databases, we have various degrees of normalization and we
> know that the logical data model should be in at least 3rd normal form.
> That's another metric.  Normal forms are about making different data items
> and structures orthogonal to each other and about reducing redundancy.  It
> would be interesting and valuable to look at xml data structures to find out
> how to achieve comparable goals.

This all works great for information that fits well into tables, but for XML
it usually only works for data that started off in tables to begin with
(e.g. FpML data). For information that doesn't, it's a problem--if you did
it with DocBook, you'd have a separate table for all your emphasized words
and phrases.
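
Here is a small Python sketch of what that shredding looks like for a single
DocBook-style paragraph. The row layout (paragraph id, position, text) is
just an assumption for illustration, not any standard mapping.

```python
import xml.etree.ElementTree as ET

# One paragraph of mixed content, DocBook style.
para = ET.fromstring(
    "<para>Save <emphasis>early</emphasis> and "
    "<emphasis>often</emphasis>.</para>"
)

PARA_ID = 1          # hypothetical primary key for this paragraph
text_rows = []       # rows for a "plain text run" table
emphasis_rows = []   # rows for a separate "emphasis" table
position = 0         # needed to put the pieces back in order

# Text before the first child element.
if para.text:
    text_rows.append((PARA_ID, position, para.text))
    position += 1

for child in para:
    # Each emphasized phrase becomes its own row in its own table.
    emphasis_rows.append((PARA_ID, position, child.text))
    position += 1
    # Text following the child, up to the next element.
    if child.tail:
        text_rows.append((PARA_ID, position, child.tail))
        position += 1

print(text_rows)
print(emphasis_rows)
```

One short sentence turns into five rows across two tables, plus a position
column whose only job is to reconstruct the original word order.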

Because of the lack of a series of straightforward normalization steps, DTD
and schema design has been considered a black art much like OO design. In
fact, the tools of OO design have helped out here; Addison-Wesley's series
of UML books has a new one titled "Modeling XML Applications with UML:
Practical e-Business Applications."

> All these things (and more, including coherence) go into making a good ER
> data model.  They are all independent of the processing algorithms.

Mr. ER himself, Dr. Peter Chen, is actually on the XML Schema Working Group
and has been giving a talk at the last few XML DevCons about ER's affinity
for XML. The main thrust of his talk addressed the value of the "R" in "ER
modeling" and how ER modeling provided a good formal basis for designing
information structures using RDF or XLink, where modeling of relationships
is so important.

Bob DuCharme            www.snee.com/bob             <bob@
snee.com>      see http://www.snee.com/bob/xsltquickly for
info on upcoming "XSLT Quickly" from Manning Publications.