That's a good point. Even very simple
business "documents" (invoices, catalog entries, etc.) can easily run to 20-25
tables when fully normalized. I can't even begin to imagine what the
DocBook schema would normalize into! (And remember, we're talking about a
critique from a relational purist who doesn't like the various "post-relational"
techniques to minimize normalization complexity any more than he likes
XML!)
Furthermore, real "documents" tend to throw other curves at the relational
model, such as mixed content and recursive content
models. The "Professional XML Databases" book goes into some detail on how
to represent such things in relational tables, but after stepping back to
examine the complexity of the process and the difficulty of the results, pretty
much concludes "don't DO that", i.e. try not to use mixed content or recursion
in schemas for data you want to store in an RDBMS. Recursive content models
(think of the classic "bill of materials" example) can be relatively easy to
normalize, but extremely difficult to query with SQL. (Pascal has a whole
chapter on this subject, and basically argues that a hypothetical pure
relational query language could handle this better than SQL can ... gee, thanks
for the helpful advice!) If you don't have the luxury of simplifying the schema
to eliminate mixed content and recursion, I guess you've got problems (and
<self-serving-plug> you probably want to look at a native XML DBMS
</self-serving-plug>).
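
To make the bill-of-materials point concrete, here is a rough sketch (mine, not
from either book). The `component` table, the bicycle data, and the `explode`
helper are all made up for illustration. The normalization really is easy: one
row per parent/child edge. The query is the awkward part: a plain SELECT only
reaches one level down, so the recursion ends up in application code, with one
round trip per part. The sketch assumes the parts graph has no cycles.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a tiny, hypothetical BOM."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE component (
            parent TEXT NOT NULL,    -- the assembly
            child  TEXT NOT NULL,    -- a part used directly in that assembly
            qty    INTEGER NOT NULL  -- how many of the child per parent
        );
        INSERT INTO component VALUES
            ('bicycle', 'wheel', 2),
            ('bicycle', 'frame', 1),
            ('wheel',   'spoke', 32),
            ('wheel',   'rim',   1);
    """)
    return conn


def explode(conn, root):
    """Return the total quantity of every part needed to build one `root`.

    Each pass issues a single-level SELECT; the walk down the hierarchy
    happens here in Python, not in SQL. Assumes the data is acyclic.
    """
    totals = {}
    frontier = [(root, 1)]  # (assembly, how many of it we need)
    while frontier:
        parent, multiplier = frontier.pop()
        rows = conn.execute(
            "SELECT child, qty FROM component WHERE parent = ?", (parent,)
        ).fetchall()
        for child, qty in rows:
            needed = qty * multiplier
            totals[child] = totals.get(child, 0) + needed
            frontier.append((child, needed))
    return totals


conn = build_demo_db()
bom = explode(conn, "bicycle")
```

SQL:1999 did add recursive queries (WITH RECURSIVE), so where your DBMS
supports them the loop can move into the query itself, but the single-level
SELECT above is all you can count on otherwise.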
I like the Turing Machine analogy ... sure, you can formally model any program as
a TM, and you can model any document in third normal form ... but how often does
either fact give you practical guidance for building a real system?