RE: Another "Against the Grain" column on XML

That's a good point. Even very simple business "documents" (invoices, catalog entries, etc.) can easily run to 20-25 tables when fully normalized. I can't even begin to imagine what the DocBook schema would normalize into! (And remember, we're talking about a critique from a relational purist who doesn't like the various "post-relational" techniques to minimize normalization complexity any more than he likes XML!)

Furthermore, real "documents" tend to throw the relational model other curves, such as mixed content and recursive content models. The "Professional XML Databases" book goes into some detail on how to represent such things in relational tables, but after stepping back to examine the complexity of the process and how awkward the results are to work with, it pretty much concludes "don't DO that", i.e. try not to use mixed content or recursion in schemas for data you want to store in an RDBMS. Recursive content models (think of the classic "bill of materials" example) can be relatively easy to normalize, but extremely difficult to query with SQL. (Pascal has a whole chapter on this subject, and basically argues that a hypothetical pure relational query language could handle this better than SQL can ... gee, thanks for the helpful advice!) If you don't have the luxury of simplifying the schema to eliminate mixed content and recursion, I guess you've got problems (and <self-serving-plug> you probably want to look at a native XML DBMS </self-serving-plug>).
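
To make the bill-of-materials point concrete, here is a rough sketch (mine, not from the book or from Pascal's chapter). The table "bom", its columns, and the sample parts are all made up for illustration. Normalizing the recursion takes one table; the trouble shows up in the query, because a single plain SELECT needs one self-join per level of nesting and you don't know the depth in advance. This sketch falls back on a client-side loop, and it assumes the data has no cycles.

```python
import sqlite3


def explode(conn, root):
    """Return {part: total quantity needed} for everything under `root`.

    Hypothetical helper: issues one plain SELECT per assembly visited,
    because the nesting depth is not known up front. Assumes no cycles.
    """
    totals = {}
    frontier = [(root, 1)]  # (assembly, quantity multiplier so far)
    while frontier:
        parent, factor = frontier.pop()
        rows = conn.execute(
            "SELECT child, qty FROM bom WHERE parent = ?", (parent,)
        ).fetchall()
        for child, qty in rows:
            totals[child] = totals.get(child, 0) + factor * qty
            frontier.append((child, factor * qty))
    return totals


# The "easy to normalize" half: one adjacency-list table holds the recursion.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO bom VALUES (?, ?, ?)",
    [
        ("bike", "wheel", 2),
        ("bike", "frame", 1),
        ("wheel", "spoke", 32),
        ("wheel", "rim", 1),
    ],
)

print(explode(conn, "bike"))
```

The storage side really is trivial. All the work lives in the loop outside the SQL, which is more or less the complaint.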

I like the Turing Machine analogy ... sure, you can formally model any program as a TM, and you can model any document in third normal form ... but how often does either fact give you practical guidance for building a real system?