Hi Folks,
First, what is normalization? Answer: it is removing redundancy.
For example, in the following XML document the element <Author>Dick Grune</Author> occurs twice:
<BookCatalogue>
    <Book>
        <Title>Parsing Techniques</Title>
        <Author>Dick Grune</Author>
    </Book>
    <Book>
        <Title>Modern Compiler Design</Title>
        <Author>Dick Grune</Author>
    </Book>
</BookCatalogue>
Normalizing the XML document means eliminating the redundancy, perhaps like so:
<BookCatalogue>
    <BooksByAuthor author="Dick Grune">
        <Title>Parsing Techniques</Title>
        <Title>Modern Compiler Design</Title>
    </BooksByAuthor>
</BookCatalogue>
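To make the transformation concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It groups the <Book> elements by author so that each author name is stored only once. The function name normalize_catalogue is hypothetical, and the sketch assumes every <Book> has exactly one <Author> and one <Title> child.

```python
import xml.etree.ElementTree as ET

# The redundant (denormalized) catalogue from the first example.
DENORMALIZED = """<BookCatalogue>
    <Book>
        <Title>Parsing Techniques</Title>
        <Author>Dick Grune</Author>
    </Book>
    <Book>
        <Title>Modern Compiler Design</Title>
        <Author>Dick Grune</Author>
    </Book>
</BookCatalogue>"""


def normalize_catalogue(xml_text: str) -> ET.Element:
    """Group <Book> elements by <Author> so each author name appears once.

    Hypothetical helper; assumes every <Book> has exactly one <Author>
    and one <Title> child.
    """
    source = ET.fromstring(xml_text)
    result = ET.Element("BookCatalogue")
    groups: dict[str, ET.Element] = {}  # author name -> <BooksByAuthor> element
    for book in source.findall("Book"):
        author = book.findtext("Author")
        title = book.findtext("Title")
        group = groups.get(author)
        if group is None:
            # First book by this author: create the author's group element.
            group = ET.SubElement(result, "BooksByAuthor", author=author)
            groups[author] = group
        # The title goes under the group; the author name is not repeated.
        ET.SubElement(group, "Title").text = title
    return result


normalized = normalize_catalogue(DENORMALIZED)
print(ET.tostring(normalized, encoding="unicode"))
```

The printed result is the second document above, minus the indentation.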
In Internet chatter we often hear people proclaim:
After all, a key part of designing database tables is to put them in “Normal Form”.
With all that normalization hype, people come to assume that everything needs to be normalized. That assumption is not correct.
The benefit of normalizing applies only when data is added, deleted, or modified: if each fact is stored once, an update touches exactly one place and the copies cannot drift out of sync. When the data is read-only (as is the case with a data exchange format), there is no benefit to normalizing. In fact, normalizing is the worst thing that you can do, because every consumer must then reassemble the data before using it. Consider this: read-only databases are typically not normalized; they contain lots of redundant data so as to optimize read operations.
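A small Python sketch (standard library xml.etree.ElementTree only) of where the update benefit comes from: renaming an author touches every <Author> copy in the redundant form, but only one attribute in the grouped form. The helper names and the replacement name "D. Grune" are hypothetical, chosen purely for illustration. For read-only exchange data no such renames ever happen, so this advantage never materializes.

```python
import xml.etree.ElementTree as ET

DENORMALIZED = """<BookCatalogue>
    <Book><Title>Parsing Techniques</Title><Author>Dick Grune</Author></Book>
    <Book><Title>Modern Compiler Design</Title><Author>Dick Grune</Author></Book>
</BookCatalogue>"""

NORMALIZED = """<BookCatalogue>
    <BooksByAuthor author="Dick Grune">
        <Title>Parsing Techniques</Title>
        <Title>Modern Compiler Design</Title>
    </BooksByAuthor>
</BookCatalogue>"""


def rename_author_denormalized(root: ET.Element, old: str, new: str) -> int:
    """Rename an author in the redundant form; return the number of edits made."""
    edits = 0
    for author in root.iter("Author"):
        if author.text == old:
            author.text = new  # every redundant copy must be changed
            edits += 1
    return edits


def rename_author_normalized(root: ET.Element, old: str, new: str) -> int:
    """Rename an author in the grouped form; return the number of edits made."""
    edits = 0
    for group in root.iter("BooksByAuthor"):
        if group.get("author") == old:
            group.set("author", new)  # the name is stored in one place
            edits += 1
    return edits


flat = ET.fromstring(DENORMALIZED)
grouped = ET.fromstring(NORMALIZED)
print(rename_author_denormalized(flat, "Dick Grune", "D. Grune"))   # 2
print(rename_author_normalized(grouped, "Dick Grune", "D. Grune"))  # 1
```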
Should a data exchange format be normalized? Answer: No!
Comments?
/Roger