SPS Home    >   Dgreath    >   RDBMS    >   Database Denormalization

DATABASE DENORMALIZATION

The process of denormalizing relational database (RDB) tables entails partial recomposition of the normalized table to improve system performance. The principal concern with denormalization  will always be the maintenance of the redundant data. It is critical that this data remain valid to preserve data integrity. It is also possible that update performance will be degraded.  For these reasons denormalizing should not be done until the database has been physically implemented and tested under full load conditions and it has been determined that performance is unacceptable.
 

Denormalizing Methods

Adding Redundant Columns -- Simply copy one or more columns from a child table to the parent table to eliminate a join. Create a trigger procedure to update the duplicate columns anytime the child columns change value.
Adding Derived Columns -- Create a column based on data computed from other columns, either in this table or from other.  Most relational database applications will automatically compute the values whenever the column is queried, so updating is fairly automatic. Carefully constructed trigger procedures can improve performance.
Vertical Partitioning -- Split a table into two tables placing mandatory and frequently accessed columns in one table and optional and infrequently used columns in a second table. Performance is improved because the secondary table will only be accessed when needed, so the processor has less data to manipulate.
Horizontal Partitioning -- Split a table into two tables placing recent rows in a current table and older rows in an archive table. Transaction processing on the current table should improve and a view can be created to join the tables for full database queries.
Prejoined Tables -- Create a table consisting of all joined columns that is periodically updated. Queries are run on the new table rather the underlying tables. Queries will not be real time but in many cases, once a day or once a week may be all that is needed.
Report Tables -- Create a table containing just the columns required for the desired report. Queries can be run to select and sort rows as needed. The table will need to be periodically updated.
Mirror Tables -- Create multiple copies of a table to distribute the load. A mechanism will be needed to maintain the mirror tables.
Repeating Groups -- Add redundant columns to eliminate duplicate rows. This can consume a great deal of storage space but can reduce the number of rows to be processed.
Speed Tables -- Hierarchically arranged data can be stored in a speed table to simplify processing and coding. In a speed table, a column is added that indicates the row's parent row. A recursive query can be created to reconstruct the original hierarchical structure.