Today, we will review database normalization. Everyone familiar with RDBMS is probably also familiar with database normalization, but it is good to review it before we embark on database design.
Normalization:
Normalization is a way to arrange your database schema in such a way as to minimize data redundancy and duplication. For example, if you have two tables in your database - Customers and Orders. If your database is not normalized, you may store Customer Name in both Customers and Orders tables to relate the order to a customer. Apart from storing the same information twice, you also have a real problem where data could get out of sync. If your customer gets married and changes her last name, you must update her name in both tables. One of the key strength of relational databases over flat flies such as text file is their ability to relate the data across tables without having to duplicate it, hence the term "Relational".
Types of Normalization
The inventor of relational model, Edgar F. Codd defined the first, second and third normal form or 1NF, 2NF and 3NF respectively. Later on, Codd and Boyce defined the Boyce-Codd Normal Form (BCNF) also called 4NF. 5th and 6th normal forms (5NF, 6NF) were defined later on. Generally, most database adhere to 3rd Normal Form and in most cases a 3NF will also adhere to 4NF and 5NF (but not always).
When you are designing a database, care should be taken to design your database as normalized as possible. However, normalization comes at a cost, and you may have to selectively de-normalize a few tables. In some schemes such as data warehousing, you keep your design denormalized for performance reasons and primarily because you are typically not updating the data in a data warehouse.
A well normalized database not only reduces the data anomalies and redundancies, it also makes future modifications to the database easier. For example, let's assume you have a table that stores 4 phone numbers and all numbers are stored in columns of a single row, i.e. one column for each phone number. What would happen if you have to now record 5th phone number? You have to modify your table and add a 5th column to store the 5th number and since not all users will have 5 numbers, you will have a null values in several columns for most of your users.
First Normal Form (1NF)
1NF has two rules - First rule says that we do not duplicate data in the same row of a table. Recall the above example, i.e. you store 4 phone numbers for every user in 4 columns of the same row. But not all users will have all 4 numbers resulting in lot of null values (hence duplicate data). Adding a 5th phone number will require table schema modification. So how do we solve this? Well, you can create a table called PhoneNumbers with two columns
- Name
- PhoneNumber
Now you can have as many phone numbers per user. But, this is not 1NF yet. The second rule states that each row in a related table should have a unique (primary key). You could say that we can make PhoneNumber a primary key. But what if the same phone number is shared by two users? How about making Name and Phone Number (composite key) as a primary key? Well, close but what if two users who share the same phone number happen to have the same name? To make this table 1NF, we have to have a truely unique key. How about adding an identity field and making it a primary key? That will ensure every record has a unique key and will make our table 1NF.
In next post, we will discuss Second Normal Form (2NF).
Thank you.
Thank you.
No comments:
Post a Comment
As always, your comments are welcome and appreciated!