Normalization in Action: Building Better Databases

What is Database Normalization?

Database normalization is an approach to evaluate database structure and minimize data redundancy based on candidate keys and functional dependencies. Candidate keys are unique identifiers that have all the necessary conditions to become a primary key. Functional dependencies refer to the relationship between attributes, where one attribute uniquely determines another.

Simply put normalization is a process of eliminating attributes which is not identified by candidate keys.

For example, consider a studentnumber that can uniquely identify a surname and firstname. In this case, the student number acts as a determinant, which determines the surname or firstname.

Student (studentnumber, surname, firstname)

The goal of normalization is to group attributes with a close logical relationship into the same table, so that it ensure there is no data redundancy.

Benefits:

Efficiently retrieve data: Well-structured data allows for faster and more accurate queries.
Avoid Complexity: Normalization simplifies the database structure by breaking it down into smaller, related tables.
Eliminate Duplicates: Redundant data can lead to anomalies (insertion, deletion, and update issues). Normalization helps minimize these.
Consistent Data: By organizing data, normalization ensures consistent values across related tables.

What Define Good and Bad Database Design?

Good:

Tables doesn't have data redundancy
Each determinant is a candidate key
Each determinant can uniquely identify other attributes

For example coursecode and offeringnumber can confidently identify other attributes CourseOffering (coursecode, offeringnumber, startdate, teachernumber)

Bad:

Relation/table has undesired data redundancy
A determinant is not a candidate key but can still identify another attribute in the table

For example teachernumber is not a candidate key but it can identify surname

CourseOffering (coursecode, offeringnumber, coursename, startdate, teachernumber, surname)

Partial dependency exists

(empno, clubno, clubname, joindate)

In this case empno and clubno are together a primary key and it can determine clubname, however clubno can also identify clubname, this leads to redundancy since clubname is dependent only on a part of the primary key.

Types of Normalization:

First Normal Form (1NF):

Each cell holds only one value (atomicity).
Primary key for identification.
No duplicated rows or columns.
Each column has a single value for each row.

Example: A table with employee details, where each employee has a unique ID and no duplicate rows.

Second Normal Form (2NF):

Builds on 1NF.
Ensures that non-key attributes depend entirely on the entire primary key (no partial dependencies).

Example: Separate tables for employee data and their projects, linked by employee ID.

Third Normal Form (3NF):

Builds on 2NF.
Eliminates transitive dependencies (attributes indirectly depend on other attributes).

Boyce-Codd Normal Form (BCNF) The redundancy prevention tool

Builds on 3NF.
Each determinant is a candidate key
All attribute values are atomic (single values)
If an attribute is not part of a candidate key, it must have a determinant

Hung Hoang

Search This Blog