Skip to main content

Normalization in Action: Building Better Databases

What is Database Normalization?

Database normalization is an approach to evaluate database structure and minimize data redundancy based on candidate keys and functional dependencies. Candidate keys are unique identifiers that have all the necessary conditions to become a primary key. Functional dependencies refer to the relationship between attributes, where one attribute uniquely determines another.

Simply put normalization is a process of eliminating attributes which is not identified by candidate keys.

For example, consider a studentnumber that can uniquely identify a surname and firstname. In this case, the student number acts as a determinant, which determines the surname or firstname.

Student (studentnumber, surname, firstname)

The goal of normalization is to group attributes with a close logical relationship into the same table, so that it ensure there is no data redundancy.

Benefits:

  • Efficiently retrieve data: Well-structured data allows for faster and more accurate queries.
  • Avoid Complexity: Normalization simplifies the database structure by breaking it down into smaller, related tables.
  • Eliminate Duplicates: Redundant data can lead to anomalies (insertion, deletion, and update issues). Normalization helps minimize these.
  • Consistent Data: By organizing data, normalization ensures consistent values across related tables.

What Define Good and Bad Database Design?

Good:

  • Tables doesn't have data redundancy
  • Each determinant is a candidate key
  • Each determinant can uniquely identify other attributes

For example coursecode and offeringnumber can confidently identify other attributes CourseOffering (coursecode, offeringnumber, startdate, teachernumber)

Bad:

  • Relation/table has undesired data redundancy
  • A determinant is not a candidate key but can still identify another attribute in the table

For example teachernumber is not a candidate key but it can identify surname

CourseOffering (coursecode, offeringnumber, coursename, startdate, teachernumber, surname)

  • Partial dependency exists

(empno, clubno, clubname, joindate)

In this case empno and clubno are together a primary key and it can determine clubname, however clubno can also identify clubname, this leads to redundancy since clubname is dependent only on a part of the primary key.

Types of Normalization:

First Normal Form (1NF):

  • Each cell holds only one value (atomicity).
  • Primary key for identification.
  • No duplicated rows or columns.
  • Each column has a single value for each row.

Example: A table with employee details, where each employee has a unique ID and no duplicate rows.

Second Normal Form (2NF):

  • Builds on 1NF.
  • Ensures that non-key attributes depend entirely on the entire primary key (no partial dependencies).

Example: Separate tables for employee data and their projects, linked by employee ID.

Third Normal Form (3NF):

  • Builds on 2NF.
  • Eliminates transitive dependencies (attributes indirectly depend on other attributes).

Boyce-Codd Normal Form (BCNF) The redundancy prevention tool

  • Builds on 3NF.
  • Each determinant is a candidate key
  • All attribute values are atomic (single values)
  • If an attribute is not part of a candidate key, it must have a determinant














Comments

Popular posts from this blog

Maximizing Efficiency: The Power of Database Indexing

What is database performance? There are two main aspects of database performance: response time and throughput . Response time is the total time it takes to process a single query and returns result to the user. It's critical metrics because it directly impacts the user's experience, especially in applications where fast access to data is essential. The response time includes CPU time (complex queries will require more computational power and increase processing time), disk access, lock waits in multiple-user environment (more about database transaction ), network traffic. Throughput refers to how many translations the system can handle per second (TPS). A transaction could include different activities to retrieve and manipulate data. A single command like SELECT, INSERT, UPDATE, DELETE or a series of commands could be used to trigger these activities. If you’re running an e-commerce site, a single transaction might include checking the inventory, confirming the payment, and...

LINQ - Deferred Execution

Deferred Execution means that queries are not executed immediately at the time it's being created. The benefit of this is to improve performance by avoiding unnecessary executions. It also allows the query to be extendable when needed such as sorting, filtering. In the example below, the queries to retrieve courses are not being executed yet var context = new DbContext(); var courses = context.Courses      .Where(c => c.Level == 1)      .OrderBy(c => c.Name); The query is only executed when one of the following scenarios occurs: Iterating over query variable Calling ToList, ToArray, ToDictionary Calling First, Last, Single, Count, Max, Min, Average For example when we loop through the course or turn the result to a list: foreach ( var c in courses) Console.WriteLine(c.Name); OR context.Courses      .Where(c => c.Level == 1)      .OrderBy(c => c.Name).ToList();

Solid Principles for Software Design - Part 2

Interface Segregation Principle The Interface Segregation Principle (ISP) is one of the five SOLID principles of object-oriented design, which recommends that "Clients should not be forced to depend on interfaces they do not use". This means we should avoid implementing an interface that has unnecessary methods and therefore not going to be implemented.  Some signs of violating ISP are: Having a "fat" interface, which means having a high number of methods in one interface that are not so related to each other or low cohesion. Empty implementation of methods, certain methods of interface are not needed for implementation. Considering the following example, we violate the principle because CannonPrinter is designed only with the functionality to print, leaving the scan and fax method unimplemented. interface IMultiFunction {      void print(); void scan(); void fax(); } public class HPPrinterNScanner implements ImultiFunction { @Override public void pr...