When you design a relational database without thinking carefully about structure, you end up with data that’s redundant, hard to update, and prone to inconsistencies — a set of problems collectively called anomalies. Normalization is the systematic process of organizing tables to eliminate these issues. In this guide, we’ll walk through each normal form from 1NF to 4NF, using a single evolving example so you can see exactly what changes at each step and why.

What Is Normalization?

Normalization is the process of breaking down a table into smaller, well-structured tables while preserving all the original data. The goal is to:

  • Eliminate redundant data (the same fact stored in multiple places)
  • Prevent update anomalies (changing one row requires changing many others)
  • Prevent insert anomalies (you can’t add data without adding unrelated data)
  • Prevent delete anomalies (deleting one fact accidentally deletes another)

Each “normal form” is a rule (or set of rules) that a table must satisfy. They build on each other — a table in 3NF is also in 2NF and 1NF.

Key concepts before we start

  • Functional dependency: Column B is functionally dependent on column A if knowing A’s value uniquely determines B’s value. Written as A → B.
  • Candidate key: A minimal set of columns that uniquely identifies every row.
  • Primary key: The candidate key you choose to identify rows.
  • Partial dependency: A non-key column depends on only part of a composite primary key.
  • Transitive dependency: A non-key column depends on another non-key column (rather than directly on the primary key).

Unnormalized Form (UNF)

Let’s start with a raw, denormalized table that tracks student course enrollments.

CREATE TABLE Enrollment (
    StudentID    INT,
    StudentName  VARCHAR(100),
    Courses      VARCHAR(255),   -- comma-separated: "Math, Physics"
    Instructor   VARCHAR(100),
    InstructorOffice VARCHAR(50),
    Grade        VARCHAR(5)
);

A sample row might look like:

StudentID StudentName Courses Instructor InstructorOffice Grade
101 Alice Math, Physics Dr. Smith Room 12 A, B

This table has multiple values stuffed into single cells. That alone creates problems — you can’t index or query individual courses without string parsing. Let’s fix it, normal form by normal form.


First Normal Form (1NF)

Rule: Every cell must contain a single, atomic value. No repeating groups, no comma-separated lists.

To get to 1NF, we split the multi-valued Courses and Grade columns so each row represents one student–course pair.

CREATE TABLE Enrollment_1NF (
    StudentID        INT,
    StudentName      VARCHAR(100),
    CourseID         INT,
    CourseName       VARCHAR(100),
    Instructor       VARCHAR(100),
    InstructorOffice VARCHAR(50),
    Grade            VARCHAR(5),
    PRIMARY KEY (StudentID, CourseID)
);
StudentID StudentName CourseID CourseName Instructor InstructorOffice Grade
101 Alice C01 Math Dr. Smith Room 12 A
101 Alice C02 Physics Dr. Jones Room 7 B
102 Bob C01 Math Dr. Smith Room 12 C

Every cell is atomic. Each row has a composite primary key (StudentID, CourseID). This satisfies 1NF — but we still have redundancy. StudentName repeats for every course Alice takes, and InstructorOffice repeats for every student Dr. Smith teaches.


Second Normal Form (2NF)

Rule: Must be in 1NF, and every non-key column must depend on the entire primary key — no partial dependencies.

In our 1NF table, the primary key is (StudentID, CourseID). Look at the dependencies:

  • StudentName depends only on StudentIDpartial dependency
  • Instructor, InstructorOffice, CourseName depend only on CourseIDpartial dependency
  • Grade depends on both StudentID and CourseID — full dependency ✓

To reach 2NF, we split out the partially dependent columns into their own tables.

-- Student info depends only on StudentID
CREATE TABLE Student (
    StudentID   INT PRIMARY KEY,
    StudentName VARCHAR(100)
);

-- Course info depends only on CourseID
CREATE TABLE Course (
    CourseID         INT PRIMARY KEY,
    CourseName       VARCHAR(100),
    Instructor       VARCHAR(100),
    InstructorOffice VARCHAR(50)
);

-- Grade depends on the full (StudentID, CourseID) pair
CREATE TABLE Enrollment_2NF (
    StudentID INT,
    CourseID  INT,
    Grade     VARCHAR(5),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
    FOREIGN KEY (CourseID)  REFERENCES Course(CourseID)
);

Now updating Alice’s name requires changing exactly one row in Student, not every enrollment row. We’ve eliminated partial dependencies.


Third Normal Form (3NF)

Rule: Must be in 2NF, and every non-key column must depend directly on the primary key — no transitive dependencies.

Look at the Course table:

  • CourseID → Instructor (direct dependency ✓)
  • CourseID → InstructorOffice… but actually Instructor → InstructorOffice

InstructorOffice depends on Instructor, not on CourseID directly. That’s a transitive dependency. If Dr. Smith moves offices, we’d have to update every course row he teaches.

To reach 3NF, we extract the transitive dependency:

-- Instructor office depends on the instructor, not the course
CREATE TABLE Instructor (
    InstructorName   VARCHAR(100) PRIMARY KEY,
    InstructorOffice VARCHAR(50)
);

-- Course now only stores what directly depends on CourseID
CREATE TABLE Course_3NF (
    CourseID       INT PRIMARY KEY,
    CourseName     VARCHAR(100),
    InstructorName VARCHAR(100),
    FOREIGN KEY (InstructorName) REFERENCES Instructor(InstructorName)
);

Now a change to Dr. Smith’s office number requires updating a single row in Instructor. No more transitive dependencies — the table is in 3NF.


Boyce-Codd Normal Form (BCNF)

Rule: Must be in 3NF, and for every functional dependency A → B, A must be a superkey (i.e., A uniquely identifies the entire row).

3NF allows a few edge cases where a non-key attribute determines part of a candidate key. BCNF closes that loophole. Consider this schedule table:

-- A professor teaches exactly one subject; a subject can have multiple professors
-- A student can enroll in a subject only once
CREATE TABLE Schedule (
    StudentID   INT,
    Subject     VARCHAR(100),
    Professor   VARCHAR(100),
    PRIMARY KEY (StudentID, Subject)
);

The dependencies here are:

  • (StudentID, Subject) → Professor
  • Professor → Subject — a professor teaches exactly one subject

The problem: Professor is not a superkey, yet Professor → Subject holds. This is a BCNF violation.

Split the table:

CREATE TABLE ProfessorSubject (
    Professor VARCHAR(100) PRIMARY KEY,
    Subject   VARCHAR(100)
);

CREATE TABLE StudentProfessor (
    StudentID INT,
    Professor VARCHAR(100),
    PRIMARY KEY (StudentID, Professor),
    FOREIGN KEY (Professor) REFERENCES ProfessorSubject(Professor)
);

Every determinant is now a superkey. BCNF is stricter than 3NF — most practical schemas stop here.


Fourth Normal Form (4NF)

Rule: Must be in BCNF, and a table must not have more than one multi-valued dependency.

A multi-valued dependency exists when one attribute independently determines a set of values for two other attributes. For example, consider a table tracking which languages and hobbies an employee has:

CREATE TABLE EmployeeProfile (
    EmployeeID INT,
    Language   VARCHAR(50),
    Hobby      VARCHAR(50),
    PRIMARY KEY (EmployeeID, Language, Hobby)
);
EmployeeID Language Hobby
1 English Chess
1 English Cycling
1 Spanish Chess
1 Spanish Cycling

EmployeeID →→ Language and EmployeeID →→ Hobby are two independent multi-valued dependencies. Adding a new hobby means inserting a row for every language, and vice versa. That’s a 4NF violation.

The fix is to split them into separate tables:

CREATE TABLE EmployeeLanguage (
    EmployeeID INT,
    Language   VARCHAR(50),
    PRIMARY KEY (EmployeeID, Language)
);

CREATE TABLE EmployeeHobby (
    EmployeeID INT,
    Hobby      VARCHAR(50),
    PRIMARY KEY (EmployeeID, Hobby)
);

Now adding “Cycling” for Employee 1 is a single insert into EmployeeHobby, regardless of how many languages they speak.


Quick Reference

Normal Form What it eliminates Prerequisite
1NF Non-atomic values, repeating groups None
2NF Partial dependencies on composite key 1NF
3NF Transitive dependencies (non-key → non-key) 2NF
BCNF Non-superkey determinants 3NF
4NF Multiple independent multi-valued dependencies BCNF

When to Denormalize

Normalization is the right default for write-heavy OLTP systems — it keeps your data consistent and updates cheap. But in read-heavy analytical workloads (data warehouses, reporting systems), joins across many normalized tables can hurt query performance significantly.

In those cases, it’s common to intentionally denormalize — storing redundant data to avoid joins — or use a star schema where a central fact table references pre-joined dimension tables. The rule of thumb: normalize first, denormalize only when you have a measured performance problem.

Conclusion

Normal forms give you a concrete, step-by-step checklist for building clean relational schemas. Start with 1NF (atomic values), move to 2NF (eliminate partial dependencies), then 3NF (eliminate transitive dependencies), and apply BCNF and 4NF when your schema has overlapping candidate keys or multi-valued facts. A schema in 3NF or BCNF covers the vast majority of real-world use cases and will save you from a whole category of data integrity bugs before they ever reach production.