Bachelor of Science in Computer Science
Course ContentNormalization
Habari Mwanafunzi! Welcome to the World of Database Normalization!
Ever tried to find a specific book in a library where all the books are just thrown in one big pile? Or looked for your favourite spice in a kitchen cabinet that's completely disorganized? It's a nightmare, right? You waste so much time and energy. That messy pile is exactly what a database looks like without Normalization.
Think of normalization as the super-librarian for your data. It’s a process of organizing the data in your database to make it clean, efficient, and reliable. We do this to reduce repetition (what we call redundancy) and to make sure our data is logical and trustworthy (what we call data integrity).
Imagine you have a duka. In a messy duka, the sugar might be next to the soap, and the bread might be under the batteries. To find anything, you have to search everywhere. But in a well-organized duka (or a supermarket like Naivas), all the soaps are in one aisle, all the sugars in another. This is normalization! You know exactly where to go, and you avoid buying the same thing twice because you couldn't find it the first time.
Image Suggestion:[An AI-generated image in a vibrant, realistic style. On the left side, a cluttered, disorganized Kenyan duka with items haphazardly placed on shelves. On the right side, a clean, modern supermarket aisle with products neatly arranged and labeled. A clear dividing line is in the middle. The text "Before Normalization" is on the left and "After Normalization" is on the right.]
Why Should We Bother Normalizing? The "Three Big Problems" We Solve
When our data is messy, we run into some serious issues called Data Anomalies. These are like annoying little gremlins that mess up our information.
- Insertion Anomaly: You can't add new information because some other information is missing. For example, you can't add a new university course to your list until a student enrolls in it. That doesn't make sense!
- Deletion Anomaly: You delete some information, and you accidentally delete other, unrelated information. For example, if you delete the only student from a certain county, you might lose the record of that county's name forever!
- Update Anomaly: To change one piece of information, you have to find and change it in many different places. If a lecturer, Mr. Omondi, changes his phone number, and his number is listed next to every single student he teaches, you have to update it 100 times! If you miss one, your data becomes inconsistent.
Normalization helps us wipe out these anomalies and keep our data clean and reliable.
The Building Blocks: Keys and Dependencies
Before we start organizing (normalizing), we need to understand two simple but powerful concepts.
1. Primary Key (PK): This is a unique identifier for each record in a table. No two records can have the same primary key. Think of your National ID Number or your Student Admission Number. It's unique to you!
2. Functional Dependency: This sounds complicated, but it's easy. It just means that one piece of data determines another. We write it as A -> B, which means "A determines B".
// Functional Dependency Example
StudentID -> StudentName
// This means if I know the StudentID, I can find out the StudentName.
// The StudentName DEPENDS on the StudentID.
// BUT, the reverse is not true!
// StudentName does NOT determine StudentID (StudentName -> StudentID is false)
// Why? Because there can be two students named 'John Onyango'.
Let's Get Organizing: The Normal Forms!
Normalization is a step-by-step process. We move our database through different "Normal Forms" (NF). For your level, we'll master the first three, which are the most important!
First Normal Form (1NF): The "One Value Per Box" Rule
This is the most basic rule. To be in 1NF, your table must satisfy two conditions:
- Each column must have atomic values, meaning each cell should hold only a single value. No lists or multiple items in one box!
- The table must have a primary key.
Scenario: A lecturer at Kenyatta University keeps a table of students and their phone numbers. Some students have more than one number.
BAD Table (Not in 1NF):+-----------+--------------+-----------------------+ | StudentID | StudentName | PhoneNumber | +-----------+--------------+-----------------------+ | K01-1111 | Jane Wambui | 0722123456, 0733654321 | | K01-1112 | Peter Ochieng| 0711987654 | +-----------+--------------+-----------------------+See the problem? Jane's record has two phone numbers in one cell. This is a "repeating group" and is not atomic.
GOOD Table (Now in 1NF):To fix this, we create a separate row for each phone number.
+-----------+--------------+--------------+ | StudentID | StudentName | PhoneNumber | +-----------+--------------+--------------+ | K01-1111 | Jane Wambui | 0722123456 | | K01-1111 | Jane Wambui | 0733654321 | | K01-1112 | Peter Ochieng| 0711987654 | +-----------+--------------+--------------+Hooray! Now every cell has a single value. But wait... notice how "Jane Wambui" is repeated? This is data redundancy, which the next normal form will help us solve!
Second Normal Form (2NF): No "Partial" Dependencies!
For a table to be in 2NF, it must first be in 1NF. The second rule is:
- All non-key attributes must be fully functionally dependent on the entire primary key.
This rule really only matters when you have a composite primary key (a primary key made up of two or more columns).
Scenario: Let's track student enrollments in different courses. The combination of(StudentID, CourseID)will be our composite primary key, because a student can enroll in many courses, and a course can have many students.
BAD Table (In 1NF, but not 2NF):Our Primary Key is (StudentID, CourseID)
Let's check the dependencies:+-----------+----------+---------------+-------------------+ | StudentID | CourseID | StudentName | CourseName | +-----------+----------+---------------+-------------------+ | S-101 | C-CS101 | David Mwaura | Intro to Java | | S-101 | C-DB202 | David Mwaura | Database Systems | | S-102 | C-CS101 | Fatuma Ali | Intro to Java | +-----------+----------+---------------+-------------------+{StudentID, CourseID} -> StudentName (Correct, but can be better) {StudentID, CourseID} -> CourseName (Correct, but can be better) BUT... StudentID -> StudentName (StudentName only depends on PART of the key!) CourseID -> CourseName (CourseName only depends on PART of the key!)This is a partial dependency.
GOOD Tables (Now in 2NF):StudentNameonly needsStudentIDto be known, not the whole key. This causes update anomalies. If David Mwaura legally changes his name, we have to update it in two places!We solve this by splitting the table into smaller, more logical tables.
Table 1: StudentsTable 2: Courses+-----------+---------------+ | StudentID | StudentName | <-- PK: StudentID +-----------+---------------+ | S-101 | David Mwaura | | S-102 | Fatuma Ali | +-----------+---------------+Table 3: Enrollments+----------+-------------------+ | CourseID | CourseName | <-- PK: CourseID +----------+-------------------+ | C-CS101 | Intro to Java | | C-DB202 | Database Systems | +----------+-------------------++-----------+----------+ | StudentID | CourseID | <-- Composite PK: (StudentID, CourseID) +-----------+----------+ | S-101 | C-CS101 | | S-101 | C-DB202 | | S-102 | C-CS101 | +-----------+----------+Look at that! No more repetition. To update David's name, you only do it once in the
Studentstable. Beautiful!
Image Suggestion:[A simple, clear diagram. On the left, show one large table labeled "Enrollments (Not 2NF)". Arrows point from parts of the composite key (StudentID, CourseID) to other columns, illustrating partial dependencies. On the right, show the three smaller, separate tables: "Students", "Courses", and "Enrollments". Use arrows to show the Primary Key-Foreign Key relationships connecting them. The style should be like a clean textbook diagram.]
Third Normal Form (3NF): No "Sneaky" Dependencies!
You're doing great! This is the last step for today. For a table to be in 3NF, it must first be in 2NF. The third rule is:
- There should be no transitive dependencies.
A transitive dependency is when a non-key attribute depends on another non-key attribute, instead of depending on the primary key. It's like a chain reaction: Primary Key -> Non-Key A -> Non-Key B.
Scenario: We have a table of students and the county they come from.
BAD Table (In 2NF, but not 3NF):The Primary Key is
StudentID.Let's check the dependencies:+-----------+--------------+----------+--------------+ | StudentID | StudentName | CountyID | CountyName | +-----------+--------------+----------+--------------+ | 2023-01 | Sam Kiptoo | 47 | Nairobi | | 2023-02 | Mary Atieno | 01 | Mombasa | | 2023-03 | Ken Lemayian | 47 | Nairobi | +-----------+--------------+----------+--------------+StudentID -> StudentName (Good) StudentID -> CountyID (Good) BUT... there's a sneaky one! CountyID -> CountyName (This is a problem!)Here,
GOOD Tables (Now in 3NF):CountyNamedepends onCountyID, which is not the primary key. This meansCountyNamehas a transitive dependency onStudentID. If the name of a county changes (unlikely, but possible!), or if there's a typo like "Nairboi", you'd have to fix it in every student record from that county.Again, we split the table!
Table 1: StudentsTable 2: Counties+-----------+--------------+----------+ | StudentID | StudentName | CountyID | <-- PK: StudentID, FK: CountyID +-----------+--------------+----------+ | 2023-01 | Sam Kiptoo | 47 | | 2023-02 | Mary Atieno | 01 | | 2023-03 | Ken Lemayian | 47 | +-----------+--------------+----------++----------+--------------+ | CountyID | CountyName | <-- PK: CountyID +----------+--------------+ | 47 | Nairobi | | 01 | Mombasa | +----------+--------------+Perfect! We have removed the transitive dependency. The data is now much cleaner, more efficient, and easier to manage. You are now a normalization champion!
Summary: The Path to 3NF
Remember this simple mantra:
- 1NF: The key. (Every cell is atomic).
- 2NF: The whole key. (No partial dependencies).
- 3NF: And nothing but the key. (No transitive dependencies).
+-----------------+
| Unnormalized |
+--------+--------+
|
v
+-----------------+
| First NF (1NF) | Eliminate Repeating Groups
+--------+--------+
|
v
+-----------------+
| Second NF (2NF) | Eliminate Partial Dependencies
+--------+--------+
|
v
+-----------------+
| Third NF (3NF) | Eliminate Transitive Dependencies
+-----------------+
Well done for making it through! Normalization might seem like a lot of rules, but it's the foundation of good, professional database design. Practice with your own examples – think about a boda-boda sacco, a local clinic, or even your family's contacts. The more you practice, the easier it becomes. Keep up the great work!
Pro Tip
Take your own short notes while going through the topics.