🎨 Module 2: Data Modeling & Design Principles

ERD, Normalization (1NF-5NF), Design Patterns, Denormalization Strategies

5
Normalization Forms
ER
Diagram Model
8
Design Patterns
100%
Data Integrity

📊 Entity-Relationship Diagrams (ERD)

ERD Fundamentals

An Entity-Relationship Diagram visually represents database structure and relationships between entities:

Notation Meaning Example
Rectangle Entity (table) EMPLOYEES, DEPARTMENTS
Oval Attribute (column) employee_id, first_name
Double Oval Multi-valued attribute phone_numbers
Line Relationship Works_In, Manages
PK (Primary Key) Unique identifier employee_id (underlined)
FK (Foreign Key) Reference to other entity dept_id references DEPARTMENTS

Relationship Cardinalities

Cardinality Notation Meaning Example
One-to-One 1:1 One entity relates to exactly one other Employee : Parking_Space
One-to-Many 1:N One relates to multiple others Department : Employees (1:N)
Many-to-Many M:N Multiple relate to multiple Students : Courses (M:N)
Self-Referencing Reflexive Entity relates to itself Employee : Manager (Employee)

🔧 Normalization Forms (1NF to 5NF)

1NF - First Normal Form (Eliminate Repeating Groups)

Rule: Remove duplicate columns, store one value per column

-- NOT 1NF: Multiple phone numbers in one column
StudentPhones:
  StudentID | StudentName | Phones
  1         | John        | 555-1111, 555-2222, 555-3333

-- AFTER 1NF: Repeating groups eliminated
StudentPhones:
  StudentID | StudentName | PhoneNumber
  1         | John        | 555-1111
  1         | John        | 555-2222
  1         | John        | 555-3333

2NF - Second Normal Form (Remove Partial Dependencies)

Rule: Remove attributes that depend on part of a composite key

-- NOT 2NF: InstructorName depends only on InstructorID, not on composite key
CourseInstructor:
  CourseID | InstructorID | InstructorName
  C101     | I001        | Dr. Smith
  C101     | I002        | Dr. Brown

-- AFTER 2NF: Split into separate tables
CourseInstructor:
  CourseID | InstructorID
  C101     | I001
  C101     | I002

Instructor:
  InstructorID | InstructorName
  I001        | Dr. Smith
  I002        | Dr. Brown

3NF - Third Normal Form (Remove Transitive Dependencies)

Rule: Remove attributes that depend on non-key attributes

-- NOT 3NF: Department_Manager depends on DepartmentID, not PK
Employee:
  EmpID | Name    | DepartmentID | DepartmentName | DepartmentManager
  E001  | John    | D01         | Sales         | Alice
  E002  | Sarah   | D02         | HR            | Bob

-- AFTER 3NF: Remove transitive dependency
Employee:
  EmpID | Name   | DepartmentID
  E001  | John   | D01
  E002  | Sarah  | D02

Department:
  DepartmentID | DepartmentName | DepartmentManager
  D01         | Sales         | Alice
  D02         | HR            | Bob

BCNF - Boyce-Codd Normal Form

Rule: Every determinant must be a candidate key (stricter than 3NF)

4NF & 5NF (Advanced Forms)

💡 Practical Insight: Most business databases should reach 3NF. BCNF is good for complex scenarios. 4NF & 5NF are rarely needed in practice.

🎯 Database Design Patterns

1. Star Schema (Dimensional Modeling)

Used in data warehouses for analytical queries

-- Fact table (central)
SalesFact:
  SalesID, DateID, ProductID, CustomerID, Amount, Quantity

-- Dimension tables (arms of star)
DateDimension: DateID, Date, Quarter, Year
ProductDimension: ProductID, ProductName, Category, Price
CustomerDimension: CustomerID, CustomerName, Region, Segment

2. Snowflake Schema

Normalized version of star schema, dimensions are normalized

3. Surrogate Key Pattern

Use system-generated keys instead of natural keys

-- Natural Key (business data)
Employee (SSN, FirstName, LastName)

-- Surrogate Key (system generated)
Employee (EmployeeID, SSN, FirstName, LastName)
  - EmployeeID: Auto-increment primary key
  - More flexible for schema changes

4. Soft Delete Pattern

Mark records as deleted instead of physically removing them

-- Add is_deleted flag
CREATE TABLE employees (
  employee_id NUMBER PRIMARY KEY,
  first_name VARCHAR2(50),
  is_deleted CHAR(1) DEFAULT 'N'
);

-- Soft delete (reversible)
UPDATE employees SET is_deleted = 'Y' WHERE employee_id = 123;

-- Always filter active records
SELECT * FROM employees WHERE is_deleted = 'N';

5. Audit Trail Pattern

Track all changes for compliance and auditing

-- Main table
employees (employee_id, name, salary, version)

-- Audit table (automatic updates tracked)
employee_audit (
  audit_id, employee_id, old_salary, new_salary, 
  changed_by, changed_date, change_reason
)

-- Trigger example:
TRIGGER emp_audit AFTER UPDATE ON employees
FOR EACH ROW BEGIN
  INSERT INTO employee_audit 
  VALUES (audit_seq.NEXTVAL, :OLD.id, :OLD.salary, :NEW.salary, 
          USER, SYSDATE, 'Salary Update');
END;

6. Temporal Data Pattern (SCD - Slowly Changing Dimensions)

Type Description Use Case
SCD Type 1 Overwrite old data Current data only (no history)
SCD Type 2 Add new row with timestamps Full history tracking
SCD Type 3 Add previous value columns Current + previous only

⚖️ Denormalization Strategies

When to Denormalize

Denormalization Techniques

Technique Benefit Cost
Redundant Columns Eliminate JOIN, faster queries Storage, update complexity
Derived Columns Pre-calculated values Update triggers, consistency
Materialized Views Pre-computed aggregations Storage, refresh overhead
Horizontal Partitioning Parallel query execution Distributed complexity
-- Denormalization Example: Store total salary per department
CREATE TABLE departments (
  dept_id NUMBER PRIMARY KEY,
  dept_name VARCHAR2(50),
  employee_count NUMBER,        -- Redundant
  total_salary NUMBER         -- Derived
);

-- Maintain with trigger on salary changes
TRIGGER update_dept_totals AFTER UPDATE OF salary ON employees
FOR EACH ROW BEGIN
  UPDATE departments
  SET total_salary = total_salary + (:NEW.salary - :OLD.salary)
  WHERE dept_id = :NEW.dept_id;
END;
⚠️ Warning: Denormalization adds complexity. Keep update logic simple and maintain referential integrity with triggers.

🏆 Design Best Practices

✓ Learning Checklist - Module 2