📊 Module 7: Aggregation & GROUP BY

SUM, AVG, COUNT, ROLLUP, CUBE, GROUPING Functions

GROUP BY
Grouping
HAVING
Group Filter
ROLLUP
Subtotals
CUBE
Cross Tabs

🔢 Aggregate Functions

Common Aggregate Functions

Function Purpose Example Returns
COUNT(*) Total rows COUNT(*) AS total Number
COUNT(col) Non-NULL values COUNT(salary) Number
SUM(col) Sum of values SUM(salary) Number
AVG(col) Average value AVG(salary) Number
MIN(col) Minimum value MIN(salary) Any type
MAX(col) Maximum value MAX(salary) Any type
STDDEV(col) Standard deviation STDDEV(salary) Number
VARIANCE(col) Variance VARIANCE(salary) Number

Basic Aggregate Queries

-- Simple aggregation
SELECT 
  COUNT(*) AS total_employees,
  SUM(salary) AS total_payroll,
  AVG(salary) AS avg_salary,
  MIN(salary) AS min_salary,
  MAX(salary) AS max_salary
FROM employees;

-- Aggregation with DISTINCT
SELECT 
  COUNT(DISTINCT department_id) AS dept_count,
  COUNT(DISTINCT manager_id) AS manager_count
FROM employees;

-- Conditional aggregation
SELECT 
  COUNT(CASE WHEN salary > 50000 THEN 1 END) AS high_earners,
  COUNT(CASE WHEN salary <= 50000 THEN 1 END) AS low_earners
FROM employees;

🎯 GROUP BY Clause

Basic GROUP BY

-- Group by single column
SELECT 
  department_id,
  COUNT(*) AS emp_count,
  AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
ORDER BY avg_salary DESC;

-- Group by multiple columns
SELECT 
  department_id,
  job_id,
  COUNT(*) AS emp_count,
  SUM(salary) AS total_salary
FROM employees
GROUP BY department_id, job_id
ORDER BY department_id, job_id;

-- Group with JOIN
SELECT 
  d.department_name,
  COUNT(e.employee_id) AS emp_count,
  AVG(e.salary) AS avg_salary
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY emp_count DESC;

🔍 HAVING Clause - Filter Groups

Filter groups after aggregation (similar to WHERE but for groups).

-- HAVING filters groups, WHERE filters rows
SELECT 
  department_id,
  COUNT(*) AS emp_count,
  AVG(salary) AS avg_salary
FROM employees
WHERE salary > 40000  -- Row filter (before grouping)
GROUP BY department_id
HAVING COUNT(*) > 5  -- Group filter (after grouping)
ORDER BY emp_count DESC;

-- Multiple HAVING conditions
SELECT 
  department_id,
  COUNT(*) AS emp_count,
  AVG(salary) AS avg_salary,
  MAX(salary) AS max_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 3 
  AND AVG(salary) > 60000
  AND MAX(salary) > 100000;

-- HAVING with subquery
SELECT 
  department_id,
  COUNT(*) AS emp_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > (
  SELECT AVG(emp_per_dept) 
  FROM (
    SELECT COUNT(*) emp_per_dept 
    FROM employees 
    GROUP BY department_id
  )
);

📈 ROLLUP - Hierarchical Aggregation

Creates subtotals and grand total from left to right.

-- ROLLUP generates subtotals and grand total
SELECT 
  department_id,
  job_id,
  COUNT(*) AS emp_count,
  SUM(salary) AS total_salary
FROM employees
GROUP BY ROLLUP (department_id, job_id)
ORDER BY department_id, job_id;

-- Results include:
   - Detail rows (all combinations)
   - Subtotals by department (job_id = NULL)
   - Grand total (both NULL)

-- Identify subtotal rows using GROUPING
SELECT 
  CASE 
    WHEN GROUPING(department_id) = 1 THEN 'GRAND TOTAL'
    WHEN GROUPING(job_id) = 1 THEN 'SUBTOTAL (Dept: ' || department_id || ')'
    ELSE 'DETAIL'
  END AS row_type,
  department_id,
  job_id,
  COUNT(*) AS emp_count,
  SUM(salary) AS total_salary
FROM employees
GROUP BY ROLLUP (department_id, job_id)
ORDER BY department_id, job_id;

🎲 CUBE - Cross-Tabulation

Creates all possible combinations of groupings (full cross-tabulation).

-- CUBE generates all combinations
SELECT 
  department_id,
  job_id,
  COUNT(*) AS emp_count,
  SUM(salary) AS total_salary
FROM employees
GROUP BY CUBE (department_id, job_id)
ORDER BY department_id, job_id;

-- Results include:
   - (dept, job) combinations
   - (dept only) subtotals
   - (job only) subtotals
   - Grand total

-- Using GROUPING_ID for compact notation
SELECT 
  GROUPING_ID(department_id, job_id) AS grouping_level,
  department_id,
  job_id,
  COUNT(*) AS emp_count
FROM employees
GROUP BY CUBE (department_id, job_id);

-- GROUPING_ID returns:
   0 = Both specified
   1 = department_id rolled up
   2 = job_id rolled up  
   3 = Both rolled up (grand total)

⚡ ROLLUP vs CUBE Comparison

Aspect ROLLUP CUBE
Combinations Hierarchical (left to right) All possible
Example: (A,B) (A,B) → (A) → () (A,B) → (A) → (B) → ()
Row Count Fewer rows More rows (2^n)
Use Case Hierarchical data (year → month → day) Cross-tabulation (dept ↔ job)
Performance ⭐⭐⭐ Better ⭐⭐ Slower (more rows)

Detailed Example: ROLLUP

-- Data: 4 employees in 2 departments with 2 jobs
SELECT 
  department_id,
  job_id,
  SUM(salary)
FROM employees
GROUP BY ROLLUP (department_id, job_id);

-- Results:
-- 10    ANALYST    50000  -- Detail
-- 10    NULL       100000 -- Subtotal for dept 10
-- 20    MANAGER    80000  -- Detail
-- 20    NULL       150000 -- Subtotal for dept 20
-- NULL  NULL       250000 -- Grand Total
-- Total: 5 rows (from original 4)

Detailed Example: CUBE

-- Same data with CUBE
SELECT 
  department_id,
  job_id,
  SUM(salary)
FROM employees
GROUP BY CUBE (department_id, job_id);

-- Results:
-- 10    ANALYST    50000  -- (dept, job)
-- 10    MANAGER    50000  -- (dept, job)
-- 20    ANALYST    60000  -- (dept, job)
-- 20    MANAGER    80000  -- (dept, job)
-- 10    NULL       100000 -- Subtotal by dept
-- 20    NULL       140000 -- Subtotal by dept
-- NULL  ANALYST    110000 -- Subtotal by job
-- NULL  MANAGER    130000 -- Subtotal by job
-- NULL  NULL       240000 -- Grand Total
-- Total: 9 rows (from original 4, = 2^2 + 4)

🚀 Advanced Aggregation Examples

Running Totals and Cumulative Sums

-- SUM with window function for running total
SELECT 
  employee_id,
  salary,
  SUM(salary) OVER (ORDER BY employee_id) AS running_total
FROM employees
ORDER BY employee_id;

-- Running total by department
SELECT 
  department_id,
  employee_id,
  salary,
  SUM(salary) OVER (
    PARTITION BY department_id 
    ORDER BY employee_id
  ) AS dept_running_total
FROM employees;
💡 Tip: Use ROLLUP for hierarchical data (time periods). Use CUBE for cross-tab analysis. Always use GROUPING() to identify subtotal rows.
✓ Learning Checklist - Module 7