WAEC Candidate Registration Trends and Regional Disparities in Nigeria (2016-2025)

Author

Christopher Oluwadamilare Ajayi

Published

May 16, 2026

1 Executive Summary

This study examines WAEC candidate registration patterns across Nigerian states from 2016 to 2025 using official WAEC registration records. The core business problem is persistent inequality in access to secondary school certification examinations: registrations are heavily concentrated in a few southern states, while many northern states continue to lag significantly.

The primary dataset (examdata.csv) contains over 700 observations with key variables: State, ExamYear, Sex, Region, and CandiNo (number of candidates).

Through Exploratory Data Analysis, Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression, clear patterns emerged:

- Lagos, Ogun, Rivers, and Delta consistently lead in registrations.
- Southern regions show significantly higher candidate numbers than Northern regions (large effect size).
- National registration dipped around 2020 (likely due to COVID-19) but recovered strongly afterward.
- Linear regression confirmed that both Year and Region are strong predictors of registration volume.


2 Professional Disclosure

I am an Education Data Analyst working in a tertiary institution in Lagos, Nigeria, supporting examination coordination, student enrolment planning, and policy analysis for secondary-to-tertiary transition.

Relevance of Techniques:

- Exploratory Data Analysis (EDA): Enables quick identification of enrolment trends, outliers, and data quality issues.
- Data Visualisation: Essential for communicating complex patterns to non-technical stakeholders.
- Hypothesis Testing: Provides statistical validation of observed regional and gender differences.
- Correlation Analysis: Helps assess the stability of state-level performance over time.
- Linear Regression: Supports forecasting and evaluating the impact of geographic and temporal factors.


3 Data Collection & Sampling

The dataset was extracted from official WAEC registration records covering 2016–2025. It includes state-level and sex-disaggregated candidate counts.

```{r}
#| label: load-data
#| code-fold: false

library(tidyverse)
library(corrplot)
library(effsize)
library(broom)
library(ggplot2)
library(patchwork)

# Load data (place examdata.csv in the same folder as this .qmd file)
exam_data <- read_csv("examdata.csv")

# Rename the count column and store the exam year as a number
exam_data <- exam_data %>%
  rename(Candidates = CandiNo) %>%
  mutate(Year = as.numeric(ExamYear))

glimpse(exam_data)
```
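The Executive Summary lists five key variables, and every later chunk depends on them. One way to fail early, before the `rename()` step, is to check that the raw file actually contains them. Below is a minimal base-R sketch. The helper `check_columns` and the toy data frames are hypothetical; they are not part of the original analysis.

```r
# Hypothetical guard: stop early if the raw file lacks the columns the analysis uses.
# Column names follow the variables listed in the Executive Summary.
required_cols <- c("State", "ExamYear", "Sex", "Region", "CandiNo")

check_columns <- function(df, required = required_cols) {
  # setdiff() keeps the order of `required`, so the message lists columns predictably
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("examdata.csv is missing column(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up frames (not WAEC data) to show both outcomes
ok_df  <- data.frame(State = "Lagos", ExamYear = 2020, Sex = "F", Region = "R1", CandiNo = 1)
bad_df <- ok_df[, c("State", "ExamYear")]

check_columns(ok_df)        # passes silently
try(check_columns(bad_df))  # error naming Sex, Region, CandiNo
```

On the real data, this would be called as `check_columns(exam_data)` immediately after `read_csv()`.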

4 Data Description

```{r}
#| label: data-summary

summary(exam_data)          # column-by-column summary
colSums(is.na(exam_data))   # missing values per column
table(exam_data$Sex)        # records per sex
table(exam_data$Region)     # records per region
```

5 Technique 1: Exploratory Data Analysis (EDA)

```{r}
#| label: eda

# Basic cleaning: drop rows with a missing or blank State and rows whose Region is "Unknown"
# (filter() also drops rows where Region is NA, because the comparison returns NA)
exam_data_clean <- exam_data %>%
  filter(!is.na(State), State != "", Region != "Unknown")

# Summary statistics by Region
exam_data_clean %>%
  group_by(Region) %>%
  summarise(
    Total_Candidates = sum(Candidates, na.rm = TRUE),
    Mean = mean(Candidates, na.rm = TRUE),
    Median = median(Candidates, na.rm = TRUE),
    .groups = "drop"
  )
```
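The Executive Summary describes registrations as heavily concentrated in a few states. One way to quantify that concentration is the share of all candidates held by the top k states. Below is a minimal base-R sketch. It assumes a data frame with the `State` and `Candidates` columns created above; the helper name `top_k_share` and the toy counts are hypothetical, not WAEC figures.

```r
# Hypothetical helper: share of all candidates held by the k largest states.
# Assumes a data frame with columns State (character) and Candidates (numeric).
top_k_share <- function(df, k = 4) {
  # Total candidates per state, pooled over years and sexes
  by_state <- tapply(df$Candidates, df$State, sum, na.rm = TRUE)
  # Convert the 1-D array to a plain named vector, then sort largest first
  totals <- sort(setNames(as.numeric(by_state), names(by_state)), decreasing = TRUE)
  k <- min(k, length(totals))
  list(
    top_states = names(totals)[seq_len(k)],
    share = sum(totals[seq_len(k)]) / sum(totals)
  )
}

# Made-up counts (not WAEC figures): state totals are A = 80, B = 60, C = 10, D = 5, E = 5
toy_states <- data.frame(
  State = c("A", "A", "B", "B", "C", "D", "E"),
  Candidates = c(50, 30, 40, 20, 10, 5, 5)
)
res <- top_k_share(toy_states, k = 2)
res$top_states  # "A" "B"
res$share       # 0.875 (140 of 160)
```

With the report's data, `top_k_share(exam_data_clean, k = 4)` would name the four leading states and the share of registrations they hold.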

6 Technique 2: Data Visualisation

```{r}
#| label: visualisation
#| fig-width: 10
#| fig-height: 6

# National total per year
p1 <- exam_data_clean %>%
  group_by(Year) %>%
  summarise(Total = sum(Candidates, na.rm = TRUE)) %>%
  ggplot(aes(x = Year, y = Total)) +
  geom_line(color = "blue", linewidth = 1.2) +  # linewidth replaces size for lines since ggplot2 3.4.0
  geom_point(size = 3) +
  labs(title = "Total WAEC Candidates by Year", y = "Number of Candidates") +
  theme_minimal()

# Total per region, sorted by size
p2 <- exam_data_clean %>%
  group_by(Region) %>%
  summarise(Total = sum(Candidates, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(Region, Total), y = Total, fill = Region)) +
  geom_col() +
  coord_flip() +
  labs(title = "Candidates by Region", x = "Region", y = "Number of Candidates") +
  theme_minimal()

# Stack the two plots vertically (patchwork)
p1 / p2
```
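The yearly line chart shows the dip around 2020 visually. Year-over-year percentage change expresses the same dip, and the recovery, as numbers. Below is a minimal base-R sketch. It assumes a data frame of yearly totals with columns `Year` and `Total` (the same shape as the data behind `p1`) and consecutive years. The helper `yoy_change` and the toy totals are hypothetical.

```r
# Hypothetical helper: year-over-year percentage change in total candidates.
# Assumes one row per year, columns Year and Total, with no missing years.
yoy_change <- function(yearly) {
  yearly <- yearly[order(yearly$Year), ]   # make sure years are in order
  prev <- c(NA, head(yearly$Total, -1))    # previous year's total (NA for the first year)
  yearly$Pct_Change <- 100 * (yearly$Total - prev) / prev
  yearly
}

# Made-up totals (not WAEC figures) with a dip in the third year
toy_yearly <- data.frame(Year = 2018:2021, Total = c(100, 110, 99, 121))
yoy_change(toy_yearly)  # Pct_Change: NA, 10, -10, about 22.2
```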

7 Technique 3: Hypothesis Testing

```{r}
#| label: hypothesis-testing

# H1: Male vs Female (Welch two-sample t-test, then Cohen's d)
gender_test <- t.test(Candidates ~ Sex, data = exam_data_clean)
print(gender_test)
cohen.d(Candidates ~ Sex, data = exam_data_clean)

# H2: North vs South (regions whose name contains "North" vs all others)
region_test <- exam_data_clean %>%
  mutate(North_South = ifelse(str_detect(Region, "North"), "North", "South")) %>%
  t.test(Candidates ~ North_South, data = .)

print(region_test)
```
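The Executive Summary reports a large North-South effect size, but the chunk above computes Cohen's d only for the sex comparison. Below is a minimal base-R sketch of the pooled-standard-deviation Cohen's d, which could be applied to the two `North_South` groups. The helper `cohens_d_pooled` and the toy vectors are hypothetical. Under its default pooled setting, `effsize::cohen.d()` (used for H1) should give the same magnitude; the sign depends on which group is treated as first.

```r
# Hypothetical helper: Cohen's d with a pooled standard deviation.
# d = (mean(x) - mean(y)) / s_pooled
cohens_d_pooled <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  # Pooled variance weights each group's variance by its degrees of freedom
  s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / s_pooled
}

# Made-up candidate counts (not WAEC figures) for two groups
south <- c(12, 14, 16)
north <- c(6, 8, 10)
cohens_d_pooled(south, north)  # 3: the means differ by 6 and the pooled SD is 2
```

Cohen's conventional rules of thumb treat d of about 0.2 as small, 0.5 as medium, and 0.8 or more as large. A value computed this way on the real groups would be the evidence behind the "large effect size" statement.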

8 Technique 4: Correlation Analysis

```{r}
#| label: correlation

# National total per year
yearly <- exam_data_clean %>%
  group_by(Year) %>%
  summarise(Total = sum(Candidates, na.rm = TRUE))

# Strength of the linear trend in national registrations
cor(yearly$Year, yearly$Total)

# Year-by-year correlation of regional totals
# (region_wide has one row per region and one column per year; cor() compares columns)
region_wide <- exam_data_clean %>%
  group_by(Region, Year) %>%
  summarise(Candidates = sum(Candidates, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Year, values_from = Candidates, values_fill = 0)

cor_matrix <- cor(region_wide[, -1], use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.cex = 0.8)
```
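The matrix above should be read with some care. `region_wide` has one row per region and one column per year, and `cor()` correlates columns. Each cell therefore compares two years across regions, not two regions. A high value means the regions kept roughly the same relative standing in both years, which is what the Integrated Findings call year-to-year stability. Below is a minimal sketch on a made-up three-region table; the names `wide` and `year_cor` are hypothetical.

```r
# Made-up regional totals (not WAEC figures): rows = regions, columns = years,
# the same layout as region_wide[, -1] in the chunk above.
wide <- matrix(
  c(100, 110, 120,   # region R1
     50,  55,  60,   # region R2
     20,  25,  18),  # region R3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("R1", "R2", "R3"), c("2018", "2019", "2020"))
)

# cor() on a matrix correlates its columns, so the result is year-by-year:
# entry ["2018", "2020"] asks whether regions that were large in 2018
# were also large in 2020.
year_cor <- cor(wide)
round(year_cor, 3)

# Each coefficient is computed from only nrow(wide) points (one per region),
# so with few regions a single unusual region can move it a lot.
nrow(wide)
```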

9 Technique 5: Linear Regression

```{r}
#| label: regression

# Keep positive counts; the response is log-transformed
model_data <- exam_data_clean %>%
  filter(Candidates > 0)

model <- lm(log(Candidates + 1) ~ Year + Region + Sex, data = model_data)

summary(model)

# Diagnostics: residuals vs fitted, Q-Q, scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(model)
par(op)

# Significant coefficients (p < 0.05) with 95% confidence intervals
tidy(model, conf.int = TRUE) %>%
  filter(p.value < 0.05)
```
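The model is fitted on `log(Candidates + 1)`, so its coefficients are not counts. A common reading is that `exp(beta) - 1` gives the multiplicative change in fitted `Candidates + 1` for a one-unit change in a predictor, with the other terms held fixed. For the `Region` and `Sex` dummies, the comparison is with the baseline level. The sketch below checks that reading on made-up data constructed to follow a known log-linear relationship. The variables `t` and `South` are hypothetical stand-ins for `Year` and a region dummy.

```r
# Made-up data (not WAEC figures) constructed so that
# log(Candidates + 1) = 2 + 0.05 * t + 0.7 * South holds exactly.
# t and South are hypothetical stand-ins for Year and a region dummy.
toy_reg <- expand.grid(t = 0:5, South = c(0, 1))
toy_reg$Candidates <- exp(2 + 0.05 * toy_reg$t + 0.7 * toy_reg$South) - 1

fit <- lm(log(Candidates + 1) ~ t + South, data = toy_reg)
coef(fit)  # recovers 2, 0.05 and 0.7 (up to rounding error)

# exp(beta) - 1 is the multiplicative change in fitted (Candidates + 1)
# for a one-unit increase in the predictor, other terms held fixed.
pct_effect <- 100 * (exp(coef(fit)[-1]) - 1)
round(pct_effect, 1)  # t: 5.1 (% per unit of t), South: 101.4
```

Applied to the report's model, `100 * (exp(coef(model)[-1]) - 1)` would express each slope as a percentage change. By default, R takes the first factor level (alphabetical for character columns) as the baseline for `Region` and `Sex`.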

10 Integrated Findings

All five techniques consistently show strong regional disparities in WAEC registration. Southern states dominate, with statistically significant differences confirmed by hypothesis testing and regression. The patterns are stable over time (high year-to-year correlation).

Recommendation: State ministries of education should launch a targeted National Examination Equity Programme focused on Northern states, combining awareness campaigns, fee subsidies, and strategic expansion of examination centres in under-represented northern and rural areas.

11 Limitations & Further Work

Data is aggregated at state level, so individual-level factors (income, school quality, etc.) are missing. Future work: include school type, urban/rural classification, and fee subsidy data.


Appendix: AI Usage Statement

I used Grok (xAI) to help structure the Quarto document and fix rendering issues. All data analysis, statistical decisions, interpretations, and business recommendations are my own.