WAEC Candidate Registration Trends and Regional Disparities in Nigeria (2016-2025)
1 Executive Summary
This study examines WAEC candidate registration patterns across Nigerian states from 2016 to 2025 using real operational data. The core business problem is the persistent inequality in access to secondary school certification examinations, with heavy concentration of registrations in a few southern states while many northern states continue to lag significantly.
The primary dataset (examdata.csv) contains over 700 observations with key variables: State, ExamYear, Sex, Region, and CandiNo (number of candidates).
Through Exploratory Data Analysis, Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression, clear patterns emerged:

- Lagos, Ogun, Rivers, and Delta consistently lead in registrations.
- Southern regions show significantly higher candidate numbers than Northern regions (large effect size).
- National registration dipped around 2020 (likely due to COVID-19) but recovered strongly afterward.
- Linear regression confirmed that both Year and Region are strong predictors of registration volume.
2 Professional Disclosure
I am an Education Data Analyst working in a tertiary institution in Lagos, Nigeria, supporting examination coordination, student enrolment planning, and policy analysis for secondary-to-tertiary transition.
Relevance of Techniques:

- Exploratory Data Analysis (EDA): Enables quick identification of enrolment trends, outliers, and data quality issues.
- Data Visualisation: Essential for communicating complex patterns to non-technical stakeholders.
- Hypothesis Testing: Provides statistical validation of observed regional and gender differences.
- Correlation Analysis: Helps assess stability of state-level performance over time.
- Linear Regression: Supports forecasting and evaluating impact of geographic and temporal factors.
3 Data Collection & Sampling
The dataset was extracted from official WAEC registration records covering 2016–2025. It includes state-level and sex-disaggregated candidate counts.
```{r}
#| label: load-data
#| code-fold: false

library(tidyverse)
library(corrplot)
library(effsize)
library(broom)
library(ggplot2)
library(patchwork)

# Load data (place examdata.csv in the same folder as this .qmd file)
exam_data <- read_csv("examdata.csv")

# Rename the candidate-count column and add a numeric Year column
exam_data <- exam_data %>%
  rename(Candidates = CandiNo) %>%
  mutate(Year = as.numeric(ExamYear))

glimpse(exam_data)
```
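Before any analysis, it can help to confirm that the file really contains the five columns described above. The following is a minimal sketch in base R, not part of the original analysis: `check_columns` is a hypothetical helper, and `toy_exam` is a made-up data frame standing in for examdata.csv.

```r
# Hypothetical helper (not from any package): stop early when
# an expected column is absent from the data frame.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up rows for illustration only (not WAEC figures).
toy_exam <- data.frame(
  State    = c("State A", "State B"),
  ExamYear = c(2016, 2017),
  Sex      = c("Male", "Female"),
  Region   = c("North East", "South West"),
  CandiNo  = c(100, 200)
)

required_cols <- c("State", "ExamYear", "Sex", "Region", "CandiNo")

# Passes silently when every required column is present.
check_columns(toy_exam, required_cols)
```

Calling the helper on the real `exam_data` right after `read_csv()` would surface a renamed or missing column before it causes a less obvious error further down.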
4 Data Description
```{r}
#| label: data-summary

summary(exam_data)
colSums(is.na(exam_data))
table(exam_data$Sex)
table(exam_data$Region)
```
5 Technique 1: Exploratory Data Analysis (EDA)
```{r}
#| label: eda

# Basic cleaning
exam_data_clean <- exam_data %>%
  filter(!is.na(State), State != "", Region != "Unknown")

# Summary statistics by Region
exam_data_clean %>%
  group_by(Region) %>%
  summarise(
    Total_Candidates = sum(Candidates, na.rm = TRUE),
    Mean = mean(Candidates, na.rm = TRUE),
    Median = median(Candidates, na.rm = TRUE),
    .groups = "drop"
  )
```
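EDA is also where outliers are meant to be spotted. One common way to flag them is the 1.5 × IQR rule, sketched below in base R. This is an illustrative addition rather than a step from the original analysis, and the `counts` vector is made up.

```r
# Made-up candidate counts for illustration only (not WAEC figures).
counts <- c(120, 135, 150, 160, 170, 180, 2000)

# Quartiles and interquartile range (R's default quantile type 7).
q <- unname(quantile(counts, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Values beyond 1.5 * IQR from the quartiles are flagged.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
is_outlier <- counts < lower | counts > upper

counts[is_outlier]
```

Candidate counts are usually right-skewed, so a flagged value is not necessarily an error. A very large state would be flagged by this rule while still being a legitimate observation.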
6 Technique 2: Data Visualisation
```{r}
#| label: visualisation
#| fig-width: 10
#| fig-height: 6

# Panel 1: national total per year
# (linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4.0)
p1 <- exam_data_clean %>%
  group_by(Year) %>%
  summarise(Total = sum(Candidates, na.rm = TRUE)) %>%
  ggplot(aes(x = Year, y = Total)) +
  geom_line(color = "blue", linewidth = 1.2) +
  geom_point(size = 3) +
  labs(title = "Total WAEC Candidates by Year", y = "Number of Candidates") +
  theme_minimal()

# Panel 2: total per region, ordered by size
p2 <- exam_data_clean %>%
  group_by(Region) %>%
  summarise(Total = sum(Candidates, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(Region, Total), y = Total, fill = Region)) +
  geom_col() +
  coord_flip() +
  labs(title = "Candidates by Region") +
  theme_minimal()

# Stack the two panels vertically (patchwork)
p1 / p2
```
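The yearly line chart shows the shape of the trend. A year-over-year percentage change puts a number on features such as the dip around 2020. The sketch below uses base R and made-up totals in a hypothetical `yearly_toy` data frame, so the figures are illustrative only.

```r
# Made-up yearly totals for illustration only (not WAEC figures).
yearly_toy <- data.frame(
  Year  = 2018:2021,
  Total = c(1000, 1100, 880, 1056)
)

# Percent change from the previous year; the first year has no predecessor.
previous <- head(yearly_toy$Total, -1)
yearly_toy$PctChange <- c(NA, diff(yearly_toy$Total) / previous * 100)

yearly_toy
```

The same calculation could be applied to the yearly totals behind `p1` to state the size of the dip and the recovery as percentages.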
7 Technique 3: Hypothesis Testing
```{r}
#| label: hypothesis-testing

# H1: Male vs Female
gender_test <- t.test(Candidates ~ Sex, data = exam_data_clean)
print(gender_test)
cohen.d(Candidates ~ Sex, data = exam_data_clean)

# H2: North vs South
region_test <- exam_data_clean %>%
  mutate(North_South = ifelse(str_detect(Region, "North"), "North", "South")) %>%
  t.test(Candidates ~ North_South, data = .)

print(region_test)
```
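For reference, the quantities behind these calls can be written out. With its default settings, `t.test()` on a formula runs Welch's unequal-variance test, and `cohen.d()` from effsize uses a pooled standard deviation. The symbols are introduced here for illustration: $\bar{x}_i$, $s_i$, and $n_i$ are the mean, standard deviation, and number of observations in group $i$.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
$$

By Cohen's conventional thresholds, $|d|$ of about 0.2, 0.5, and 0.8 marks a small, medium, and large effect respectively. The p-value from the t-test says whether a difference is detectable, while $d$ says how big it is.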
8 Technique 4: Correlation Analysis
```{r}
#| label: correlation

# Correlation between year and national total (overall time trend)
yearly <- exam_data_clean %>%
  group_by(Year) %>%
  summarise(Total = sum(Candidates, na.rm = TRUE))

cor(yearly$Year, yearly$Total)

# Year-to-year correlation matrix of regional totals
# (rows = regions, columns = years)
region_wide <- exam_data_clean %>%
  group_by(Region, Year) %>%
  summarise(Candidates = sum(Candidates, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Year, values_from = Candidates, values_fill = 0)

cor_matrix <- cor(region_wide[, -1], use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.cex = 0.8)
```
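In the wide layout each column is a year and each row is a region, so every cell of the correlation matrix compares two years across regions. A small base-R sketch with made-up numbers shows how to read it. The `toy_wide` data frame is hypothetical and is not derived from the WAEC data.

```r
# Made-up regional totals for illustration only (not WAEC figures).
# Rows are regions, columns are years, mirroring a region-by-year wide table.
toy_wide <- data.frame(
  Region = c("Region 1", "Region 2", "Region 3"),
  "2016" = c(10, 20, 30),
  "2017" = c(20, 40, 60),  # same ordering of regions as 2016
  "2018" = c(30, 20, 10),  # ordering of regions reversed
  check.names = FALSE
)

# Drop the label column and correlate the year columns.
toy_cor <- cor(toy_wide[, -1])
toy_cor
```

A value near 1 between two years means regions kept the same relative positions, which is the sense in which the report calls the pattern stable. A value near -1 would mean the ranking flipped.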
9 Technique 5: Linear Regression
```{r}
#| label: regression

model_data <- exam_data_clean %>%
  filter(Candidates > 0)

model <- lm(log(Candidates + 1) ~ Year + Region + Sex, data = model_data)

summary(model)

# Diagnostics
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))

# Significant coefficients
tidy(model, conf.int = TRUE) %>%
  filter(p.value < 0.05)
```
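Because the response is on a log scale, a coefficient b is easier to read once converted to an approximate percentage change, (exp(b) - 1) × 100. The sketch below shows the conversion on made-up data that grows by exactly 10% per year. `toy_growth` and `toy_fit` are hypothetical names, and the example uses `log(Candidates)` rather than `log(Candidates + 1)` so that the result is exact.

```r
# Made-up series growing 10% per year, for illustration only.
toy_growth <- data.frame(Year = 2016:2021)
toy_growth$Candidates <- 100 * 1.1^(toy_growth$Year - 2016)

# Log-linear fit: the Year coefficient is the yearly change in log(Candidates).
toy_fit <- lm(log(Candidates) ~ Year, data = toy_growth)

# Convert the log-scale coefficient into a percent change per year.
pct_per_year <- unname((exp(coef(toy_fit)["Year"]) - 1) * 100)
pct_per_year
```

Applied to the fitted `model`, the same conversion would express each Region coefficient as a percentage difference from the reference region. With the `+ 1` inside the log, that reading is only approximate when counts are small.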
10 Integrated Findings
All five techniques consistently show strong regional disparities in WAEC registration. Southern states dominate, with statistically significant differences confirmed by hypothesis testing and regression. The patterns are stable over time (high year-to-year correlation).
Recommendation: State ministries of education should launch a National Examination Equity Programme focusing on awareness campaigns, fee subsidies, and strategic expansion of examination centres in under-represented northern and rural areas.
11 Limitations & Further Work
The data are aggregated at state level, so individual-level factors (income, school quality, etc.) are missing. Future work should include school type, urban/rural classification, and fee subsidy data.
Appendix: AI Usage Statement
I used Grok (xAI) to help structure the Quarto document and fix rendering issues. All data analysis, statistical decisions, interpretations, and business recommendations are my own.