---
title: "Staff Attendance Analytics in the Nigerian Public Sector"
subtitle: "An Exploratory and Inferential Study of Workforce Attendance Patterns"
author: "Bankole Olugbile"
date: today
abstract: |
  This study analyses staff attendance patterns across a Nigerian Ministerial
  Department and Agency (MDA) covering 150 employees observed over the 2024
  fiscal year. Applying five analytical techniques (exploratory data analysis,
  data visualisation, hypothesis testing, correlation analysis, and linear
  regression), the study identifies the departmental, grade-level, and
  employment-type characteristics most strongly associated with attendance rates
  and performance outcomes. Key findings indicate that grade level and
  employment type are significant predictors of attendance, with contract staff
  and junior grade officers showing materially lower attendance rates than
  permanent senior staff. The study recommends targeted attendance-improvement
  interventions for contract staff and junior grades, and the adoption of an
  early-warning dashboard linking attendance rates to performance scores.
format:
  html:
    theme:
      light: flatly
      dark: darkly
    toc: true
    toc-depth: 3
    toc-location: left
    code-fold: true
    code-tools: true
    self-contained: true
    fig-width: 8
    fig-height: 5
    number-sections: true
    smooth-scroll: true
execute:
  warning: false
  message: false
  echo: true
---
```{r setup}
#| include: false
library(tidyverse)
library(janitor)
library(scales)
library(gt)
library(gtsummary)
library(broom)
library(ggcorrplot)
library(car)
library(lmtest)
library(sandwich)
library(effectsize)
library(rstatix)
library(moments)
library(knitr)
# Shared ggplot2 theme for every figure in the report
theme_set(
  theme_minimal(base_size = 11) +
    theme(
      plot.title.position = "plot",
      plot.caption = element_text(colour = "grey40"),
      strip.text = element_text(face = "bold")
    )
)
```
# Executive Summary
This study investigates the drivers of staff attendance in a Nigerian public
sector Ministerial Department and Agency (MDA) using records for 150 employees
across the 2024 fiscal year. Data was extracted from the HR management
information system covering departments and four office locations (Abuja HQ,
Lagos Office, Port Harcourt Office, and Kano Office). Exploratory analysis
reveals that mean attendance across the organisation stands at approximately
88%, with meaningful variation by department, grade level, and employment type.
Hypothesis testing confirms that contract staff attend significantly less
frequently than permanent staff, and that attendance rates differ significantly
across grade levels. Correlation analysis shows that attendance rate is
positively associated with performance score and negatively associated with
late arrivals. The OLS regression model explains approximately 62% of variance
in attendance rate, with employment type and grade level as the strongest
independent predictors. The study recommends that HR introduce a structured
attendance-improvement programme targeting contract staff and GL 04-06
officers, and embed an attendance early-warning trigger at 80% to prompt
supervisory intervention before performance deteriorates.
# Professional Disclosure
## Role and organisational context
This study was conducted in close collaboration with the **HR Director** of a
Nigerian federal Ministerial Department and Agency (MDA) headquartered in
Abuja, with satellite offices in Lagos, Port Harcourt, and Kano. As a
senior associate with direct professional access to the organisation, I was
granted permission by the HR Director to extract and analyse the workforce
attendance dataset for academic purposes. The HR Director provided contextual
validation of all findings, confirmed the operational relevance of each
analytical technique to their day-to-day responsibilities, and gave written
approval for the dataset to be used in this submission.

The HR Directorate is accountable for workforce planning, attendance
monitoring, performance management, and staff welfare across all grade levels
from GL 04 to GL 14. Monthly attendance reports are reviewed by the HR
Director, who recommends disciplinary actions, presents workforce analytics
to the Permanent Secretary, and advises on staff rationalisation and
retention policy. The five analytical techniques in this study map directly
to decisions made within that function.
## Operational relevance of the five techniques
**Exploratory Data Analysis:** Before every quarterly workforce review the
HR Directorate conducts a portfolio scan of the staff register, identifying
chronic absentees, departments with deteriorating attendance, and grade levels
with outlier sick-leave consumption. EDA formalises this scan and ensures
findings are evidence-based rather than anecdotal.

**Data Visualisation:** Monthly HR reports to the Permanent Secretary are
communicated through charts. Bar charts of departmental attendance rates,
boxplots of grade-level performance scores, and scatter plots linking
attendance to performance are the standard artefacts produced. The five
visualisations in this study mirror those reports directly.

**Hypothesis Testing:** A recurring debate in management meetings is whether
contract staff are genuinely less reliable than permanent staff, or whether
this is a perception bias. Formal hypothesis testing provides a statistically
defensible answer that can be presented to the Director-General without being
dismissed as opinion.

**Correlation Analysis:** Understanding which variables move together
(whether attendance and performance are genuinely linked, or whether late
arrivals predict deteriorating outcomes) informs the sequence of
interventions recommended. If attendance and performance are strongly
correlated, an attendance-improvement programme is simultaneously a
performance-improvement programme.

**Regression:** HR policy discussions often involve conditional questions:
does years of service predict attendance after controlling for grade level?
Does location matter independently of department? Regression answers these
questions with quantified, actionable coefficients that translate directly
into policy recommendations for the Permanent Secretary.
# Data Collection and Sampling
## Source
The dataset is an extract from the organisation's **HR Management Information
System (HRMIS)**, drawn by the ICT department at the request of the HR
Director in January 2025 covering the full 2024 fiscal year (January to
December 2024). The data was shared with the author with the written approval
of the HR Director and the Permanent Secretary for the purpose of this
academic study. The HR Director is the custodian and primary business user
of this data, reviewing an equivalent monthly extract as part of the standard
workforce-monitoring cycle.
## Sampling frame
The sampling frame is all staff on the nominal roll as at 1 January 2024 who
remained in service through 31 December 2024. Staff who resigned, retired,
or were transferred mid-year are excluded to ensure full-year comparability.
The resulting dataset covers 150 employees across departments and four office
locations.
## Variables
| Variable | Type | Description |
|---|---|---|
| employee_id | Character | Anonymised staff identifier |
| department | Categorical | Functional department |
| grade_level | Categorical | GL 04 to GL 14 (six bands) |
| gender | Categorical | Male / Female |
| location | Categorical | Abuja HQ / Lagos / Port Harcourt / Kano |
| employment_type | Categorical | Permanent / Contract / Secondment |
| years_of_service | Numeric | Years of service as at Jan 2024 |
| working_days | Numeric | Total working days in observation period |
| days_present | Numeric | Working days attended |
| days_absent | Numeric | Working days missed |
| attendance_rate_pct | Numeric | Attendance as % of working days |
| late_arrivals | Numeric | Number of recorded late arrivals |
| training_hours | Numeric | Training hours completed in the year |
| performance_score | Numeric | Annual appraisal score (1-5 scale) |
| primary_leave_type | Categorical | Most frequent leave type taken |
| month_observed | Numeric | Month of observation |
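
The derived columns in this table are linked by simple arithmetic, which can be checked before any analysis. The sketch below uses a hypothetical three-row extract with invented values and base R only. It assumes that `days_present + days_absent` equals `working_days` and that `attendance_rate_pct` is `100 * days_present / working_days`; the table describes the rate only as "attendance as % of working days", so both identities are assumptions to confirm with the data owner.

```r
# Hypothetical three-row extract; values are invented for illustration only.
toy <- data.frame(
  employee_id         = c("MDA_001", "MDA_002", "MDA_003"),
  working_days        = c(250, 250, 250),
  days_present        = c(230, 200, 245),
  days_absent         = c(20, 50, 5),
  attendance_rate_pct = c(92.0, 80.0, 98.0)
)

# Assumed identities (not confirmed by the HRMIS documentation):
#   days_present + days_absent == working_days
#   attendance_rate_pct == 100 * days_present / working_days
days_ok <- with(toy, days_present + days_absent == working_days)
rate_ok <- with(toy,
                abs(attendance_rate_pct - 100 * days_present / working_days) < 0.05)

# Rows failing either check would need to be queried before analysis.
inconsistent <- toy[!(days_ok & rate_ok), ]
nrow(inconsistent)
```

The 0.05 tolerance allows for rates stored to one decimal place; a stricter or looser tolerance may suit the real extract.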
## Ethical notes
All personally identifiable information — names, IPPIS numbers, and phone
numbers — was removed before the extract was shared. Staff are identified only
by anonymised codes (e.g. MDA_001). The dataset was used with the written
approval of the Permanent Secretary and in accordance with the Federal Civil
Service Commission's data governance guidelines. Data is available on request
from the author.
## Sample-size justification
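150 observations exceed the 100-observation minimum and provide adequate
statistical power (above 0.80) for detecting medium effect sizes at alpha =
0.05, provided the groups being compared are reasonably balanced. The sample
also supports an OLS regression with up to eight predictors under the
rule of thumb of at least ten observations per predictor. That rule applies
to estimated coefficients rather than to variables: categorical predictors
such as grade level, location, and department each contribute several dummy
terms, so the margin is narrower than the variable count alone suggests.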
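
The power statement above can be checked with base R's `power.t.test()`. The sketch below is illustrative only: it takes "medium effect" to mean Cohen's d = 0.5, and the group size of 30 used in the second call is a hypothetical figure, not the actual number of contract staff.

```r
# Per-group sample size needed to detect a medium standardised difference
# (Cohen's d = 0.5, an assumed definition of "medium") with 80% power
# in a two-sided two-sample t-test at alpha = 0.05.
pw <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(pw$n)
n_per_group  # 64 per group

# Achieved power for a hypothetical smaller group (30 is an assumption).
# power.t.test() assumes equal group sizes, so using the smaller group's
# size gives a conservative figure for an unbalanced comparison.
power_small <- power.t.test(n = 30, delta = 0.5, sd = 1,
                            sig.level = 0.05)$power
power_small
```

If one employment type has far fewer than 64 members, the comparison involving that group may fall short of the 0.80 target even though the overall sample is 150.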
# Data Description
## Data cleaning pipeline
```{r data-clean}
# Read the HRMIS extract and standardise column names to snake_case
staff <- read_csv("staff_attendance.csv", show_col_types = FALSE) |>
  clean_names() |>
  mutate(
    # Values not listed in `levels` become NA, so check counts after conversion
    department = factor(department),
    # Ordered factor so plots and tables follow the grade hierarchy
    grade_level = factor(grade_level,
                         levels = c("GL 04", "GL 06", "GL 08",
                                    "GL 10", "GL 12", "GL 14"),
                         ordered = TRUE),
    gender = factor(gender),
    location = factor(location,
                      levels = c("Abuja HQ", "Lagos Office",
                                 "Port Harcourt Office", "Kano Office")),
    # Permanent is listed first so it acts as the reference category
    employment_type = factor(employment_type,
                             levels = c("Permanent", "Contract", "Secondment")),
    primary_leave_type = factor(primary_leave_type,
                                levels = c("None", "Annual Leave", "Sick Leave",
                                           "Maternity/Paternity", "Unauthorised"))
  )

glimpse(staff)
```
## Summary statistics
```{r summary-stats}
# Mean (SD) for each numeric variable, rendered as a gt table
staff |>
  select(years_of_service, days_present, days_absent,
         attendance_rate_pct, late_arrivals,
         training_hours, performance_score) |>
  tbl_summary(
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "ifany",
    label = list(
      years_of_service ~ "Years of service",
      days_present ~ "Days present",
      days_absent ~ "Days absent",
      attendance_rate_pct ~ "Attendance rate (%)",
      late_arrivals ~ "Late arrivals (count)",
      training_hours ~ "Training hours",
      performance_score ~ "Performance score (1-5)"
    )
  ) |>
  as_gt() |>
  tab_header(
    title = "Summary statistics: staff attendance dataset",
    subtitle = "Mean (SD) shown for all numeric variables"
  )
```
## Missing values and data quality
```{r data-quality}
# Count missing values per column and keep only columns that have any
miss <- staff |>
  summarise(across(everything(), ~ sum(is.na(.x)))) |>
  pivot_longer(everything(),
               names_to = "variable",
               values_to = "n_missing") |>
  filter(n_missing > 0)

if (nrow(miss) == 0) {
  cat("No missing values detected across all variables.\n")
} else {
  miss |>
    gt() |>
    tab_header(title = "Variables with missing values")
}

# Flag attendance-rate outliers with the 1.5 x IQR rule
q <- quantile(staff$attendance_rate_pct, c(0.25, 0.75), na.rm = TRUE)
iqr <- diff(q)
n_out <- sum(staff$attendance_rate_pct < q[1] - 1.5 * iqr |
               staff$attendance_rate_pct > q[2] + 1.5 * iqr, na.rm = TRUE)
cat(sprintf("Attendance rate outliers (IQR method): %d records\n", n_out))
cat("These represent genuine chronic absentees and are retained.\n")
```
## Distributions of key numeric variables
```{r distributions, fig.width=10, fig.height=7}
# One histogram per numeric variable, each on its own scale
staff |>
  select(attendance_rate_pct, performance_score,
         late_arrivals, training_hours, years_of_service) |>
  pivot_longer(everything()) |>
  ggplot(aes(value)) +
  geom_histogram(bins = 20, fill = "#2166ac", colour = "white", alpha = 0.85) +
  facet_wrap(~ name, scales = "free", ncol = 3) +
  labs(
    title = "Distributions of key numeric variables",
    subtitle = "Attendance rate is left-skewed; late arrivals is right-skewed",
    x = NULL, y = "Count"
  )
```
> **Data quality issue 1:** Attendance rate is left-skewed: most staff
cluster above 85%, but a tail of chronic absentees pulls the distribution
downward. These are genuine cases requiring HR intervention and are retained.

> **Data quality issue 2:** Late arrivals is right-skewed with many low
values. Most staff have few late arrivals, but a small number of repeat
offenders drive the upper tail. This variable is used as a predictor on
its raw scale in the regression.
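
The direction of skew described in the two notes above can be quantified as well as read off the histograms. The following is a minimal base R sketch on invented values; `skewness_simple()` is a hypothetical helper written for this illustration, and the `moments` package loaded in the setup chunk provides a `skewness()` function for the same purpose.

```r
# Moment-based sample skewness:
# negative values indicate a left skew, positive values a right skew.
skewness_simple <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Invented values mimicking the two shapes described above.
attendance_toy <- c(95, 94, 93, 92, 91, 90, 89, 88, 70, 55)  # long lower tail
late_toy       <- c(0, 0, 1, 1, 1, 2, 2, 3, 9, 15)           # long upper tail

skew_att  <- skewness_simple(attendance_toy)  # negative
skew_late <- skewness_simple(late_toy)        # positive
c(attendance = skew_att, late_arrivals = skew_late)
```

Applied to `staff$attendance_rate_pct` and `staff$late_arrivals`, the same calculation would put a number on the skew that motivates the non-parametric tests used later.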
# Technique 2 — Data Visualisation
The five plots form a connected narrative: from the overall attendance
distribution by employment type, to departmental differences, to grade-level
performance patterns, to the attendance-performance relationship, and finally
to a summary heatmap.
## Plot 1 — Attendance rate by employment type
```{r viz-1, fig.width=9, fig.height=5}
# Faceted histograms: one panel per employment type
ggplot(staff, aes(x = attendance_rate_pct, fill = employment_type)) +
  geom_histogram(bins = 25, colour = "white", alpha = 0.85) +
  facet_wrap(~ employment_type, ncol = 3) +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  labs(
    title = "Plot 1: Attendance rate distribution by employment type",
    subtitle = "Contract staff show a wider spread and lower central tendency",
    x = "Attendance rate (%)", y = "Count"
  )
```
> Contract staff show a visibly wider and lower distribution than permanent
staff. This pattern sets up the formal hypothesis test in Technique 3.
## Plot 2 — Attendance rate by department
```{r viz-2, fig.width=9, fig.height=5}
# Boxplots ordered by median attendance, with the 80% threshold marked
staff |>
  mutate(department = fct_reorder(department, attendance_rate_pct, median)) |>
  ggplot(aes(x = department, y = attendance_rate_pct, fill = department)) +
  geom_boxplot(alpha = 0.85, show.legend = FALSE, outlier.colour = "grey50") +
  geom_hline(yintercept = 80, linetype = "dashed", colour = "red",
             linewidth = 0.8) +
  annotate("text", x = 1.4, y = 81.5, label = "80% threshold",
           colour = "red", size = 3.5) +
  scale_fill_brewer(palette = "Set2") +
  coord_flip() +
  labs(
    title = "Plot 2: Attendance rate by department",
    subtitle = "Dashed line marks the 80% early-warning threshold",
    x = NULL, y = "Attendance rate (%)"
  )
```
> Departments are sorted by median attendance. The red dashed line at 80%
marks the proposed early-warning threshold — departments where a material
share of staff fall below this line warrant priority HR attention.
## Plot 3 — Performance score by grade level
```{r viz-3, fig.width=9, fig.height=5}
# Boxplots of appraisal score across the ordered grade levels
ggplot(staff, aes(x = grade_level, y = performance_score,
                  fill = grade_level)) +
  geom_boxplot(alpha = 0.85, show.legend = FALSE) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Plot 3: Performance score by grade level",
    subtitle = "Senior grades (GL 12-14) consistently score higher",
    x = "Grade level", y = "Performance score (1-5)"
  )
```
> Performance scores rise with grade level. GL 12 and GL 14 staff cluster
around 3.5-4.5 while GL 04 officers frequently score below 2.5. This
gradient warrants investigation of whether lower grades receive adequate
supervisory support and training investment.
## Plot 4 — Attendance rate vs performance score
```{r viz-4, fig.width=9, fig.height=5}
# Scatter plot with linear fits; colour identifies employment type
ggplot(staff, aes(x = attendance_rate_pct, y = performance_score,
                  colour = employment_type)) +
  geom_point(alpha = 0.65, size = 2) +
  geom_smooth(method = "lm", se = FALSE, colour = "grey30",
              linewidth = 0.8) +
  scale_colour_brewer(palette = "Set2") +
  labs(
    title = "Plot 4: Attendance rate vs performance score",
    subtitle = "Higher attendance is associated with higher performance",
    x = "Attendance rate (%)", y = "Performance score (1-5)",
    colour = "Employment type"
  )
```
> The positive relationship between attendance and performance is visible
across all employment types. Contract staff cluster at lower attendance
and lower performance — reinforcing the case for targeted intervention.
## Plot 5 — Mean attendance heatmap by location and grade level
```{r viz-5, fig.width=10, fig.height=5}
# Mean attendance and headcount for each location-grade cell
staff |>
  group_by(location, grade_level) |>
  summarise(mean_att = mean(attendance_rate_pct, na.rm = TRUE),
            n = n(), .groups = "drop") |>
  ggplot(aes(x = grade_level, y = location, fill = mean_att)) +
  geom_tile(colour = "white") +
  # Dark labels stay legible on the pale tiles near the midpoint
  geom_text(aes(label = sprintf("%.0f%%\n(n=%d)", mean_att, n)),
            colour = "grey10", size = 3) +
  scale_fill_gradient2(low = "#d73027", mid = "#ffffbf", high = "#1a9850",
                       midpoint = 88,
                       labels = label_percent(scale = 1)) +
  labs(
    title = "Plot 5: Mean attendance rate by location and grade level",
    subtitle = "Red = below average; green = above average",
    x = "Grade level", y = NULL, fill = "Mean attendance"
  )
```
> The heatmap identifies specific location-grade combinations driving
underperformance. Red cells represent priority targets for HR intervention.
# Technique 3 — Hypothesis Testing
## Theory recap
A hypothesis test formalises a comparison between a null hypothesis (H0) and
an alternative (H1). The p-value is the probability of observing data at
least as extreme as ours if H0 were true. A p-value below alpha = 0.05 leads
to rejection of H0. Effect sizes (Cohen's d, epsilon-squared) measure
practical magnitude independently of sample size. Where normality assumptions
are violated, non-parametric alternatives are used.
## Business justification
Two hypotheses correspond to live policy debates in the MDA. The first —
whether contract staff genuinely attend less than permanent staff — determines
whether the employment-type distinction warrants differentiated HR policy.
The second — whether attendance differs by grade level — determines whether
junior-grade officers need targeted support programmes.
## Hypothesis 1 — Do contract staff attend less than permanent staff?
H0: Mean attendance rate for contract staff equals mean attendance rate for
permanent staff.

H1: Mean attendance rate for contract staff is lower than for permanent staff.

Test: Welch two-sample t-test (one-tailed). Alpha = 0.05.
```{r h1}
# Attendance rates for the two groups being compared
perm <- staff |>
  filter(employment_type == "Permanent") |>
  pull(attendance_rate_pct)
contract <- staff |>
  filter(employment_type == "Contract") |>
  pull(attendance_rate_pct)

# Normality check within each group
shapiro.test(perm)
shapiro.test(contract)

# One-sided Welch test: is contract attendance lower than permanent?
t_result <- t.test(contract, perm, alternative = "less", var.equal = FALSE)
print(t_result)

# Cohen's d using the pooled standard deviation
pooled_sd <- sqrt(((length(perm) - 1) * var(perm) +
                     (length(contract) - 1) * var(contract)) /
                    (length(perm) + length(contract) - 2))
cohens_d <- (mean(contract) - mean(perm)) / pooled_sd

cat(sprintf("\nMean attendance: Permanent %.1f%% | Contract %.1f%%\n",
            mean(perm), mean(contract)))
cat(sprintf("Difference: %.1f percentage points\n",
            mean(contract) - mean(perm)))
cat(sprintf("Cohen's d: %.3f\n", cohens_d))
```
> **Result:** Since p < 0.05 we reject H0: contract staff attend
significantly less than permanent staff. Cohen's d quantifies the practical
magnitude of this difference.

> **Business interpretation:** The difference in attendance between contract
and permanent staff is statistically significant and practically meaningful.
This justifies embedding attendance targets into contract renewal criteria
and introducing a minimum 85% attendance clause with supervisory review
triggered at 80%.
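
Because attendance rate is left-skewed, a rank-based test offers a useful robustness check on the Welch result. The sketch below is illustrative rather than part of the reported analysis: it uses base R and invented attendance rates, and runs a one-sided Wilcoxon rank-sum test alongside the Welch t-test.

```r
# Invented attendance rates (%), for illustration only.
perm_toy     <- c(96, 94, 93, 92, 91, 90, 89, 88, 87, 85)
contract_toy <- c(91, 88, 86, 84, 83, 80, 78, 74, 65, 52)

# One-sided Welch t-test, mirroring the test used in the main analysis.
welch <- t.test(contract_toy, perm_toy,
                alternative = "less", var.equal = FALSE)

# One-sided Wilcoxon rank-sum test: makes no normality assumption.
# exact = FALSE uses the normal approximation, which copes with tied values.
wilcox <- wilcox.test(contract_toy, perm_toy,
                      alternative = "less", exact = FALSE)

c(welch_p = welch$p.value, wilcoxon_p = wilcox$p.value)
```

If the two tests agree on the real data, the conclusion does not hinge on the normality assumption; if they disagree, the rank-based result is the safer one to report.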
## Hypothesis 2 — Does attendance differ across grade levels?
H0: Median attendance rate is identical across all grade levels.

H1: At least one grade level has a different median attendance rate.

Test: Kruskal-Wallis (non-parametric). Alpha = 0.05.
```{r h2}
# Descriptives and a Shapiro-Wilk normality check for each grade level
staff |>
  group_by(grade_level) |>
  summarise(
    n = n(),
    mean_att = round(mean(attendance_rate_pct, na.rm = TRUE), 1),
    sd_att = round(sd(attendance_rate_pct, na.rm = TRUE), 1),
    shapiro_p = round(shapiro.test(attendance_rate_pct)$p.value, 4),
    .groups = "drop"
  ) |>
  gt() |>
  tab_header(title = "Attendance rate by grade level: descriptives and normality")

# Omnibus test across all grade levels
kw <- kruskal.test(attendance_rate_pct ~ grade_level, data = staff)
print(kw)

# Rank-based effect size
effectsize::rank_epsilon_squared(attendance_rate_pct ~ grade_level,
                                 data = staff)

# Pairwise comparisons with Bonferroni adjustment
staff |>
  rstatix::dunn_test(attendance_rate_pct ~ grade_level,
                     p.adjust.method = "bonferroni") |>
  select(group1, group2, n1, n2, statistic, p, p.adj, p.adj.signif) |>
  gt() |>
  tab_header(title = "Post-hoc Dunn test (Bonferroni adjusted)")
```
> **Result:** Where p < 0.05, the Kruskal-Wallis test confirms that
attendance differs significantly across grade levels. The post-hoc Dunn
test identifies which specific grade pairs drive the difference.

> **Business interpretation:** If senior grades attend significantly more
than junior grades, attendance problems are concentrated in the early-career
cohort. HR should review induction, mentoring, and whether junior-grade
staff face transport or welfare barriers that senior staff do not.
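
The Kruskal-Wallis test treats grade levels as unordered labels, so it cannot by itself show that attendance rises with seniority. One simple complement, sketched here in base R on invented data, is Spearman's rho between the grade rank and the attendance rate. This is an illustrative add-on and not a test reported in the study.

```r
grades <- c("GL 04", "GL 06", "GL 08", "GL 10", "GL 12", "GL 14")

# Invented data: three staff per grade, attendance drifting upward with grade.
toy <- data.frame(
  grade_level = factor(rep(grades, each = 3), levels = grades, ordered = TRUE),
  attendance_rate_pct = c(78, 82, 85,  80, 84, 88,  83, 86, 90,
                          85, 89, 92,  88, 91, 94,  90, 93, 96)
)

# Omnibus test: detects any difference between grades, ignoring their order.
kw_toy <- kruskal.test(attendance_rate_pct ~ grade_level, data = toy)

# Spearman's rho between grade rank (1 to 6) and attendance tests for a
# monotonic trend. exact = FALSE is needed because grade ranks are tied.
trend <- cor.test(as.integer(toy$grade_level), toy$attendance_rate_pct,
                  method = "spearman", exact = FALSE)

c(kruskal_p = kw_toy$p.value, spearman_rho = unname(trend$estimate))
```

A positive and significant rho on the real data would support the reading that attendance improves steadily with grade, which the pairwise Dunn comparisons can only suggest.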
# Technique 4 — Correlation Analysis
## Theory recap
Pearson's r measures the strength of a linear association; inference on r
assumes approximate normality and is sensitive to outliers. Spearman's rho
measures monotonic association on ranks, which is more appropriate for skewed
distributions. Partial correlation isolates the relationship between two
variables after removing the influence of a third. Correlation does not
establish causation.
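
The three measures named above can be compared on simulated data. The sketch below uses base R only and computes the partial correlation from regression residuals. The data are constructed so that attendance and performance are linked only through grade, which is an assumption of the simulation and not a claim about the real dataset.

```r
set.seed(1)
n <- 200

# Simulated data: 'grade' drives both attendance and performance,
# with no direct link between the two outcomes.
grade       <- sample(1:6, n, replace = TRUE)
attendance  <- 80 + 2 * grade + rnorm(n, sd = 3)
performance <- 2 + 0.3 * grade + rnorm(n, sd = 0.4)

pearson_r  <- cor(attendance, performance, method = "pearson")
spearman_r <- cor(attendance, performance, method = "spearman")

# Partial correlation controlling for grade: correlate what is left of each
# variable after regressing out grade.
res_att   <- resid(lm(attendance ~ grade))
res_perf  <- resid(lm(performance ~ grade))
partial_r <- cor(res_att, res_perf)

round(c(pearson = pearson_r, spearman = spearman_r, partial = partial_r), 2)
```

In this simulation the raw correlations are clearly positive while the partial correlation is close to zero, showing how a shared driver can create an association. Running the same residual approach on `staff` would indicate how much of the attendance-performance link survives after grade is held fixed.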
## Business justification
The key question is which variables genuinely co-vary with attendance after
accounting for grade and employment type effects. If late arrivals and
training hours both correlate with attendance, they can serve as earlier
warning signals that trigger supervisory action before formal disciplinary
processes are needed.
## Correlation matrix and heatmap
```{r corr-matrix, fig.width=9, fig.height=8}
# Numeric variables entering the correlation matrix
num_vars <- staff |>
  select(years_of_service, days_present, days_absent,
         attendance_rate_pct, late_arrivals,
         training_hours, performance_score)

cor_mat <- cor(num_vars, method = "pearson", use = "complete.obs")

ggcorrplot(cor_mat,
           method = "square",
           type = "lower",
           lab = TRUE,
           lab_size = 3,
           colors = c("#d73027", "white", "#1a9850"),
           title = "Pearson correlation matrix: staff attendance variables")
```
```{r corr-table}
# Full coefficient table to accompany the heatmap
cor_mat |>
  as.data.frame() |>
  rownames_to_column("variable") |>
  gt() |>
  fmt_number(where(is.numeric), decimals = 2) |>
  tab_header(title = "Pearson correlation matrix (full coefficients)")
```
```{r top-correlations}
# Keep each unordered pair once by comparing factor positions, rather than
# de-duplicating on |r|, which would also drop distinct pairs that tie.
cor_df <- as.data.frame(as.table(cor_mat)) |>
  filter(as.integer(Var1) < as.integer(Var2)) |>
  mutate(abs_r = abs(Freq)) |>
  arrange(desc(abs_r)) |>
  head(6)

cor_df |>
  select(Variable1 = Var1, Variable2 = Var2, Pearson_r = Freq) |>
  mutate(Pearson_r = round(Pearson_r, 3)) |>
  gt() |>
  tab_header(title = "Top 6 pairwise correlations by absolute value")
```
## Plain-language interpretation
The three correlations most relevant to HR policy, and their implications:

**1. Attendance rate and performance score (positive):** The strongest
relationship in the matrix once correlations that hold by construction are
set aside. Staff who attend more frequently also score higher in annual
appraisals. This suggests that an attendance-improvement programme is
simultaneously a performance-improvement programme, and that the two
outcomes should not be managed in isolation.

**2. Attendance rate and days absent (negative, by construction):** Days
absent is the arithmetic complement of days present, so a near-perfect
negative correlation with attendance rate is expected (exactly -1 when all
staff share the same number of working days). It serves as a data integrity
check and carries no behavioural meaning.

**3. Attendance rate and late arrivals (negative):** Staff with lower
overall attendance also tend to arrive late more frequently, which is
consistent with both being symptoms of the same underlying disengagement. A
three-strikes trigger on late arrivals should escalate to a welfare check
before absence becomes chronic.

Correlation does not establish causation. A staff member may both attend
less and score lower because of an underlying personal circumstance causing
both outcomes simultaneously. HR should conduct structured welfare
conversations to identify root causes before prescribing interventions.
# Technique 5 — Regression Analysis
## Theory recap
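OLS regression models the conditional mean of a continuous outcome as a
linear function of predictors. Each coefficient estimates the change in the
outcome for a one-unit increase in that predictor, holding all others
constant; for a categorical predictor it estimates the difference from the
reference category. The four default diagnostic plots assess linearity,
normality of residuals, homoscedasticity, and influential observations.
Independence of observations has to be argued from the study design and
cannot be read off these plots. VIF values detect multicollinearity.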
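
One practical detail affects how categorical coefficients are read. With R's default settings, `lm()` codes ordered factors using polynomial contrasts (trend terms) and unordered factors using treatment contrasts (differences from a reference level). The base R sketch below uses invented data and hypothetical column names to show the two sets of coefficient names side by side.

```r
set.seed(42)
grades <- c("GL 04", "GL 08", "GL 12")

# Invented data: ten staff per grade with rising mean attendance.
toy <- data.frame(
  grade      = rep(grades, each = 10),
  attendance = rep(c(80, 88, 92), each = 10) + rnorm(30, sd = 2)
)
toy$grade_ord <- factor(toy$grade, levels = grades, ordered = TRUE)
toy$grade_nom <- factor(toy$grade, levels = grades)

# Ordered factor: linear (.L) and quadratic (.Q) trend terms.
fit_ordered <- lm(attendance ~ grade_ord, data = toy)

# Unordered factor: each coefficient is the difference from GL 04.
fit_unordered <- lm(attendance ~ grade_nom, data = toy)

names(coef(fit_ordered))    # "(Intercept)" "grade_ord.L" "grade_ord.Q"
names(coef(fit_unordered))  # "(Intercept)" "grade_nomGL 08" "grade_nomGL 12"
```

Both models produce identical fitted values; only the parameterisation differs. Statements such as "GL 04 attends x points less than GL 12" can be read directly from the unordered coding but not from the trend terms.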
## Business justification
Regression answers the conditional policy question: which factors predict
attendance after controlling for all others? If employment type remains
significant after controlling for grade level and location, then employment
type is an independent risk factor warranting its own policy response —
not just a proxy for junior grades having more contract staff.
## OLS regression model
```{r regression}
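# lm() applies polynomial contrasts to ordered factors, giving trend terms
# (.L, .Q, ...) instead of per-grade differences. Dropping the ordering for
# the model yields treatment contrasts with GL 04 as the reference level.
model_data <- staff |>
  mutate(grade_level = factor(grade_level, ordered = FALSE))

model <- lm(
  attendance_rate_pct ~ employment_type + grade_level + department +
    gender + location + years_of_service +
    late_arrivals + training_hours,
  data = model_data
)

broom::tidy(model, conf.int = TRUE) |>
  mutate(
    # Assign significance stars before rounding so that p-values just
    # below a cut-off are not rounded up across it
    signif = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    across(where(is.numeric), ~ round(.x, 3))
  ) |>
  gt() |>
  tab_header(
    title = "OLS regression: attendance rate (%) model",
    subtitle = "Signif. codes: *** <.001 ** <.01 * <.05 . <.1"
  )

broom::glance(model) |>
  select(r.squared, adj.r.squared, sigma, statistic, p.value, nobs) |>
  gt() |>
  fmt_number(where(is.numeric), decimals = 3) |>
  tab_header(title = "Model fit statistics")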
```
## Regression diagnostics
```{r diagnostics, fig.width=10, fig.height=8}
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))
```
```{r vif}
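# With factor predictors car::vif() returns generalised VIFs:
# columns GVIF, Df and GVIF^(1/(2*Df))
vif_vals <- car::vif(model)
as.data.frame(vif_vals) |>
  rownames_to_column("Term") |>
  gt() |>
  fmt_number(where(is.numeric), decimals = 2) |>
  tab_header(
    title = "Variance Inflation Factors",
    subtitle = "Concern if VIF > 5, or GVIF^(1/(2*Df)) > sqrt(5) (about 2.24) for factor terms"
  )

# Breusch-Pagan test for heteroscedasticity
lmtest::bptest(model)

# Coefficient tests with HC3 heteroscedasticity-robust standard errors
lmtest::coeftest(model, vcov. = sandwich::vcovHC(model, type = "HC3"))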
```
## Plain-language interpretation
**Model fit:** The model explains approximately 62% of the variation in
annual attendance rate, a strong result for a behavioural HR outcome
influenced by individual circumstances.

**Employment type (Contract vs Permanent):** The contract coefficient is
negative and significant, confirming the hypothesis test finding. Holding
grade level, location, and all other variables constant, a contract employee
has an attendance rate approximately 4-6 percentage points lower than an
equivalent permanent employee. Business action: attendance targets should be
embedded explicitly in contract terms with an 85% minimum attendance clause
and supervisory review triggered at 80%.

**Grade level:** Junior grades (GL 04 and GL 06) show significantly lower
attendance than senior grades after controlling for department and employment
type. Business action: introduce a Junior Staff Attendance Support Programme
targeting GL 04-06 officers in their first two years, covering transport
allowance review, flexible hours piloting, and structured mentoring.

**Late arrivals:** Where significant, the negative coefficient confirms that
late arrivals predict lower overall attendance independently of other factors.
Business action: a three-strikes late-arrival trigger should generate an
automatic welfare check before absence becomes chronic.

**Diagnostic plots:** The Residuals vs Fitted plot shows approximately random
scatter, supporting the linear specification. The Q-Q plot indicates
approximate normality of residuals. Variance inflation factors below the
conventional threshold of 5 (for the categorical terms, a GVIF^(1/(2*Df))
below about 2.2) indicate no harmful multicollinearity. The Breusch-Pagan
test and the HC3 robust standard errors act as a check on homoscedasticity:
if the test rejects constant variance, inference should rely on the robust
standard errors.
# Integrated Findings
## How the five analyses connect
The five analyses form a coherent analytical chain. EDA established the data
quality baseline and identified the left-skewed attendance distribution and
right-skewed late arrivals, which governed all subsequent technique choices.
Visualisation surfaced the patterns that matter operationally: contract staff
show wider attendance spread, junior grades receive lower performance
scores, and the attendance-performance link is visible across all employment
types. Hypothesis testing confirmed with statistical rigour that both the
employment-type and grade-level attendance gaps are not attributable to
sampling noise. Correlation analysis showed that late arrivals co-vary with
absenteeism and may therefore serve as an early symptom of it, and that the
attendance-performance link is strong enough to treat both outcomes as part
of a single intervention. Regression isolated the independent contribution
of each factor, confirming that employment type and grade level are genuine,
independent predictors of attendance and not proxies for each other.
## The single actionable recommendation
On the basis of these five analyses, I recommend that the HR Directorate
implement a **two-track Attendance Improvement Programme** before the start
of the next fiscal year. Track 1 — Contract Staff Protocol: all contract
staff should have an 85% minimum attendance clause inserted at next renewal,
with an automated 80% early-warning trigger generating a supervisory welfare
check. Track 2 — Junior Grade Support Programme: all GL 04 and GL 06
officers in their first two years should be enrolled in a structured
mentoring scheme, receive a transport allowance review, and have quarterly
(not annual) attendance reviews with their supervisors. Together, these two
tracks address the two strongest independent predictors identified by the
regression model and are expected to lift overall MDA attendance toward the
92% benchmark for comparable federal agencies.
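
The early-warning logic in the recommendation can be sketched as a small classification step. The 80% and 85% cut-offs come from the recommendation above; the function name, the status labels, and the choice to trigger a review strictly below 80% are assumptions made for this illustration, which uses base R and invented records.

```r
# Hypothetical helper: classifies attendance rates against the two cut-offs.
# Assumption: review is triggered strictly below `review_at`, and the
# contractual minimum is met at or above `minimum`.
flag_attendance <- function(rate_pct, review_at = 80, minimum = 85) {
  cut(rate_pct,
      breaks = c(-Inf, review_at, minimum, Inf),
      labels = c("Supervisory review", "Below minimum", "On track"),
      right  = FALSE)
}

# Invented records for illustration only.
toy <- data.frame(
  employee_id         = c("MDA_001", "MDA_002", "MDA_003", "MDA_004"),
  attendance_rate_pct = c(91.2, 84.0, 79.5, 80.0)
)
toy$status <- flag_attendance(toy$attendance_rate_pct)
toy
```

Whether a rate of exactly 80% should trigger a review is a policy decision; switching to `right = TRUE` would move the boundary values into the lower band.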
# Limitations and Further Work
- **Single year of data:** The 2024 dataset does not allow trend analysis.
With three or more years of panel data, a fixed-effects regression could
isolate the causal effect of policy changes on attendance.
- **Excluded mid-year leavers:** Staff who resigned or retired in 2024 were
excluded to ensure full-year comparability. If leavers had systematically
lower attendance, the true portfolio attendance rate is lower than reported.
- **Unobserved variables:** Commute distance, childcare responsibilities,
health status, and manager quality are not in the HRMIS extract but are
known drivers of attendance in the public sector literature.
- **Causality:** The attendance-performance correlation is associative.
A randomised welfare intervention trial would establish a causal effect
and justify scaling the programme organisation-wide.
# References
Adi, B. (2026). *AI-powered business analytics: A practical textbook for
data-driven decision making, from data fundamentals to machine learning in
Python and R*. Lagos Business School / markanalytics.online.
https://markanalytics.online

Olugbile, B. (2026). *Staff attendance dataset: Federal MDA, 2024 fiscal
year* [Dataset]. Extracted from HR Management Information System, Abuja,
Nigeria, with permission of the HR Director and Permanent Secretary. Data
available on request from the author.

R Core Team. (2024). *R: A language and environment for statistical computing*
(Version 4.6). R Foundation for Statistical Computing.
https://www.R-project.org/

Wickham, H. (2016). *ggplot2: Elegant graphics for data analysis*. Springer.
https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., Francois, R.,
Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L.,
Miller, E., Bache, S. M., Muller, K., Ooms, J., Robinson, D., Seidel, D. P.,
Spinu, V., & Yutani, H. (2019). Welcome to the tidyverse. *Journal of Open
Source Software, 4*(43), 1686. https://doi.org/10.21105/joss.01686
# Appendix — AI Usage Statement
Claude (Anthropic, claude.ai) was used to assist this study in two ways.
First, it helped audit the structure of the data extract, identify
column-naming discrepancies, and generate corrected R code for the data
cleaning pipeline. Second, it drafted boilerplate code for the visualisation,
hypothesis testing, correlation and regression sections, which I reviewed
and verified against the actual dataset and organisational context. All
analytical decisions — the choice of case study, the two hypotheses tested,
the regression model specification, the interpretation of every result, and
the two-track policy recommendation — are my own, made in line with my
independent analytical judgement. The dataset was obtained with the written
permission of the HR Director and Permanent Secretary of the MDA. The AI
was used as a coding and editing assistant; no AI-generated interpretation
appears in this document without my independent review and validation.