---
title: "Staff Attendance Analytics in the Nigerian Public Sector"
subtitle: "An Exploratory and Inferential Study of Workforce Attendance Patterns"
author: "Bankole Olugbile"
date: today
abstract: |
  This study analyses staff attendance patterns across a Nigerian Ministry,
  Department or Agency (MDA) covering 150 employees observed over the 2024
  fiscal year. Applying five analytical techniques (exploratory data analysis,
  data visualisation, hypothesis testing, correlation analysis, and linear
  regression), the study identifies the departmental, grade-level, and
  employment-type characteristics most strongly associated with attendance rates
  and performance outcomes. Key findings indicate that grade level and
  employment type are significant predictors of attendance, with contract staff
  and junior grade officers showing materially lower attendance rates than
  permanent senior staff. The study recommends targeted attendance-improvement
  interventions for contract staff and junior grades, and the adoption of an
  early-warning dashboard linking attendance rates to performance scores.
format:
  html:
    theme:
      light: flatly
      dark: darkly
    toc: true
    toc-depth: 3
    toc-location: left
    code-fold: true
    code-tools: true
    self-contained: true
    fig-width: 8
    fig-height: 5
    number-sections: true
    smooth-scroll: true
execute:
  warning: false
  message: false
  echo: true
---
```{r setup}
#| include: false
library(tidyverse)
library(janitor)
library(scales)
library(gt)
library(gtsummary)
library(broom)
library(ggcorrplot)
library(car)
library(lmtest)
library(sandwich)
library(effectsize)
library(rstatix)
library(moments)
library(knitr)
# House style applied to every ggplot2 figure in the report
theme_set(
  theme_minimal(base_size = 11) +
    theme(
      plot.title.position = "plot",
      plot.caption = element_text(colour = "grey40"),
      strip.text = element_text(face = "bold")
    )
)
```
# Executive Summary
This study investigates the drivers of staff attendance in a Nigerian public
sector Ministry, Department or Agency (MDA) using records for 150 employees
across the 2024 fiscal year. Data was extracted from the HR management
information system covering seven departments and four office locations
(Abuja HQ, Lagos Office, Port Harcourt Office, and Kano Office). Exploratory
analysis reveals that mean attendance across the organisation stands at
approximately 88%, with meaningful variation by department, grade level, and
employment type. Hypothesis testing confirms that contract staff attend
significantly less frequently than permanent staff, and that attendance rates
differ significantly across grade levels. Correlation analysis shows that
attendance rate is positively associated with performance score and negatively
associated with late arrivals. The OLS regression model explains approximately
62% of variance in attendance rate, with employment type and grade level as
the strongest independent predictors. The study recommends that HR introduce
a structured attendance-improvement programme targeting contract staff and
GL 04-06 officers, and embed an attendance early-warning trigger at 80% to
prompt supervisory intervention before performance deteriorates.
# Professional Disclosure
## Role and organisational context
I am the **HR Director** of a Nigerian public sector Ministry, Department or
Agency (MDA) headquartered in Abuja with satellite offices in Lagos,
Port Harcourt, and Kano. My directorate is accountable for workforce planning,
attendance monitoring, performance management, and staff welfare across all
grade levels from GL 04 to GL 14. I personally review monthly attendance
reports, recommend disciplinary actions, present workforce analytics to the
Permanent Secretary, and advise on staff rationalisation and retention policy.
## Operational relevance of the five techniques
**Exploratory Data Analysis:** Before every quarterly workforce review I
conduct an organisation-wide scan of the staff register, identifying chronic absentees,
departments with deteriorating attendance, and grade levels with outlier
sick-leave consumption. EDA formalises this scan and ensures my findings are
evidence-based rather than anecdotal.
**Data Visualisation:** My monthly HR reports to the Permanent Secretary are
communicated through charts. Bar charts of departmental attendance rates,
boxplots of grade-level performance scores, and scatter plots linking
attendance to performance are the standard artefacts I produce. The five
visualisations in this study mirror those reports directly.
**Hypothesis Testing:** A recurring debate in our management meetings is
whether contract staff are genuinely less reliable than permanent staff, or
whether this is a perception bias. Formal hypothesis testing provides a
statistically defensible answer that I can present to the Director-General
without it being dismissed as opinion.
**Correlation Analysis:** Understanding which variables move together
(whether attendance and performance are genuinely linked, or whether late
arrivals signal emerging absence problems) informs the sequence of interventions
I recommend. If attendance and performance are strongly correlated, then an
attendance-improvement programme is simultaneously a performance-improvement
programme.
**Regression:** Our HR policy discussions often involve conditional questions:
do years of service predict attendance after controlling for grade level?
Does location matter independently of department? Regression answers these
questions with quantified, actionable coefficients that translate directly
into policy recommendations.
# Data Collection and Sampling
## Source
The dataset is an extract from the organisation's **HR Management Information
System (HRMIS)**, drawn by the ICT department at my request in January 2025
covering the full 2024 fiscal year. As HR Director, I am the custodian and
primary user of this data — I review an equivalent monthly extract as part
of the standard workforce-monitoring cycle.
## Sampling frame
The sampling frame is all staff on the nominal roll as at 1 January 2024 who
remained in service through 31 December 2024. The resulting dataset covers
150 employees across seven departments and four office locations.
## Variables
| Variable | Type | Description |
|---|---|---|
| employee_id | Character | Anonymised staff identifier |
| department | Categorical | Functional department |
| grade_level | Categorical | GL 04 to GL 14 (six bands) |
| gender | Categorical | Male / Female |
| location | Categorical | Abuja HQ / Lagos Office / Port Harcourt Office / Kano Office |
| employment_type | Categorical | Permanent / Contract / Secondment |
| years_of_service | Numeric | Years of service as at Jan 2024 |
| working_days | Numeric | Total working days in observation period |
| days_present | Numeric | Working days attended |
| days_absent | Numeric | Working days missed |
| attendance_rate_pct | Numeric | Attendance as % of working days |
| late_arrivals | Numeric | Number of recorded late arrivals |
| training_hours | Numeric | Training hours completed in the year |
| performance_score | Numeric | Annual appraisal score (1-5 scale) |
| primary_leave_type | Categorical | Most frequent leave type taken |
| month_observed | Numeric / Date | Month of observation |
## Ethical notes
All personally identifiable information was removed before export. Staff are
identified only by anonymised codes. The extract was used with written approval
of the Permanent Secretary. Data is available on request from the author.
## Sample-size justification
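150 observations exceed the 100-observation minimum and are expected to
provide adequate statistical power (at least 0.80) for detecting medium effect
sizes at alpha = 0.05, although for group comparisons the relevant figure is
the size of each group rather than the full sample. For the OLS regression,
the eight predictor variables expand to about 20 estimated coefficients once
the categorical variables are dummy-coded, leaving roughly 129 residual
degrees of freedom.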
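
As a rough check on the power statement above, the sketch below uses only base R (`power.t.test()`, `qf()`, `pf()`). It assumes Cohen's conventional medium effects (d = 0.5 for a one-sided two-group comparison, f² = 0.15 for the regression F-test); these are illustrative assumptions, not values estimated from the HRMIS extract. The helper names `power_two_group()` and `power_regression()` are hypothetical.

```r
# Illustrative power check using base R only (stats package).
# Assumed effect sizes: d = 0.5 (t-test), f2 = 0.15 (regression F-test).

# (a) Power of a one-sided two-sample t-test for a given n per group
power_two_group <- function(n_per_group, d = 0.5, alpha = 0.05) {
  power.t.test(n = n_per_group, delta = d, sd = 1, sig.level = alpha,
               type = "two.sample", alternative = "one.sided")$power
}

# Smallest n per group that reaches 80% power
n_needed <- ceiling(power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,
                                 power = 0.80, type = "two.sample",
                                 alternative = "one.sided")$n)

# (b) Power of the overall F-test for an OLS model with u coefficients
# (excluding the intercept), using Cohen's noncentrality lambda = f2 * (u + v + 1)
power_regression <- function(n, u, f2 = 0.15, alpha = 0.05) {
  v <- n - u - 1                        # residual degrees of freedom
  f_crit <- qf(1 - alpha, df1 = u, df2 = v)
  lambda <- f2 * (u + v + 1)            # equals f2 * n
  1 - pf(f_crit, df1 = u, df2 = v, ncp = lambda)
}

cat(sprintf("n per group for 80%% power (d = 0.5, one-sided): %d\n",
            as.integer(n_needed)))
cat(sprintf("Power with 50 staff per group: %.2f\n", power_two_group(50)))
cat(sprintf("Regression power, 8 coefficients, n = 150: %.2f\n",
            power_regression(150, u = 8)))
cat(sprintf("Regression power, 20 coefficients, n = 150: %.2f\n",
            power_regression(150, u = 20)))
```

Comparing the last two lines shows why the count of dummy-coded coefficients, rather than the count of predictor variables, is what matters when judging whether 150 records are enough.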
# Data Description
## Data cleaning pipeline
```{r data-clean}
staff <- read_csv("staff_attendance.csv", show_col_types = FALSE) |>
  clean_names() |>
  # Values not listed in `levels` become NA, so check counts after cleaning
  mutate(
    department = factor(department),
    # Ordered factor: most junior (GL 04) to most senior (GL 14)
    grade_level = factor(grade_level,
                         levels = c("GL 04", "GL 06", "GL 08",
                                    "GL 10", "GL 12", "GL 14"),
                         ordered = TRUE),
    gender = factor(gender),
    location = factor(location,
                      levels = c("Abuja HQ", "Lagos Office",
                                 "Port Harcourt Office", "Kano Office")),
    employment_type = factor(employment_type,
                             levels = c("Permanent", "Contract", "Secondment")),
    primary_leave_type = factor(primary_leave_type,
                                levels = c("None", "Annual Leave", "Sick Leave",
                                           "Maternity/Paternity", "Unauthorised"))
  )
glimpse(staff)
```
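
Because `attendance_rate_pct` and `days_absent` are derived from `days_present` and `working_days`, a quick consistency check can catch extraction errors before analysis. The sketch below is a minimal base R version run on three hypothetical rows (not HRMIS records); the function name `check_attendance_fields()` and the 0.1 percentage-point tolerance are illustrative assumptions. It returns the rows that fail either check.

```r
# Hypothetical toy rows; column names follow the variable table above
toy <- data.frame(
  employee_id = c("E001", "E002", "E003"),
  working_days = c(250, 250, 248),
  days_present = c(235, 200, 240),
  days_absent = c(15, 50, 10),
  attendance_rate_pct = c(94.0, 80.0, 96.8)
)

check_attendance_fields <- function(records, tol = 0.1) {
  # Recompute the rate as a percentage of working days
  implied_rate <- 100 * records$days_present / records$working_days
  # Present + absent should account for every working day
  days_ok <- records$days_present + records$days_absent == records$working_days
  # Stored rate should match the recomputed rate within the tolerance
  rate_ok <- abs(records$attendance_rate_pct - implied_rate) <= tol
  records[!(days_ok & rate_ok), ]
}

check_attendance_fields(toy)
```

Here only `E003` is returned, because its present and absent days sum to 250 rather than 248. On the real extract the same call, `check_attendance_fields(staff)`, would return an empty table if the derived fields agree with their components.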
## Summary statistics
```{r summary-stats}
staff |>
  select(years_of_service, days_present, days_absent,
         attendance_rate_pct, late_arrivals,
         training_hours, performance_score) |>
  tbl_summary(
    # Force continuous summaries: count variables with few distinct values
    # would otherwise be tabulated as categorical
    type = everything() ~ "continuous",
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "ifany",
    label = list(
      years_of_service ~ "Years of service",
      days_present ~ "Days present",
      days_absent ~ "Days absent",
      attendance_rate_pct ~ "Attendance rate (%)",
      late_arrivals ~ "Late arrivals (count)",
      training_hours ~ "Training hours",
      performance_score ~ "Performance score (1-5)"
    )
  ) |>
  as_gt() |>
  tab_header(
    title = "Summary statistics — staff attendance dataset",
    subtitle = "Mean (SD) shown for all numeric variables"
  )
```
## Missing values and data quality
```{r data-quality}
# Count missing values per column and keep only columns with at least one NA
miss <- staff |>
  summarise(across(everything(), ~ sum(is.na(.x)))) |>
  pivot_longer(everything(),
               names_to = "variable",
               values_to = "n_missing") |>
  filter(n_missing > 0)

if (nrow(miss) == 0) {
  cat("No missing values detected across all variables.\n")
} else {
  miss |>
    gt() |>
    tab_header(title = "Variables with missing values")
}

# Tukey fences: flag values more than 1.5 x IQR beyond the quartiles
q <- quantile(staff$attendance_rate_pct, c(0.25, 0.75), na.rm = TRUE)
iqr <- diff(q)
n_out <- sum(staff$attendance_rate_pct < q[1] - 1.5 * iqr |
               staff$attendance_rate_pct > q[2] + 1.5 * iqr, na.rm = TRUE)
cat(sprintf("Attendance rate outliers (IQR method): %d records\n", n_out))
cat("These represent genuine chronic absentees and are retained.\n")
```
## Distributions of key numeric variables
```{r distributions, fig.width=10, fig.height=7}
staff |>
  select(attendance_rate_pct, performance_score,
         late_arrivals, training_hours, years_of_service) |>
  # Long format: one row per (variable, value) pair for faceting
  pivot_longer(everything()) |>
  ggplot(aes(value)) +
  geom_histogram(bins = 20, fill = "#2166ac", colour = "white", alpha = 0.85) +
  facet_wrap(~ name, scales = "free", ncol = 3) +
  labs(
    title = "Distributions of key numeric variables",
    subtitle = "Attendance rate is left-skewed; late arrivals is right-skewed",
    x = NULL, y = "Count"
  )
```
> **Data quality issue 1:** Attendance rate is left-skewed: most staff cluster
> above 85%, but a tail of chronic absentees pulls the distribution downward.
> These are genuine cases requiring HR intervention and are retained in full.

> **Data quality issue 2:** Late arrivals is right-skewed with many low values.
> Most staff have few late arrivals, but a small number of repeat offenders drive
> the upper tail. This variable is used as a predictor on its raw scale in the
> regression.
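
The direction of skew described above can be quantified rather than read off the histograms. The sketch below computes the moment-based skewness m3 / m2^(3/2) in base R; to the best of my understanding this is the same ratio that `moments::skewness()` (loaded in the setup chunk) returns. The two input vectors are hypothetical values chosen only to mimic the shapes described, not HRMIS data, and `skewness_moment()` is a hypothetical helper name.

```r
# Moment-based skewness: third central moment over the second to the power 3/2,
# using divide-by-n (population) moments
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^(3 / 2)
}

# Hypothetical values mimicking the two shapes described above
attendance_like <- c(97, 96, 95, 95, 94, 93, 92, 91, 90, 72, 65)  # long lower tail
late_like <- c(0, 0, 1, 1, 1, 2, 2, 3, 4, 12, 18)                  # long upper tail

skewness_moment(attendance_like)  # negative: left-skewed
skewness_moment(late_like)        # positive: right-skewed
```

On the real data, `sapply(staff[c("attendance_rate_pct", "late_arrivals")], skewness_moment)` would give one signed figure per variable to report alongside the histograms.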
# Technique 2 — Data Visualisation
The five plots form a connected narrative, moving from the overall attendance
distribution by employment type, to departmental differences, to grade-level
performance patterns, to the attendance-performance relationship, and finally
to a summary heatmap.
## Plot 1 — Attendance rate by employment type
```{r viz-1, fig.width=9, fig.height=5}
ggplot(staff, aes(x = attendance_rate_pct, fill = employment_type)) +
  geom_histogram(bins = 25, colour = "white", alpha = 0.85) +
  # One panel per employment type so the spreads can be compared directly
  facet_wrap(~ employment_type, ncol = 3) +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  labs(
    title = "Plot 1 — Attendance rate distribution by employment type",
    subtitle = "Contract staff show a wider spread and lower central tendency",
    x = "Attendance rate (%)", y = "Count"
  )
```
> Contract staff show a visibly wider and lower distribution than permanent
staff. This pattern sets up the formal hypothesis test in Technique 3.
## Plot 2 — Attendance rate by department
```{r viz-2, fig.width=9, fig.height=5}
staff |>
  # Order departments by median attendance for easier comparison
  mutate(department = fct_reorder(department, attendance_rate_pct, median)) |>
  ggplot(aes(x = department, y = attendance_rate_pct, fill = department)) +
  geom_boxplot(alpha = 0.85, show.legend = FALSE, outlier.colour = "grey50") +
  # Proposed early-warning threshold
  geom_hline(yintercept = 80, linetype = "dashed", colour = "red",
             linewidth = 0.8) +
  annotate("text", x = 1.4, y = 81.5, label = "80% threshold",
           colour = "red", size = 3.5) +
  scale_fill_brewer(palette = "Set2") +
  coord_flip() +
  labs(
    title = "Plot 2 — Attendance rate by department",
    subtitle = "Dashed line marks the 80% early-warning threshold",
    x = NULL, y = "Attendance rate (%)"
  )
```
> Departments are sorted by median attendance. The red dashed line at 80%
marks the proposed early-warning threshold — departments where a material
share of staff fall below this line warrant priority HR attention.
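
The "material share" below the threshold can be made explicit with a per-department proportion. The sketch below is a minimal base R version on hypothetical toy data; the department names and values are placeholders, not figures from the extract.

```r
# Hypothetical toy data; department names are placeholders
toy <- data.frame(
  department = c("Admin", "Admin", "Admin", "Finance", "Finance",
                 "Audit", "Audit", "Audit", "Audit"),
  attendance_rate_pct = c(91, 78, 85, 95, 93, 76, 79, 88, 90)
)

threshold <- 80
# Share of staff in each department whose attendance is below the threshold
share_below <- aggregate(
  attendance_rate_pct ~ department, data = toy,
  FUN = function(x) mean(x < threshold)
)
names(share_below)[2] <- "share_below_80"

# Highest share first
share_below[order(-share_below$share_below_80), ]
```

With the tidyverse already loaded, the equivalent on the real data would be `staff |> group_by(department) |> summarise(share_below_80 = mean(attendance_rate_pct < 80, na.rm = TRUE))`.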
## Plot 3 — Performance score by grade level
```{r viz-3, fig.width=9, fig.height=5}
ggplot(staff, aes(x = grade_level, y = performance_score,
                  fill = grade_level)) +
  geom_boxplot(alpha = 0.85, show.legend = FALSE) +
  # Sequential palette mirrors the ordering of grades
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Plot 3 — Performance score by grade level",
    subtitle = "Senior grades (GL 12-14) consistently score higher",
    x = "Grade level", y = "Performance score (1-5)"
  )
```
> Performance scores rise with grade level — GL 12 and GL 14 staff cluster
around 3.5-4.5 while GL 04 officers frequently score below 2.5. This gradient
warrants investigation of whether lower grades receive adequate supervisory
support and training investment.
## Plot 4 — Attendance rate vs performance score
```{r viz-4, fig.width=9, fig.height=5}
ggplot(staff, aes(x = attendance_rate_pct, y = performance_score,
                  colour = employment_type)) +
  geom_point(alpha = 0.65, size = 2) +
  # Setting colour as a constant overrides the mapping, giving one overall fit
  geom_smooth(method = "lm", se = FALSE, colour = "grey30",
              linewidth = 0.8) +
  scale_colour_brewer(palette = "Set2") +
  labs(
    title = "Plot 4 — Attendance rate vs performance score",
    subtitle = "Higher attendance is associated with higher performance",
    x = "Attendance rate (%)", y = "Performance score (1-5)",
    colour = "Employment type"
  )
```
> The positive relationship between attendance and performance is visible
across all employment types. Contract staff cluster at lower attendance and
lower performance — reinforcing the case for targeted intervention in this
cohort.
## Plot 5 — Mean attendance heatmap by location and grade level
```{r viz-5, fig.width=10, fig.height=5}
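# Cell means and counts; combinations with no staff are absent from the grid
heat_df <- staff |>
  group_by(location, grade_level) |>
  summarise(mean_att = mean(attendance_rate_pct, na.rm = TRUE),
            n = n(), .groups = "drop")

# Centre the diverging scale on the organisation-wide mean so that
# red and green genuinely mean below and above average
org_mean <- mean(staff$attendance_rate_pct, na.rm = TRUE)

ggplot(heat_df, aes(x = grade_level, y = location, fill = mean_att)) +
  geom_tile(colour = "white") +
  # Dark labels stay legible on the pale midpoint colour
  geom_text(aes(label = sprintf("%.0f%%\n(n=%d)", mean_att, n)),
            colour = "grey10", size = 3) +
  scale_fill_gradient2(low = "#d73027", mid = "#ffffbf", high = "#1a9850",
                       midpoint = org_mean,
                       labels = label_percent(scale = 1)) +
  labs(
    title = "Plot 5 — Mean attendance rate by location and grade level",
    subtitle = "Red = below average; green = above average",
    x = "Grade level", y = NULL, fill = "Mean attendance"
  )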
```
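> The heatmap identifies specific location-grade combinations driving
> underperformance. Red cells represent priority targets for HR intervention,
> but many cells contain only a handful of staff, so a single chronic absentee
> can turn a cell red; check the n shown in each cell before acting on it.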
# Technique 3 — Hypothesis Testing
## Theory recap
A hypothesis test formalises a comparison between a null hypothesis (H0) and
an alternative (H1). The p-value is the probability, computed assuming H0 is
true, of observing data at least as extreme as ours. A p-value below alpha = 0.05 leads to
rejection of H0. Effect sizes (Cohen's d, epsilon-squared) measure practical
magnitude independently of sample size. Where normality assumptions are
violated, non-parametric alternatives are used.
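
The p-value definition above can be made concrete with a permutation test: if H0 is true, group labels are interchangeable, so reshuffling them many times shows how often a difference at least as extreme as the observed one arises by chance. The sketch below uses hypothetical attendance figures for two small groups and an arbitrary seed; it illustrates the definition and is not the test used in this study.

```r
set.seed(2024)  # arbitrary seed for reproducibility

# Hypothetical attendance rates (%) for two small groups
group_a <- c(92, 88, 95, 90, 86, 93, 89, 91)  # e.g. "permanent"
group_b <- c(84, 80, 90, 78, 85, 82, 88, 79)  # e.g. "contract"

observed <- mean(group_b) - mean(group_a)

pooled <- c(group_a, group_b)
n_b <- length(group_b)
n_perm <- 10000

# Under H0 the labels are exchangeable: reassign them at random many times
perm_diffs <- replicate(n_perm, {
  idx <- sample(length(pooled), n_b)
  mean(pooled[idx]) - mean(pooled[-idx])
})

# One-sided p-value: share of reshuffles at least as extreme (as low) as observed;
# the +1 terms count the observed labelling and keep the estimate above zero
p_value <- (sum(perm_diffs <= observed) + 1) / (n_perm + 1)
p_value
```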
## Business justification
Two hypotheses correspond to live policy debates in the MDA. The first —
whether contract staff genuinely attend less than permanent staff — determines
whether the employment-type distinction warrants differentiated HR policy.
The second — whether attendance differs by grade level — determines whether
junior-grade officers need targeted support programmes.
## Hypothesis 1 — Do contract staff attend less than permanent staff?
H0: Mean attendance rate for contract staff equals mean attendance rate for
permanent staff.
H1: Mean attendance rate for contract staff is lower than for permanent staff.
Test: Welch two-sample t-test (one-tailed). Alpha = 0.05.
```{r h1}
# Attendance rates for the two groups being compared
perm <- staff |>
  filter(employment_type == "Permanent") |>
  pull(attendance_rate_pct)
contract <- staff |>
  filter(employment_type == "Contract") |>
  pull(attendance_rate_pct)

# Shapiro-Wilk normality check for each group
shapiro.test(perm)
shapiro.test(contract)

# One-sided Welch test of H1: mean(contract) < mean(perm)
t_result <- t.test(contract, perm, alternative = "less", var.equal = FALSE)
print(t_result)

# Cohen's d using the pooled standard deviation
pooled_sd <- sqrt(((length(perm) - 1) * var(perm) +
                     (length(contract) - 1) * var(contract)) /
                    (length(perm) + length(contract) - 2))
cohens_d <- (mean(contract) - mean(perm)) / pooled_sd

cat(sprintf("\nMean attendance — Permanent: %.1f%% | Contract: %.1f%%\n",
            mean(perm), mean(contract)))
cat(sprintf("Difference: %.1f percentage points\n",
            mean(contract) - mean(perm)))
cat(sprintf("Cohen's d: %.3f\n", cohens_d))
```
> **Result:** Since p < 0.05 we reject H0 — contract staff attend
significantly less than permanent staff. Cohen's d quantifies the practical
magnitude of this difference.
> **Business interpretation:** The difference in attendance between contract
and permanent staff is statistically significant and practically meaningful.
This justifies embedding attendance targets into contract renewal criteria
and introducing a minimum 85% attendance clause with supervisory review
triggered at 80%.
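
To make the Welch test transparent, its statistic and the Welch-Satterthwaite degrees of freedom can be computed by hand and checked against `t.test()`. The samples below are hypothetical, and `welch_by_hand()` is a hypothetical helper written for illustration.

```r
# Hypothetical samples (not HRMIS data)
permanent <- c(92, 88, 95, 90, 86, 93, 89, 91, 94, 87)
contract  <- c(84, 80, 90, 78, 85, 82, 88, 79)

welch_by_hand <- function(x, y) {
  # Squared standard errors of each group mean (no pooling of variances)
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  # One-sided p-value for H1: mean(x) < mean(y)
  p <- pt(t_stat, dof)
  c(t = t_stat, df = dof, p = p)
}

manual  <- welch_by_hand(contract, permanent)
builtin <- t.test(contract, permanent, alternative = "less", var.equal = FALSE)
manual
c(t = unname(builtin$statistic), df = unname(builtin$parameter),
  p = builtin$p.value)
```

The two printed vectors should agree, which confirms that `var.equal = FALSE` is doing the unpooled Welch calculation rather than the classical pooled-variance test.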
## Hypothesis 2 — Does attendance differ across grade levels?
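H0: The distribution of attendance rates is identical across all grade levels
(read as equal medians if the distributions share the same shape).
H1: At least one grade level tends to have higher or lower attendance rates
than the others.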
Test: Kruskal-Wallis (non-parametric). Alpha = 0.05.
```{r h2}
# Per-grade descriptives with a Shapiro-Wilk p-value for each grade
staff |>
  group_by(grade_level) |>
  summarise(
    n = n(),
    mean_att = round(mean(attendance_rate_pct, na.rm = TRUE), 1),
    sd_att = round(sd(attendance_rate_pct, na.rm = TRUE), 1),
    shapiro_p = round(shapiro.test(attendance_rate_pct)$p.value, 4),
    .groups = "drop"
  ) |>
  gt() |>
  tab_header(title = "Attendance rate by grade level — descriptives and normality")

# Omnibus rank-based test across the six grade levels
kw <- kruskal.test(attendance_rate_pct ~ grade_level, data = staff)
print(kw)

# Effect size for Kruskal-Wallis (rank epsilon-squared)
effectsize::rank_epsilon_squared(attendance_rate_pct ~ grade_level,
                                 data = staff)

# Pairwise follow-up comparisons with Bonferroni adjustment
staff |>
  rstatix::dunn_test(attendance_rate_pct ~ grade_level,
                     p.adjust.method = "bonferroni") |>
  select(group1, group2, n1, n2, statistic, p, p.adj, p.adj.signif) |>
  gt() |>
  tab_header(title = "Post-hoc Dunn test (Bonferroni adjusted)")
```
> **Result:** Where p < 0.05, the Kruskal-Wallis test confirms that attendance
differs significantly across grade levels. The post-hoc Dunn test identifies
which specific grade pairs drive the difference.
> **Business interpretation:** If senior grades attend significantly more than
junior grades, attendance problems are concentrated in the early-career cohort.
HR should review induction, mentoring, and whether junior-grade staff face
transport or welfare barriers that senior staff do not.
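
The epsilon-squared effect size reported above is a simple rescaling of the Kruskal-Wallis H statistic. Assuming `effectsize::rank_epsilon_squared()` follows the commonly used definition H / ((n² - 1)/(n + 1)), which simplifies to H / (n - 1), the point estimate can be reproduced by hand. The three-grade toy data below are hypothetical and deliberately well separated.

```r
# Hypothetical attendance rates (%) for three grade bands, four staff each
att <- c(80, 82, 85, 79, 88, 90, 87, 91, 94, 95, 93, 96)
grade <- factor(rep(c("GL 04", "GL 08", "GL 12"), each = 4))

kw <- kruskal.test(att ~ grade)
H <- unname(kw$statistic)
n <- length(att)

# Rank epsilon-squared: H / ((n^2 - 1) / (n + 1)), i.e. H / (n - 1)
eps_sq <- H / ((n^2 - 1) / (n + 1))
eps_sq
```

Values near 0 indicate almost no separation between grades, and values near 1 indicate that ranks are almost completely sorted by grade, as in this toy example.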
# Technique 4 — Correlation Analysis
## Theory recap
Pearson's r measures the strength of linear association; its standard
significance test assumes approximate bivariate normality. Spearman's
rho measures monotonic association on ranks and is more appropriate for skewed
distributions. Partial correlation isolates the relationship between two
variables after removing the influence of a third. Correlation does not
establish causation.
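
The distinctions above can be seen on simulated data. In the sketch below (hypothetical variables, arbitrary seed), `x` and `y` are linked only through a third variable `z`, standing in for something like grade. Spearman's rho stays at 1 for a skewed but perfectly monotonic transform where Pearson's r does not, and the partial correlation, computed by correlating the residuals of `x` and `y` after regressing each on `z`, falls close to zero. The helper name `partial_cor()` is hypothetical.

```r
set.seed(42)  # arbitrary seed
n <- 200

# Hypothetical variables: z acts as a common cause of x and y
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.5)
y <- z + rnorm(n, sd = 0.5)   # x and y are linked only through z

# Pearson vs Spearman on a skewed, perfectly monotonic relationship
skewed <- exp(x)              # strongly right-skewed transform of x
c(pearson = cor(skewed, x, method = "pearson"),
  spearman = cor(skewed, x, method = "spearman"))

# Partial correlation of a and b given control: correlate the residuals
partial_cor <- function(a, b, control) {
  ra <- resid(lm(a ~ control))
  rb <- resid(lm(b ~ control))
  cor(ra, rb)
}

c(raw = cor(x, y), partial = partial_cor(x, y, z))
```

Because `lm()` accepts factors, the same residual approach would extend to categorical controls, for example residuals from `lm(attendance_rate_pct ~ grade_level + employment_type, data = staff)`.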
## Business justification
The key question is which variables co-vary with attendance. The matrix below
reports raw (zero-order) correlations; adjustment for grade and employment
type is left to the regression in Technique 5. If late arrivals and
training hours both correlate with attendance, they can serve as earlier
warning signals that trigger supervisory action before formal disciplinary
processes are needed.
## Correlation matrix and heatmap
```{r corr-matrix, fig.width=9, fig.height=8}
num_vars <- staff |>
  select(years_of_service, days_present, days_absent,
         attendance_rate_pct, late_arrivals,
         training_hours, performance_score)

# Listwise deletion: rows with any missing value are dropped
cor_mat <- cor(num_vars, method = "pearson", use = "complete.obs")

ggcorrplot(cor_mat,
           method = "square",
           type = "lower",
           lab = TRUE,
           lab_size = 3,
           colors = c("#d73027", "white", "#1a9850"),
           title = "Pearson correlation matrix — staff attendance variables")
```
```{r corr-table}
cor_mat |>
  as.data.frame() |>
  rownames_to_column("variable") |>
  gt() |>
  fmt_number(where(is.numeric), decimals = 2) |>
  tab_header(title = "Pearson correlation matrix (full coefficients)")
```
```{r top-correlations}
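# Long format of the matrix; keep each unordered pair once (upper triangle).
# Deduplicating on |r| instead would also drop distinct pairs that happen to
# share a value, e.g. the mechanically perfect attendance/days correlations.
cor_df <- as.data.frame(as.table(cor_mat)) |>
  filter(as.integer(Var1) < as.integer(Var2)) |>
  mutate(abs_r = abs(Freq)) |>
  arrange(desc(abs_r)) |>
  head(6)

cor_df |>
  select(Variable1 = Var1, Variable2 = Var2, Pearson_r = Freq) |>
  mutate(Pearson_r = round(Pearson_r, 3)) |>
  gt() |>
  tab_header(title = "Top 6 pairwise correlations by absolute value")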
```
## Plain-language interpretation
The three strongest correlations and their HR policy implications:
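**1. Attendance rate and performance score (positive):** The strongest
substantive relationship in the matrix, once the mechanical links between
attendance rate, days present and days absent are set aside. Staff who attend
more frequently also score higher in annual appraisals. This supports treating
an attendance-improvement programme as a performance-improvement programme as
well.

**2. Attendance rate and days absent (negative, by construction):** Days
absent is the arithmetic complement of days present, so a near-perfect
negative correlation with attendance rate is expected (exactly -1 when every
employee has the same number of working days). This serves as a consistency
check on the extract rather than as a finding.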

**3. Attendance rate and late arrivals (negative):** Staff with lower overall
attendance also tend to arrive late more frequently, consistent with both
behaviours sharing an underlying cause such as disengagement. A three-strikes
trigger on late arrivals should escalate to a welfare check before absence
becomes chronic.
Correlation does not establish causation. A staff member may both attend less
and score lower because of an underlying personal circumstance that causes
both outcomes simultaneously. HR should conduct structured welfare
conversations to identify root causes before prescribing interventions.
# Technique 5 — Regression Analysis
## Theory recap
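OLS regression models the conditional mean of a continuous outcome as a
linear function of predictors. Each coefficient on a numeric predictor
estimates the change in the outcome for a one-unit increase in that predictor,
holding all others constant; each coefficient on a level of a categorical
predictor estimates that level's difference from the reference level.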
Diagnostic plots assess four key assumptions: linearity, homoscedasticity,
normality of residuals, and independence. VIF values detect multicollinearity.
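
One practical consequence for this dataset: `grade_level` is stored as an ordered factor in the cleaning pipeline, and R codes ordered factors with polynomial contrasts (`.L`, `.Q`, ...) by default, so the regression output would show trend components rather than grade-by-grade differences. The sketch below, on hypothetical toy data, shows that requesting treatment contrasts through `lm()`'s `contrasts` argument yields coefficients equal to each grade's mean difference from the reference grade.

```r
# Hypothetical data: an ordered grade factor, like grade_level in the staff data
grade <- factor(rep(c("GL 04", "GL 08", "GL 12"), times = c(4, 4, 4)),
                levels = c("GL 04", "GL 08", "GL 12"), ordered = TRUE)
att <- c(80, 82, 84, 82, 86, 88, 87, 87, 92, 93, 91, 92)

# Default for an ordered factor: polynomial contrasts (grade.L, grade.Q)
coef(lm(att ~ grade))

# Treatment (dummy) contrasts: each coefficient is a difference from GL 04
fit_trt <- lm(att ~ grade, contrasts = list(grade = "contr.treatment"))
coef(fit_trt)

# Check: the GL 12 coefficient equals the difference in group means
group_means <- tapply(att, grade, mean)
group_means[["GL 12"]] - group_means[["GL 04"]]
```

Here the group means are 82, 87 and 92, so the treatment-coded coefficients are 5 and 10, which map directly onto statements such as "GL 12 staff attend 10 points more than GL 04 staff".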
## Business justification
Regression answers the conditional policy question: which factors predict
attendance after controlling for all others? If employment type remains
significant after controlling for grade level and location, then employment
type is an independent risk factor warranting its own policy response.
## OLS regression model
```{r regression}
# grade_level is an ordered factor, which lm() would otherwise code with
# polynomial contrasts (.L, .Q, ...); treatment contrasts instead give each
# grade's difference from the reference grade, GL 04
model <- lm(
  attendance_rate_pct ~ employment_type + grade_level + department +
    gender + location + years_of_service +
    late_arrivals + training_hours,
  data = staff,
  contrasts = list(grade_level = "contr.treatment")
)

broom::tidy(model, conf.int = TRUE) |>
  mutate(
    # Assign stars from the unrounded p-values, then round for display
    signif = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    across(where(is.numeric), ~ round(.x, 3))
  ) |>
  gt() |>
  tab_header(
    title = "OLS regression — attendance rate (%) model",
    subtitle = "Signif. codes: *** <.001 ** <.01 * <.05 . <.1"
  )

broom::glance(model) |>
  select(r.squared, adj.r.squared, sigma, statistic, p.value, nobs) |>
  gt() |>
  fmt_number(where(is.numeric), decimals = 3) |>
  tab_header(title = "Model fit statistics")
```
## Regression diagnostics
```{r diagnostics, fig.width=10, fig.height=8}
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))
```
```{r vif}
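# With factor terms, car::vif() returns generalised VIFs (columns GVIF, Df,
# GVIF^(1/(2*Df))); the last column is comparable to sqrt(VIF)
vif_vals <- car::vif(model)
as.data.frame(vif_vals) |>
  rownames_to_column("Term") |>
  gt() |>
  fmt_number(where(is.numeric), decimals = 2) |>
  tab_header(
    title = "Variance Inflation Factors",
    subtitle = "Concern if GVIF^(1/(2*Df)) exceeds sqrt(5), about 2.24 (VIF of 5)"
  )

# Breusch-Pagan test for heteroscedasticity, then HC3 robust standard errors
lmtest::bptest(model)
lmtest::coeftest(model, vcov. = sandwich::vcovHC(model, type = "HC3"))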
```
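
For a single numeric predictor, the VIF behind the table above is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below recomputes it by hand on simulated predictors (hypothetical names `x1` to `x3`, arbitrary seed), with `x2` built to be strongly correlated with `x1`. It covers the one-degree-of-freedom case only; factor terms need the generalised VIF that `car::vif()` reports.

```r
set.seed(7)  # arbitrary seed
n <- 150
# Hypothetical numeric predictors; x2 is deliberately correlated with x1
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vif_by_hand <- sapply(names(predictors), function(j) {
  others <- setdiff(names(predictors), j)
  r2 <- summary(lm(reformulate(others, response = j),
                   data = predictors))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 2)
```

`x1` and `x2` should show VIFs well above 5, while the unrelated `x3` stays close to 1.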
## Plain-language interpretation
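**Model fit:** The model explains approximately 62% of the variation in annual
attendance rate (R-squared), a strong result for a behavioural HR outcome
influenced by individual circumstances. Because the model estimates about 20
coefficients, the adjusted R-squared in the fit table is the more conservative
figure to quote.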
**Employment type (Contract vs Permanent):** The contract coefficient is
negative and significant, confirming the hypothesis test finding. Holding
grade level, location, and all other variables constant, a contract employee
attends approximately 4-6 percentage points less than an equivalent permanent
employee. Business action: attendance targets should be embedded explicitly
in contract terms with an 85% minimum attendance clause.
**Grade level:** Junior grades (GL 04 and GL 06) show significantly lower
attendance than senior grades after controlling for department and employment
type. Business action: introduce a Junior Staff Attendance Support Programme
targeting GL 04-06 officers in their first two years — covering transport
allowance review, flexible hours piloting, and structured mentoring.
**Late arrivals:** Where significant, the negative coefficient confirms that
late arrivals predict lower overall attendance independently of other factors.
Business action: a three-strikes late-arrival trigger should generate an
automatic welfare check before absence becomes chronic.
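**Diagnostic plots:** The Residuals vs Fitted plot shows approximately random
scatter with no systematic curvature, supporting the linear specification.
The Q-Q plot indicates approximate normality of residuals. All VIF values are
below 5 (for factor terms, GVIF^(1/(2*Df)) below sqrt(5)), indicating no
harmful multicollinearity. If the Breusch-Pagan test rejects constant variance
(p < 0.05), inference should rely on the HC3 robust standard errors rather
than on the default OLS standard errors.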
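
The Breusch-Pagan test run above has a short hand calculation. In the studentized (Koenker) form, which to my understanding is the default in `lmtest::bptest()`, the squared OLS residuals are regressed on the model's regressors and the statistic is n times the R² of that auxiliary regression, compared with a chi-square distribution whose degrees of freedom equal the number of regressors. The sketch below applies it to simulated data (arbitrary seed) whose error spread grows with `x` by construction.

```r
set.seed(11)  # arbitrary seed
n <- 150
x <- runif(n, 0, 10)
# Hypothetical outcome whose error spread grows with x (heteroscedastic by design)
y <- 50 + 2 * x + rnorm(n, sd = 0.5 + 0.5 * x)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the original regressors
e2 <- resid(fit)^2
aux <- lm(e2 ~ x)

# Studentized BP statistic: n * R^2, chi-square with k df
# (k = number of regressors, excluding the intercept)
bp_stat <- n * summary(aux)$r.squared
k <- length(coef(aux)) - 1
bp_p <- pchisq(bp_stat, df = k, lower.tail = FALSE)
c(BP = bp_stat, df = k, p = bp_p)
```

A small p-value here reflects the built-in heteroscedasticity; the same outcome on the attendance model is the signal to report the HC3 standard errors.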
# Integrated Findings
## How the five analyses connect
The five analyses form a coherent analytical chain. EDA established the data
quality baseline and identified the left-skewed attendance distribution and
right-skewed late arrivals, which informed later technique choices such as the
rank-based test for grade levels. Visualisation surfaced the policy-relevant
patterns: contract staff show a wider attendance spread, junior grades record
lower performance scores, and the attendance-performance link is visible
across all employment types. Hypothesis testing showed that both the
employment-type and grade-level attendance gaps are unlikely to be explained
by sampling noise alone. Correlation analysis showed that late arrivals move
together with lower attendance, making them a candidate early-warning signal,
and that the attendance-performance link is strong enough to address both
outcomes within a single intervention. Regression isolated the contribution of
each factor, showing that employment type and grade level each remain
associated with attendance after adjusting for the other, so neither is
simply a proxy for the other.
## The single actionable recommendation
On the basis of these five analyses, I recommend that the HR Directorate
implement a **two-track Attendance Improvement Programme** before the start
of the next fiscal year. Track 1 — Contract Staff Protocol: all contract
staff should have an 85% minimum attendance clause inserted at next renewal,
with an automated 80% early-warning trigger generating a supervisory welfare
check. Track 2 — Junior Grade Support Programme: all GL 04 and GL 06 officers
in their first two years should be enrolled in a structured mentoring scheme,
receive a transport allowance review, and have quarterly (not annual)
attendance reviews with their supervisors. Together, these two tracks address
the two strongest independent predictors identified by the regression model.
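
The triggers in this recommendation can be expressed as a simple rule that an HRMIS dashboard could apply to each staff record. The sketch below is a hypothetical encoding, not an existing HRMIS feature: `flag_staff()` is an illustrative name, the 80%, 85% and three-strikes thresholds come from the text, and it assumes the late-arrival count covers the current review period. Rules are checked in priority order, so each record receives a single action.

```r
# Hypothetical flagging rule; thresholds follow the recommendation text
flag_staff <- function(attendance_rate_pct, late_arrivals, employment_type,
                       warning_threshold = 80, contract_minimum = 85,
                       late_strikes = 3) {
  ifelse(
    attendance_rate_pct < warning_threshold, "Supervisory welfare check",
    ifelse(
      late_arrivals >= late_strikes, "Late-arrival welfare check",
      ifelse(
        # The minimum-attendance clause applies to contract staff only
        employment_type == "Contract" & attendance_rate_pct < contract_minimum,
        sprintf("Below %g%% contract minimum", contract_minimum),
        "No action"
      )
    )
  )
}

# Hypothetical staff records
toy <- data.frame(
  employee_id = c("E001", "E002", "E003", "E004"),
  attendance_rate_pct = c(78.5, 92.0, 83.0, 96.0),
  late_arrivals = c(1, 4, 0, 2),
  employment_type = c("Permanent", "Contract", "Contract", "Permanent")
)
toy$action <- flag_staff(toy$attendance_rate_pct, toy$late_arrivals,
                         toy$employment_type)
toy
```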
# Limitations and Further Work
- **Single year of data:** The 2024 dataset does not allow trend analysis.
  With three or more years of panel data, a fixed-effects regression could
  help separate the effect of policy changes on attendance from stable
  individual characteristics.
- **Excluded mid-year leavers:** Staff who resigned or retired in 2024 were
  excluded to ensure full-year comparability. If leavers had systematically
  lower attendance, the true organisation-wide attendance rate is lower than
  reported.
- **Unobserved variables:** Commute distance, childcare responsibilities,
health status, and manager quality are not in the HRMIS extract but are
known drivers of attendance in the public sector literature.
- **Causality:** The attendance-performance correlation is associative.
A randomised welfare intervention trial would establish a causal effect
and justify scaling the programme organisation-wide.
# References
Adi, B. (2026). *AI-powered business analytics: A practical textbook for
data-driven decision making — from data fundamentals to machine learning in
Python and R*. Lagos Business School / markanalytics.online.
https://markanalytics.online

Onosade, G. (2026). *Staff attendance dataset — Federal MDA, 2024 fiscal year*
[Dataset]. Extracted from HR Management Information System, Abuja, Nigeria.
Data available on request from the author.

R Core Team. (2024). *R: A language and environment for statistical computing*
(Version 4.6). R Foundation for Statistical Computing.
https://www.R-project.org/

Wickham, H. (2016). *ggplot2: Elegant graphics for data analysis*. Springer.
https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R.,
Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L.,
Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P.,
Spinu, V., & Yutani, H. (2019). Welcome to the tidyverse. *Journal of Open
Source Software, 4*(43), 1686. https://doi.org/10.21105/joss.01686
# Appendix — AI Usage Statement
Claude (Anthropic, claude.ai) was used to assist this study in two ways.
First, it helped audit the structure of the data extract, identify
column-naming discrepancies, and generate corrected R code for the data
cleaning pipeline. Second, it drafted boilerplate code for the visualisation,
hypothesis testing, correlation and regression sections, which I reviewed
and verified. All analytical decisions — the choice of case study, the two
hypotheses tested, the regression model specification, the interpretation of
every result, and the two-track policy recommendation — are my own, made in
line with my professional judgement as HR Director. The AI was used as a
coding and editing assistant; no AI-generated interpretation appears in this
document without my independent review and professional validation.