Code
# Load the packages used throughout the report
library(tidyverse)
library(readxl)
library(knitr)
library(scales)
library(broom)
# Avoid scientific notation and limit printed digits
options(scipen = 999, digits = 4)
1. Executive Summary
This study presents an exploratory and inferential analysis of annual staff performance appraisal records for 852 employees of the Kwara State Internal Revenue Service (KWIRS) spanning the 2022–2025 assessment cycles. From my perspective as Director of Administration and Operations, the central business problem is to understand which organisational and historical factors explain variation in 2025 performance scores, and whether gaps across grade bands and office locations are statistically real or attributable to chance.
Five techniques were applied sequentially. Exploratory Data Analysis uncovered two data quality issues: performance columns stored as character strings requiring numeric conversion, and a near-defunct 360-degree appraisal component where approximately 90% of staff carry zero scores. Data visualisation revealed a persistent performance gap between Zonal Offices and HQ Directorates across all four years, and notable differences across grade bands. Hypothesis testing confirmed that both the grade-band effect (Kruskal-Wallis, p < 0.001) and the HQ–Zonal gap (Welch t-test, p = 0.002) are statistically significant, though the effect sizes are small (grade band) and small-to-medium (office type). Correlation analysis revealed that year-to-year performance consistency at KWIRS is surprisingly weak (r = 0.13–0.39), raising questions about the reliability of the appraisal instrument. Regression identified prior-year performance, Support (Drivers) grade, and Zonal Office location as significant predictors of 2025 scores, while Management grade does not differ significantly from Revenue Staff once prior performance is controlled.
The central recommendation is that KWIRS prioritise a calibration review of its appraisal process, covering Head of Department (HOD) scoring consistency, mandatory 360-degree completion, and targeted resource investment in Zonal Offices.
2. Professional Disclosure
Name: La-Kadri Yusuf
Job Title: Director, Administration and Operations
Organisation: Kwara State Internal Revenue Service (KWIRS)
Sector: Public Sector — State Tax Administration, Nigeria
Technique Justifications
Exploratory Data Analysis: As Director of Admin and Operations, I am responsible for the integrity of all staff records, including the annual appraisal dataset. Before any HR policy decision — promotions, PIPs, or training allocations — the underlying data must be clean and structurally understood. EDA is therefore the essential first step: it surfaces encoding errors, missing values, outliers, and distributional anomalies that, left unaddressed, would invalidate all downstream analyses. In a public-sector context where appraisal scores bear directly on staff careers, data quality is a compliance matter as much as an analytical one.
Data Visualisation: KWIRS performance reports are presented quarterly to the Executive Committee. Translating numerical scores into clear charts enables management to rapidly identify which directorates or grades are underperforming, track longitudinal trends, and prioritise training resources — without requiring recipients to interpret raw tables. Choosing the right chart type for the right audience is a core professional competence I apply regularly.
Hypothesis Testing: KWIRS operates across three Zonal Offices and eleven HQ Directorates. Observed performance differences between units may reflect genuine structural disparities or natural sampling variation. Before recommending differential interventions — additional supervision or targeted training for Zonal staff — I need statistical evidence that observed differences are real. Hypothesis testing provides precisely this: a principled framework for making decisions under uncertainty.
Correlation Analysis: The 2025 appraisal instrument incorporates four components: Quantitative Assessment, Qualitative Evaluation, Examinations, and 360-Degree Feedback. If these components are highly intercorrelated, the instrument may be redundant. Conversely, low correlations confirm each component captures a distinct performance dimension. Additionally, weak year-to-year correlations in individual scores have direct implications for the appraisal system’s predictive validity and fairness.
Linear Regression: The most actionable HR planning question is: can we predict which staff are at risk of underperforming in the coming cycle? Regression quantifies how strongly prior-year performance, grade band, and office location jointly determine current-year scores. The resulting coefficients translate directly into policy levers — for instance, a significant negative coefficient on Zonal Office location signals a structural (not individual) problem warranting systemic, not behavioural, intervention.
3. Data Collection and Sampling
Source and Collection Method
The dataset comprises annual performance appraisal records maintained by the Human Resources unit of the Kwara State Internal Revenue Service. Data were extracted from the KWIRS Human Resources Information System (HRIS) under the authority of the Director of Administration and Operations. Each record corresponds to a single staff member and covers the 2022 to 2025 assessment cycles; each cycle appraises the preceding calendar year, so the 2025 cycle reflects 2024 activities and was finalised in Q1 2025. Component-level scores for the 2025 cycle, namely Quantitative Assessment (up to 60 marks), Qualitative Evaluation (20 marks), Examination (10–20 marks depending on grade), and 360-Degree Feedback (20 marks), were recorded by Heads of Department (HODs), validated by Zonal Coordinators, reviewed by the Director, and approved by HR.
Sampling Frame and Sample Size
This is a census of the entire active KWIRS staff population: all 852 permanent and contract employees in active employment for at least one complete appraisal cycle are included. Staff on long-term medical leave, on secondment to other agencies, or serving probation were excluded from the HRIS extract. Because the dataset covers the full population rather than a random sample, inferential tests are applied to assess whether observed patterns reflect systematic organisational factors rather than to estimate unknown population parameters.
Time Period Covered
The dataset spans four appraisal years: 2022, 2023, 2024, and 2025 (reflecting performance in calendar years 2021–2024 respectively). The 2025 cycle was finalised in Q1 2025. A subset of staff — primarily recent joiners — have incomplete records for earlier years.
Ethical Considerations
All personally identifiable information has been anonymised. Staff numbers are masked and no names appear in any output. Scores are analysed in aggregate or at directorate/grade level only. The data extraction was conducted in the course of official HR duties, consistent with KWIRS internal data governance policy. No external ethics approval was required for this internal institutional dataset used for educational research purposes.
Table 1. Summary statistics — all numeric performance variables
| Variable | n | Missing | Mean | SD | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|---|---|
| 360-Degree (2025) | 832 | 20 | 0.66 | 2.99 | 0.00 | 0.00 | 0.00 | 0.00 | 16.60 |
| Exam (2025) | 803 | 49 | 8.98 | 2.56 | 0.00 | 7.60 | 9.20 | 10.80 | 14.80 |
| 2022 Performance | 743 | 109 | 79.07 | 6.52 | 49.45 | 75.17 | 80.06 | 83.35 | 94.04 |
| 2023 Performance | 827 | 25 | 81.05 | 8.54 | 45.61 | 77.34 | 82.99 | 86.95 | 96.19 |
| 2024 Performance | 852 | 0 | 82.95 | 7.01 | 42.51 | 79.58 | 84.18 | 87.48 | 97.60 |
| 2025 Performance | 852 | 0 | 80.66 | 6.68 | 51.75 | 77.84 | 81.67 | 85.26 | 92.80 |
| Qualitative (2025) | 852 | 0 | 17.08 | 1.49 | 0.00 | 16.40 | 17.20 | 18.00 | 20.00 |
| Quantitative (2025) | 852 | 0 | 53.61 | 5.98 | 23.75 | 51.03 | 53.86 | 59.67 | 60.00 |
4.3 Data Quality Issues
Issue 1: Character-encoded numeric columns. The columns 2022 Performance, 2023 Performance, and Exam were stored as character strings in the source Excel file rather than numbers. Left as text, these columns cannot be used in arithmetic, and any entry that is not a valid number becomes NA on conversion. The preparation code above applies as.numeric() to resolve this. Separately, 109 staff have no 2022 performance record and 25 have no 2023 record; these are principally recent joiners who were not yet employed during those cycles, not encoding errors.
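To separate genuine gaps from conversion failures, it can help to count the two kinds of NA explicitly. The following is a minimal sketch on made-up values, not the KWIRS extract; the data frame `raw` and its contents are hypothetical, and only the use of as.numeric() comes from the report.

```r
# Hypothetical stand-in for a score column that Excel delivered as text
raw <- data.frame(
  perf_2022 = c("79.07", "81.5", "N/A", NA, " 75.2 "),
  stringsAsFactors = FALSE
)

# Convert to numeric. suppressWarnings() hides the "NAs introduced by coercion"
# warning, so the introduced NAs are counted explicitly below instead.
converted <- suppressWarnings(as.numeric(trimws(raw$perf_2022)))

# Entries that held text but could not be parsed as a number
coercion_failures <- sum(is.na(converted) & !is.na(raw$perf_2022))

# Entries that were already missing in the source
already_missing <- sum(is.na(raw$perf_2022))

coercion_failures
already_missing
```

If `coercion_failures` is zero on the real columns, every NA after conversion was already missing in the source, which is what the recent-joiner explanation implies.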
Table 2. Missing values in numeric performance columns
| Variable | Missing Records | % Missing |
|---|---|---|
| 2022 Performance | 109 | 12.8 |
| Exam (2025) | 49 | 5.8 |
| 2023 Performance | 25 | 2.9 |
| 360-Degree (2025) | 20 | 2.3 |
| 2024 Performance | 0 | 0.0 |
| Quantitative (2025) | 0 | 0.0 |
| Qualitative (2025) | 0 | 0.0 |
| 2025 Performance | 0 | 0.0 |
| Average Score (All Years) | 0 | 0.0 |
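The percentages in Table 2 follow directly from the missing counts and the census size of 852. A short sketch of that arithmetic, using only the counts reported above (the object names are illustrative):

```r
n_staff <- 852  # census size stated in Section 3

# Missing counts as reported in Table 2
missing_counts <- c(
  "2022 Performance"  = 109,
  "Exam (2025)"       = 49,
  "2023 Performance"  = 25,
  "360-Degree (2025)" = 20
)

# Share of all staff with no recorded value, to one decimal place
pct_missing <- round(missing_counts / n_staff * 100, 1)
pct_missing
```

On a full data frame the counts themselves could be obtained with `colSums(is.na(...))` over the numeric columns.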
Issue 2: Dysfunctional 360-Degree component. The 360-Degree column carries 20 NA values and a very large proportion of exact zero scores; Table 1 shows a third quartile of 0.00, so at least three-quarters of recorded scores are zero. Since perf_2025 = quant + qual + exam + deg360 by construction, every staff member with a zero 360 score loses up to 20 marks from their total, a structural penalty that has no bearing on their actual work performance.
Code
#| label: deg360-quality
# Count staff with a completed 360 review, a zero score, or no score at all
tibble(
  Category = c(
    "Score > 0 (review completed)",
    "Score = 0 (likely non-participation)",
    "Missing (NA)"
  ),
  Count = c(
    sum(!is.na(df$deg360) & df$deg360 > 0),
    sum(!is.na(df$deg360) & df$deg360 == 0),
    sum(is.na(df$deg360))
  )
) |>
  mutate(Pct = round(Count / nrow(df) * 100, 1)) |>
  kable(
    caption = "Table 3. Distribution of 360-Degree component scores (Data Quality Issue 2)",
    col.names = c("Category", "Count", "% of All Staff")
  )
Table 3. Distribution of 360-Degree component scores (Data Quality Issue 2)
Outliers at the lower end dominate, particularly in the 360-Degree component where most values are zero. These records are retained rather than removed: they represent genuine cases of non-participation or low assessment, which are substantively important for HR decision-making.
The distribution of 2025 performance scores is left-skewed: most staff cluster between 75 and 90, with a long lower tail. The mean (80.7) sits below the median (81.7), consistent with that skew. This asymmetry favours non-parametric statistical tests and means regression residuals may not be perfectly normal; both points are addressed in the relevant sections below.
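The direction of the skew can be quantified from the summary statistics alone. The sketch below uses Pearson's second skewness coefficient, 3 × (mean − median) / SD, with the 2025 Performance row of Table 1; this is a quick check chosen for illustration, not a statistic reported in the original analysis.

```r
# 2025 Performance summary values from Table 1
mean_2025   <- 80.66
median_2025 <- 81.67
sd_2025     <- 6.68

# Pearson's second skewness coefficient: negative values indicate left skew
pearson_skew <- 3 * (mean_2025 - median_2025) / sd_2025
round(pearson_skew, 2)
```

The result is about -0.45, a moderate negative value that agrees with the long lower tail described above.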
5. Data Visualisation
Technique 2 — Book Reference: Chapter 5 (Grammar of graphics, chart selection, storytelling with data)
The five plots below collectively tell a single story: performance at KWIRS is structurally unequal — by grade, by location, and across time — and the gaps have not narrowed over the four-year period covered by this data.
Support (Drivers) show a notably higher median than both Revenue Staff and Management, while Management and Revenue Staff are visually similar — a pattern confirmed formally in Section 8. Other Support has a very wide spread, driven by a small group (n = 8).
The three Zonal Offices (Kwara Central, North, South) cluster near the bottom of the directorate ranking, while most HQ Directorates sit at or above the organisation-wide mean — motivating the Zonal-vs-HQ hypothesis test in Section 6.
Figure 4 — Performance Trend 2022–2025 by Office Type
The HQ–Zonal performance gap is persistent across all four years, not a recent development. Neither group shows a meaningful upward trend, suggesting the current performance management system has not yet produced sustained score improvement at the organisational level.
Figure 5 — 2025 Score Distribution: HQ vs Zonal
Code
#| label: plot-violin
# Violin plot with an embedded boxplot and a white diamond marking each group mean
ggplot(df, aes(x = office_type, y = perf_2025, fill = office_type)) +
  geom_violin(trim = FALSE, alpha = 0.6) +
  geom_boxplot(width = 0.12, alpha = 0.9, outlier.shape = NA) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 4, colour = "white") +
  scale_fill_manual(
    values = c("HQ Directorate" = "#2c7bb6", "Zonal Office" = "#d7191c"),
    guide = "none"
  ) +
  labs(
    title = "Figure 5. 2025 Score Distribution by Office Type",
    subtitle = "White diamond = group mean; box = IQR; violin = full density",
    x = NULL,
    y = "2025 Performance Score (out of 100)"
  ) +
  theme_minimal(base_size = 13)
The HQ distribution is more symmetrically concentrated in the upper range. The Zonal distribution has a heavier lower tail, confirming that low-performing staff are disproportionately located in the field offices.
There is a positive but modest relationship between 2024 and 2025 scores. The HQ cluster sits systematically above the Zonal cluster at equivalent prior-year scores, consistent with a location-based effect independent of individual capability — confirmed formally in Section 8.
6.1 Test 1 — Does 2025 Performance Differ by Grade Band?
Business context: Before designing grade-differentiated training programmes, management needs statistical confirmation that performance differences across grades are real and not attributable to sampling variation.
Hypotheses:
H₀: The distribution of 2025 performance scores is identical across all four grade bands.
H₁: At least one grade band has a different distribution of 2025 performance scores.
Assumption check — normality:
Code
#| label: normality-check
set.seed(3847)
# Shapiro-Wilk test of 2025 scores within each grade band
df |>
  filter(!is.na(perf_2025)) |>
  group_by(grade_band) |>
  summarise(
    n = n(),
    W = round(shapiro.test(perf_2025)$statistic, 4),
    p_value = round(shapiro.test(perf_2025)$p.value, 4),
    Normal = if_else(shapiro.test(perf_2025)$p.value > 0.05, "Yes", "No"),
    .groups = "drop"
  ) |>
  kable(
    caption = "Table 5. Shapiro-Wilk normality test per grade band",
    col.names = c("Grade Band", "n", "W Statistic", "p-value", "Approx. Normal?")
  )
Table 5. Shapiro-Wilk normality test per grade band
| Grade Band | n | W Statistic | p-value | Approx. Normal? |
|---|---|---|---|---|
| Revenue Staff | 696 | 0.9243 | < 0.0001 | No |
| Management | 114 | 0.9425 | 0.0001 | No |
| Support (Drivers) | 34 | 0.9610 | 0.2597 | Yes |
| Other Support | 8 | 0.7942 | 0.0248 | No |
Normality is rejected for three of the four groups (p < 0.05); only Support (Drivers) is consistent with a normal distribution. This matches the left-skewed distribution observed in Section 4, so a non-parametric Kruskal-Wallis test is appropriate.
Kruskal-Wallis Rank-Sum Test:
Code
#| label: kruskal-test
# Rank-based test of whether 2025 scores differ across grade bands
kw_result <- kruskal.test(perf_2025 ~ grade_band, data = df)
kw_result
Kruskal-Wallis rank sum test
data: perf_2025 by grade_band
Kruskal-Wallis chi-squared = 34, df = 3, p-value = 0.0000002
Result and interpretation: The Kruskal-Wallis test is highly significant (H(3) = 33.826, p < 0.001). We reject H₀. The effect size ε² = 0.0364 indicates a small effect (ε² < 0.06). Post-hoc comparisons show that Support (Drivers) score significantly higher than both Revenue Staff and Management (adjusted p < 0.001), while Management and Revenue Staff do not differ significantly from each other.
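The effect size quoted above can be checked against the reported H statistic. Two rank-based formulas are in common use, and the report does not show which one its code applied, so the sketch below computes both under the assumption that n = 852 and k = 4 groups. The reported 0.0364 is reproduced by (H − k + 1) / (n − k), a form usually labelled eta-squared based on H; the H / (n − 1) form, usually labelled epsilon-squared, gives roughly 0.040. Both fall below the 0.06 cut-off, so the "small effect" reading holds either way.

```r
H <- 33.826  # Kruskal-Wallis statistic reported above
n <- 852     # total staff (assumed to be the analysed sample size)
k <- 4       # number of grade bands

# Eta-squared based on H
eta2_H <- (H - k + 1) / (n - k)

# Epsilon-squared
eps2 <- H / (n - 1)

round(c(eta2_H = eta2_H, eps2 = eps2), 4)
```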
Business implication: The performance ranking difference across grades is real, but the effect size is small: grade band accounts for under 4% of the variation in ranked scores. Notably, Management staff do not significantly outperform Revenue Staff, and Support (Drivers) unexpectedly lead. One plausible reading is that HOD scoring standards vary across staff categories rather than that capability differs systematically, so a calibration workshop for HODs is warranted.
6.2 Test 2 — Does 2025 Performance Differ Between HQ and Zonal Staff?
Business context: KWIRS’s three Zonal Offices serve geographically dispersed operations with potentially different resourcing and supervisory quality compared to HQ. Establishing whether the HQ–Zonal performance gap is statistically significant is essential before recommending targeted field-office interventions.
Hypotheses:
H₀: Mean 2025 performance score is the same for HQ and Zonal staff.
H₁: HQ and Zonal staff differ in mean 2025 performance score.
Both groups are large (n > 30), so by the Central Limit Theorem the sampling distribution of the mean is approximately normal. A Welch t-test (which does not assume equal variances) is used.
Welch Two Sample t-test
data: perf_2025 by office_type
t = 3.1, df = 271, p-value = 0.002
alternative hypothesis: true difference in means between group HQ Directorate and group Zonal Office is not equal to 0
95 percent confidence interval:
0.7079 3.2144
sample estimates:
mean in group HQ Directorate mean in group Zonal Office
81.13 79.17
Cohen's d: 0.296
Interpretation: small-to-medium effect (threshold small = 0.2, medium = 0.5)
Result and interpretation: The Welch t-test is significant (t(270.8) = 3.081, p = 0.0023). The 95% confidence interval for the mean difference is [0.71, 3.21] points, with HQ staff scoring on average 1.96 points higher. Cohen’s d = 0.296 indicates a small-to-medium effect size.
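The code behind this test is not shown in the output above, so the following is only a sketch of how a Welch test and a pooled-SD Cohen's d are typically computed in base R. The data are simulated, the function name `cohens_d` is illustrative, and the pooled-SD definition is an assumption about which variant of d was used.

```r
# Cohen's d using the pooled standard deviation of the two groups
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Simulated scores (not KWIRS data), loosely shaped like the two office types
set.seed(1)
hq    <- rnorm(200, mean = 81, sd = 6.5)
zonal <- rnorm(100, mean = 79, sd = 7.0)

# var.equal = FALSE requests the Welch test with Satterthwaite degrees of freedom
welch <- t.test(hq, zonal, var.equal = FALSE)

welch$statistic  # t value
welch$parameter  # Welch-adjusted degrees of freedom
welch$conf.int   # 95% CI for the difference in means
cohens_d(hq, zonal)
```

The fractional degrees of freedom reported above (270.8) are the signature of the Welch adjustment; a pooled-variance t-test would report n1 + n2 − 2 instead.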
Business implication: We reject H₀. The HQ–Zonal gap is statistically real. As Section 8’s regression will show, this gap persists even after controlling for grade band and prior-year performance, pointing to a structural rather than individual explanation. KWIRS should investigate whether Zonal offices lack adequate training infrastructure, supervisory oversight, or exam preparation resources compared to HQ.
7. Correlation Analysis
Technique 4 — Book Reference: Chapter 8 (Pearson, Spearman, partial correlation, correlation vs causation)
Controlling for 2023 performance, the partial correlation between 2022 and 2025 performance tests whether the earliest year retains any predictive signal once more recent history is accounted for.
Table 9. Partial correlation: 2022 vs 2025 performance, controlling for 2023
| Test | r_partial | t_stat | p_value | n |
|---|---|---|---|---|
| 2022 Perf ↔ 2025 Perf, controlling for 2023 Perf | 0.1009 | 2.758 | 0.006 | 743 |
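For reference, a first-order partial correlation can be written in terms of the three pairwise correlations, and its significance test uses n − 3 degrees of freedom when there is one control variable. The sketch below defines both steps and then re-derives the t statistic and p-value from the r_partial and n reported in Table 9. The function names are illustrative, and the report's own implementation is not shown.

```r
# Partial correlation of x and y controlling for z, from pairwise correlations
partial_cor <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# t test for a partial correlation with `n_controls` control variables
partial_t <- function(r, n, n_controls = 1) {
  df <- n - 2 - n_controls
  t  <- r * sqrt(df / (1 - r^2))
  c(t = t, df = df, p = 2 * pt(-abs(t), df))
}

# Values reported in Table 9
res <- partial_t(r = 0.1009, n = 743)
round(res, 3)
```

The recomputed t of about 2.76 on 740 degrees of freedom and the p-value of about 0.006 agree with the table.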
7.4 Discussion of Key Correlations
1. The surprisingly weak year-to-year performance consistency. The bivariate correlations between adjacent assessment years are modest at best: r = 0.39 (2022–2023), r = 0.17 (2023–2024), and r = 0.23 (2024–2025). In a well-calibrated appraisal system one would typically expect adjacent-year correlations above 0.5, since individual capability is relatively stable. These low values suggest that performance rankings at KWIRS change substantially from year to year — more than genuine capability shifts would warrant. The most plausible explanations are inconsistent HOD scoring standards across cycles, changes in assessment weighting between years, or rater fatigue. This is the most important quality signal in the dataset.
2. The 360-Degree anomaly. The 360-Degree component is negatively correlated with both Quantitative (r = −0.35) and Exam (r = −0.39) scores. Because approximately 90% of 360 scores are zero, this negative correlation almost certainly reflects a selection effect: staff who did receive a 360 review (non-zero scores) have a different performance profile from those who did not — not a genuine inverse relationship between objective assessment and peer feedback. This further underscores that the 360-degree instrument is currently non-functional for most of the workforce.
3. Quantitative component vs 2025 total (r = 0.86). This very high correlation is expected and arises by construction: perf_2025 = quant + qual + exam + deg360, and Quantitative carries up to 60 of the 100 available points. The practical implication is that the 2025 overall score is largely determined by the Quantitative sub-score (r² ≈ 0.74, so about three-quarters of the variance in totals tracks Quantitative scores). Any effort to improve aggregate performance must focus on the Quantitative component first.
Partial correlation note: After controlling for 2023 performance, the partial correlation between 2022 and 2025 drops to r = 0.101 (p = 0.006). The 2022 performance year retains a small but statistically significant relationship with 2025 even after accounting for the 2023 cycle, suggesting a mild long-memory component in individual performance trajectories.
8.1 Model Specification and Business Justification
Business question: After controlling for an individual’s grade level and office location, how strongly does prior-year performance predict 2025 performance? A significant coefficient on 2024 performance confirms predictive validity in the appraisal system; significant grade or location coefficients quantify structural penalties or premiums.
Why prior-year scores rather than 2025 sub-components as predictors? perf_2025 is the arithmetic sum of quant + qual + exam + deg360. Regressing a variable on its own components yields R² ≈ 1.0 and coefficients ≈ 1.0 by algebraic identity, which is not an empirical finding. The model below uses variables that are not components of the outcome, producing genuinely informative coefficients.
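The identity problem is easy to demonstrate on simulated data. In the sketch below the component ranges are loosely modelled on Table 1 but the values are randomly generated, not KWIRS records; the point is only that a total regressed on its own parts returns coefficients of 1 and a perfect fit regardless of the data.

```r
set.seed(42)
n <- 200

# Simulated component scores (illustrative ranges, not real appraisal data)
quant  <- runif(n, 24, 60)
qual   <- runif(n, 10, 20)
exam   <- runif(n, 0, 15)
deg360 <- ifelse(runif(n) < 0.9, 0, runif(n, 5, 17))  # mostly zeros

# The total is an exact sum of its components
total <- quant + qual + exam + deg360

fit <- lm(total ~ quant + qual + exam + deg360)

# R-squared computed by hand; summary() would warn about an essentially perfect fit
r2 <- 1 - sum(resid(fit)^2) / sum((total - mean(total))^2)

round(coef(fit), 6)  # intercept near 0, every slope equal to 1
r2                   # equal to 1 up to floating-point error
```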
Residuals vs Fitted: Roughly centred on zero with no strong non-linear pattern, supporting linearity. A slight heteroscedastic tendency at low fitted values is consistent with the left-skewed score distribution.
Normal Q-Q: Central residuals align well with the theoretical normal line; the lower tail deviates slightly. With n = 852, OLS estimates are robust to mild non-normality.
Scale-Location: No dramatic fanning; heteroscedasticity is minor and unlikely to materially bias inference.
Cook’s Distance: No observation exceeds the 4/n threshold by a wide margin, confirming the absence of highly influential records.
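The 4/n rule mentioned for Cook's distance can be applied in a few lines. This is a sketch on a simulated model, with a slope chosen to resemble the reported 2024 coefficient; the variable names and data are hypothetical.

```r
set.seed(7)
n <- 300

# Simulated prior-year and current-year scores (not KWIRS data)
x <- rnorm(n, mean = 83, sd = 7)
y <- 60 + 0.23 * x + rnorm(n, mean = 0, sd = 6)

fit <- lm(y ~ x)

# Cook's distance for every observation, compared with the 4/n rule of thumb
cd        <- cooks.distance(fit)
threshold <- 4 / n
flagged   <- which(cd > threshold)

length(flagged)  # how many observations exceed the threshold
max(cd)          # the largest distance, to judge by how much
```

A handful of observations just above 4/n is normal in samples of this size; what matters is whether any single value stands far apart from the rest.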
Table 12. Variance Inflation Factors — all values < 1.1 confirm no multicollinearity
| Predictor | VIF | Assessment |
|---|---|---|
| perf_2024 | 1.037 | Acceptable (< 5) |
| grade_bandManagement | 1.065 | Acceptable (< 5) |
| grade_bandSupport (Drivers) | 1.027 | Acceptable (< 5) |
| grade_bandOther Support | 1.007 | Acceptable (< 5) |
| office_typeZonal Office | 1.041 | Acceptable (< 5) |
All VIF values are below 1.1 — well within the acceptable threshold of 5. The predictors contribute independent information.
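A variance inflation factor is 1 / (1 − R²), where R² comes from regressing one predictor on all the others. The sketch below implements that definition directly on simulated predictors; `vif_manual` is an illustrative helper, and how the report produced its per-term values is not shown.

```r
# VIF for each column of X: regress it on the remaining columns, then 1 / (1 - R^2)
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(3)
X <- data.frame(a = rnorm(500), b = rnorm(500), c = rnorm(500))
vif_independent <- vif_manual(X)   # values close to 1

# Adding a near-copy of `a` shows what a problematic VIF looks like
X$d <- X$a + rnorm(500, sd = 0.1)
vif_collinear <- vif_manual(X)     # `a` and `d` now have very large VIFs

round(vif_independent, 3)
round(vif_collinear, 1)
```

Values near 1, as in Table 12, mean each predictor is almost uncorrelated with the others, so its standard error is barely inflated.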
8.5 Business Interpretation of Significant Coefficients
The model explains 9.8% of the variance in 2025 scores (R² = 0.098; Adj. R² = 0.093). Although the model as a whole is statistically significant (F-test p < 0.001), the low R² is itself an important finding, discussed below.
| Predictor | Coefficient | Sig. | Interpretation and action |
|---|---|---|---|
| 2024 Performance | 0.233 | *** | Each additional point in 2024 predicts only 0.23 points more in 2025, so a 10-point 2024 improvement predicts just a 2.3-point 2025 gain. Action: Although significant, this low coefficient, combined with the weak bivariate correlation of r = 0.23, suggests the appraisal system has low predictive validity. Historical performance is a poor guide to future performance, raising concerns about HOD scoring consistency. |
| Management Grade | -0.643 | (n.s.) | Once prior-year performance is controlled, Management grades do not score significantly higher than Revenue Staff. Any visual impression of management outperformance (Section 5) reflects management staff also having higher 2024 scores, not a grade-specific 2025 advantage. Action: Management appraisal criteria should be reviewed to ensure they are substantively more demanding, not simply applied more generously. |
| Support (Drivers) Grade | 4.572 | *** | Driver-grade staff score approximately 4.6 points higher than Revenue Staff after controlling for prior performance and location. This is the most unexpected finding. Possible explanations include more lenient HOD scoring for smaller support teams or different weighting of assessment criteria for non-revenue roles. Action: KWIRS should investigate whether appraisal criteria and HOD calibration for support grades are equivalent to those applied to revenue-generating staff. |
| Zonal Office | -1.939 | *** | Controlling for grade and prior performance, Zonal staff score approximately 1.9 points lower than HQ staff. This location penalty is not explained by grade band or prior-year score. Action: This is the most actionable finding. KWIRS should audit training infrastructure, examination facilities, and supervisory resource quality across all three Zonal Offices and implement a structured equalisation programme. |
The low R² (≈ 10%) is itself a key finding. Only one-tenth of the variance in 2025 scores is explained by prior-year performance, grade, and location combined. The remaining 90% reflects factors not captured in this model — likely a combination of (a) genuinely unmeasured determinants (training hours, tenure, HOD relationship quality) and (b) measurement error from inconsistent scoring. This reinforces the recommendation to commission a comprehensive appraisal instrument calibration review.
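The R² and adjusted R² quoted in this section are linked by a standard formula, which gives a quick consistency check. The sketch assumes the model was fitted to all 852 staff and has five estimated slope terms, matching the five rows of the VIF table; both figures are inferences from the report rather than values printed with the model output.

```r
r2 <- 0.098  # R-squared reported in this section
n  <- 852    # observations (assumed: the full census)
k  <- 5      # slope terms (assumed: the five rows of Table 12)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
round(adj_r2, 3)
```

The result rounds to 0.093, matching the reported adjusted value; with 852 observations and only five terms the adjustment is small.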
9. Integrated Findings
The five analytical techniques applied in this study converge on a coherent story about both organisational inequality and measurement quality concerns at KWIRS.
EDA established a well-populated dataset with two solvable quality issues, the more serious being a near-defunct 360-degree component that currently penalises approximately 90% of staff for non-participation in a process they were not required to complete. Visualisation confirmed that the HQ–Zonal performance gap is not recent — it has persisted across all four years without narrowing, suggesting structural rather than transient causes.
Hypothesis testing provided statistical confirmation: both the grade-band differentiation (small effect, ε² = 0.0364, p < 0.001) and the HQ–Zonal gap (small-to-medium effect, d = 0.296, p = 0.002) are statistically significant. The post-hoc results, however, revealed an unexpected finding: it is Support (Drivers), not Management, that significantly outperforms Revenue Staff, which raises questions about the consistency of scoring standards across staff categories.
Correlation analysis produced the most diagnostically important finding: year-to-year performance correlations at KWIRS are weak (r = 0.13–0.39), far lower than a reliable appraisal system would produce. Combined with the low regression R² of approximately 10%, this points to substantial unexplained variance — consistent with HOD scoring inconsistency across cycles as a root cause.
Single integrated recommendation: KWIRS should immediately commission an Appraisal System Integrity Review with three concrete outputs: (1) mandatory 360-degree review completion tracked as an HOD KPI, eliminating the current near-universal zero-score problem; (2) inter-HOD calibration workshops to align scoring standards across grades and offices, with the goal of raising year-to-year correlations above 0.5; and (3) a Zonal Office Capability Programme providing parity of training, examination infrastructure, and supervisory support relative to HQ — directly addressing the persistent ~2-point location penalty confirmed by regression.
10. Limitations and Further Work
Census completeness: Staff who resigned, were dismissed, or were on long-term leave during the period are absent, introducing survivorship bias toward more stable, better-performing employees.
Appraisal validity: All analyses assume scores reflect actual performance. If HODs apply inconsistent standards across directorates — plausible given the weak year-to-year correlations — the data measures assessed rather than actual performance.
360-degree ambiguity: Zero scores were retained as-is; without confirmation from HR that they represent non-participation (not genuine zero assessments), any 360-related interpretation should be treated with caution.
Omitted variables: The regression excludes potentially important predictors — years of service, educational qualification, training hours attended, and HOD tenure — that could substantially increase R² and change coefficient estimates.
Causation vs association: Regression coefficients are associational, not causal. The Zonal Office penalty cannot be interpreted as a pure location effect without controlling for all confounders.
Further work: (1) Enrich the model with tenure, education, and training data. (2) Apply logistic regression to predict a binary “at risk” outcome (score < 65) for proactive HR targeting. (3) Conduct a propensity-score matched comparison of HQ and Zonal staff with equivalent grades and prior scores to isolate the causal location effect. (4) Apply text analytics (sentiment analysis, topic modelling) to the HOD comment and Areas-for-Improvement fields to complement the quantitative findings with qualitative insight from the appraisal narratives.
References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
R Core Team. (2025). R: A language and environment for statistical computing (Version 4.5). R Foundation for Statistical Computing. https://www.R-project.org/
Robinson, D., Hayes, A., & Couch, S. (2024). broom: Convert statistical objects into tidy tibbles (Version 1.0.6) [R package]. https://CRAN.R-project.org/package=broom
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
Xie, Y. (2024). knitr: A general-purpose package for dynamic report generation in R (Version 1.45) [R package]. https://CRAN.R-project.org/package=knitr
Appendix: AI Usage Statement
Posit Assistant (an AI coding assistant embedded in RStudio) was used to assist in writing and debugging R code for this analysis, including the data-cleaning pipeline, ggplot2 visualisation syntax, manual implementations of effect-size formulas (epsilon-squared, Cohen’s d, partial correlation), and the regression diagnostic plot layout. All analytical decisions were made independently by the author: the selection of Case Study 1, the choice of a non-parametric Kruskal-Wallis test based on the normality check, the decision to use prior-year scores rather than 2025 component scores as regression predictors (to avoid the arithmetic identity problem), the identification of the 360-degree zero-score ambiguity as a data quality concern, and all business interpretations including the unexpected Support (Drivers) finding. The narrative text and all substantive conclusions represent the author’s own professional judgement grounded in operational knowledge of KWIRS performance management processes.