Drivers of Employee Performance at Argentil Group

Individual, Organisational, and Developmental Factors — 2023–2025 Biannual Reviews

Author

Tolu Bashorun, Associate Vice President, People & Culture

Published

May 12, 2026

1 Executive Summary

Employee performance sits at the centre of Argentil Group’s competitive position. Across our investment banking, principal investing, and asset management lines, a lean and specialised workforce means that every individual outcome compounds — making it essential to know which factors actually move the performance dial and which do not.

This study uses a longitudinal panel of 31 employees observed across six half-yearly review periods (2023 H1 – 2025 H2; 153 person-period records), drawn from the Human Performance Committee (HPC) Power BI report and supplemented by HRIS records. Five complementary techniques — Exploratory Data Analysis, Visualisation, Hypothesis Testing, Correlation Analysis, and Ordinary Least Squares Regression — are applied in sequence, supplemented by a manager-level variance analysis and an employee fixed-effects robustness check.

Four findings dominate. Prior performance is the single strongest driver of current scores (r = +0.55, p < 0.001), confirming substantial within-employee consistency. Tenure is positively associated with performance (r = +0.38, p < 0.001) and holds in the multivariable regression (β = +0.32 per year, p = 0.045). Department matters at the aggregate level (ANOVA F = 8.16, p < 0.001): CS records the highest mean (88.8) and Finance the lowest (79.8). Manager-level variation is substantial and significant (ANOVA F = 4.94, p < 0.001) — between-team differences are not just statistical noise. Training participation intensity and within-period promotion status show no positive independent association with performance; promotion, in fact, shows a negative bivariate association (t = –2.35, p = 0.025), consistent with a learning-curve effect in new roles. The reduced regression model explains 37% of variance (adj-R² = 0.347). The recommendation is to prioritise early-career continuity and to investigate the Finance-team performance gap and the manager-level dispersion as the highest-leverage next actions.

2 Professional Disclosure

Tolu Bashorun is the Associate Vice President, People & Culture at Argentil Group, with responsibility for human resources practice across the Group’s Nigerian operations spanning investment banking, principal investing, and asset management. The remit covers performance management, learning and development, talent management, compensation and benefits, and employee relations and engagement — meaning people data is a live operational input into decisions made regularly across the employee lifecycle.

Exploratory Data Analysis (EDA) sets the foundation. In a corporate HR setting, performance data is rarely clean or self-explanatory. As custodian of the Group’s people records, the first responsibility is to understand the data before drawing conclusions: examining distributions, identifying anomalies, and surfacing structural gaps. In this study EDA confirmed that 14 first-observation rows had no prior_performance (employees in probation, structurally rather than randomly missing) and that the five departments are unevenly represented. Both facts shape every later interpretation.

Data Visualisation is the bridge between numerical results and executive understanding. Communicating findings to the Human Performance Committee and the board is part of the AVP role, and charts that prioritise clarity over technical detail are the way that happens. The visualisations in this report are designed to make each result legible to a non-technical reader in seconds.

Hypothesis Testing brings discipline to claims that would otherwise rest on intuition. In a firm of this size where teams are lean and decisions deliberate, statements such as “the training programme is working” or “Investment Banking outperforms” need a formal evidence base. Welch’s t-tests and one-way ANOVA provide that base here.

Correlation Analysis is the standard first quantitative pass — a fast, assumption-light way to triage which variables earn a place in the regression model. For development and succession planning at Argentil, knowing which factors actually move alongside performance prevents resources from being spent on interventions that are not connected to outcomes.

Ordinary Least Squares Regression completes the chain by estimating the partial association of each driver with performance while holding the others constant. It is the technique that moves the analysis from pairwise correlation to a defensible statement about which lever matters once the others are accounted for, which is exactly the question that strategic people decisions at Argentil hinge on.

3 Data Collection & Sampling

Primary source. The Power BI performance appraisal report prepared each cycle for the Human Performance Committee (HPC). This is the authoritative record of the biannual review and the source of the performance, prior performance, and performance change variables.

Supplementary source. The Human Resources Information System (HRIS) and general HR records, which supplied the time and role variables (hire year, tenure, years-in-role, job level, department), the manager identifier, the promotion status, and the training participation intensity. Both sources were accessed in the ordinary course of professional duties as data custodian.

Time period covered. Six review periods: 2023 H1, 2023 H2, 2024 H1, 2024 H2, 2025 H1, and 2025 H2. The biannual review cycle is the standard cadence at Argentil.

Sampling approach. A census — every employee with complete records in any of the six periods is included; no random sampling is applied. The file is a population snapshot of the relevant workforce, not a sample from it.

Sample size. 31 unique employees, 153 person-period observations. Most employees (19 of 31) appear in all six periods; the remainder are joiners or leavers within the window. Observations per employee range from 1 to 6 with median 6.
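The panel-balance figures above can be checked with a few lines of R. The sketch below is illustrative only: it uses a small hypothetical data frame (toy, with made-up employee IDs and periods) rather than the HPC data, and base R rather than the tidyverse helpers used elsewhere in this report.

```r
# Hypothetical toy panel; the real person-period file is not reproduced here.
toy <- data.frame(
  employee_id = c(rep("EMP001", 6), rep("EMP002", 6), rep("EMP003", 3), "EMP004"),
  period_idx  = c(1:6, 1:6, 4:6, 6)
)

# Number of person-period rows contributed by each employee
obs_per_emp <- as.integer(table(toy$employee_id))

panel_summary <- c(
  employees      = length(obs_per_emp),     # unique employees
  person_periods = nrow(toy),               # total rows in the panel
  full_panel     = sum(obs_per_emp == 6),   # employees seen in all six periods
  min_obs        = min(obs_per_emp),
  median_obs     = median(obs_per_emp),
  max_obs        = max(obs_per_emp)
)
print(panel_summary)
```

Applied to the real data frame, the same calls would be expected to reproduce the figures quoted above (31 employees, 153 person-periods, 19 full-panel employees, median 6).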

Unit of observation. Employee-period pair — the design that allows performance to be examined across both employees and time.

Ethical considerations. All employee and manager identifiers were anonymised before analysis (e.g., EMP001). No personally identifiable information is included in the dataset or outputs, and the data is used solely for the academic purposes of this submission, consistent with the data governance responsibilities of the role.

4 Data Description

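The dataset comprises 153 observations across 15 source variables, covering six biannual review periods between 2023 and 2025. Table 1 also lists four derived helper columns used in the charts below (period_idx, period_label, training_label, promo_label), giving 19 rows in the inventory.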

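# Packages used by the chunks in this report; normally attached once in a setup
# chunk that is not shown. df, PAL, and make_interactive() are assumed to be defined there.
library(dplyr); library(tidyr); library(tibble); library(ggplot2); library(knitr)

# One row per column of df: type, completeness, and number of distinct values
tibble(
  Variable   = names(df),
  Type       = sapply(df, function(x) class(x)[1]),
  `Non-null` = sapply(df, function(x) sum(!is.na(x))),
  Missing    = sapply(df, function(x) sum(is.na(x))),
  Unique     = sapply(df, function(x) length(unique(x)))
) |>
  kable()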

Table 1: Variable inventory — types, completeness, and cardinality

Variable	Type	Non-null	Missing	Unique
S/N	numeric	153	0	153
employee_id	character	153	0	31
hire_year	numeric	153	0	10
year	numeric	153	0	3
period	character	153	0	2
department	character	153	0	5
level	character	153	0	13
tenure_years	numeric	153	0	17
years_in_role	numeric	153	0	49
manager_id	character	153	0	12
promotion_status	numeric	153	0	2
training_participation_intensity	numeric	153	0	2
prior_performance	numeric	139	14	102
performance_score	numeric	153	0	107
performance_change	numeric	139	14	108
period_idx	numeric	153	0	6
period_label	factor	153	0	6
training_label	factor	153	0	2
promo_label	factor	153	0	2

The variables fall into five conceptual blocks. Identifiers and time: S/N, employee_id, manager_id, hire_year, year, period. Individual factors: tenure_years, years_in_role, prior_performance. Organisational factors: department (CS, IB, PIPE, Finance, and People & Culture), level. Developmental factors: promotion_status (binary) and training_participation_intensity. Outcomes: performance_score (range 56–100) and performance_change.

df |>
  select(tenure_years, years_in_role, prior_performance,
         performance_score, performance_change) |>
  summary() |>
  kable()

Table 2: Summary statistics — numeric variables

tenure_years	years_in_role	prior_performance	performance_score	performance_change
Min. : 0.300	Min. : 0.100	Min. : 56.00	Min. : 56.00	Min. :-44.0000
1st Qu.: 1.000	1st Qu.: 0.800	1st Qu.: 83.03	1st Qu.: 83.12	1st Qu.: -0.9100
Median : 3.000	Median : 1.500	Median : 86.33	Median : 86.35	Median : 0.1600
Mean : 4.319	Mean : 3.473	Mean : 86.57	Mean : 86.57	Mean : 0.2996
3rd Qu.: 8.000	3rd Qu.: 3.500	3rd Qu.: 90.22	3rd Qu.: 90.00	3rd Qu.: 2.0000
Max. :12.000	Max. :12.600	Max. :100.00	Max. :100.00	Max. : 34.0000
NA	NA	NA’s :14	NA	NA’s :14

mean_v   <- mean(df$performance_score, na.rm = TRUE)

# Histogram with hover tooltip
hist_data <- df |>
  mutate(bin = cut(performance_score, breaks = 20)) |>
  count(bin, name = "Count") |>
  mutate(
    bin_mid = sapply(strsplit(gsub("\\(|\\]|\\[", "", as.character(bin)), ","),
                     function(x) mean(as.numeric(x))),
    bin_lbl = as.character(bin)
  )

p_hist <- ggplot(hist_data, aes(x = bin_mid, y = Count,
                                 text = paste0("Score range: ", bin_lbl,
                                               "<br>Count: ", Count))) +
  geom_col(fill = PAL$primary, color = "white", alpha = 0.88, width = 2) +
  geom_vline(xintercept = mean_v, color = PAL$accent,
             linetype = "dashed", linewidth = 0.9) +
  labs(title = "Distribution of Performance Score",
       x = "Performance Score", y = "Frequency")

# Q-Q plot
qq_data <- tibble(sample = sort(df$performance_score[!is.na(df$performance_score)])) |>
  mutate(theoretical = qnorm(ppoints(n())))

p_qq <- ggplot(qq_data, aes(x = theoretical, y = sample,
                             text = paste0("Theoretical: ", round(theoretical, 2),
                                           "<br>Sample: ", round(sample, 2)))) +
  geom_point(color = PAL$primary, alpha = 0.7, size = 1.8) +
  geom_abline(slope = sd(qq_data$sample), intercept = mean(qq_data$sample),
              color = PAL$accent, linewidth = 0.9) +
  labs(title = "Q-Q Plot vs. Normal",
       x = "Theoretical Quantiles", y = "Sample Quantiles")

plotly::subplot(
  make_interactive(p_hist, tooltip = "text"),
  make_interactive(p_qq,   tooltip = "text"),
  nrows = 1, margin = 0.06, titleX = TRUE, titleY = TRUE
)

Figure 1: Distribution of performance score and Q-Q plot against the normal distribution. Scores are approximately bell-shaped around a mean of 86.6 with a mild left tail; acceptable for parametric inference at n = 153. Hover over bars and points for details.

Performance scores are concentrated in a high band (mean ≈ 86.6, SD ≈ 6.3), consistent with Argentil’s strong performance culture but limiting the variability available for statistical inference. Fourteen missing values appear on each of prior_performance and performance_change (Table 1); these correspond to first-observation rows for new joiners still in their probation phase. The rows are therefore retained for descriptive analysis and excluded only where prior performance is required as a predictor.
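The claim that the missing prior scores are structural can be tested directly: every row with a missing prior_performance should be that employee's first observed period. The sketch below shows one way to run that check in base R on a hypothetical toy panel; the IDs, periods, and scores are invented for illustration.

```r
# Hypothetical toy panel: EMP001 was already employed before the window opened
# (prior score available at first observation); EMP002 and EMP003 are new joiners.
toy <- data.frame(
  employee_id       = c("EMP001", "EMP001", "EMP002", "EMP002", "EMP003"),
  period_idx        = c(1, 2, 2, 3, 3),
  prior_performance = c(84, 86, NA, 88, NA)
)

# Sort so that the first row per employee is their earliest period
toy <- toy[order(toy$employee_id, toy$period_idx), ]
toy$is_first_obs  <- !duplicated(toy$employee_id)
toy$missing_prior <- is.na(toy$prior_performance)

# Cross-tabulate missingness against first-observation status
print(table(first_obs = toy$is_first_obs, missing_prior = toy$missing_prior))

# TRUE when every missing value falls on an employee's first observed period
all_missing_are_first <- all(toy$is_first_obs[toy$missing_prior])
n_missing <- sum(toy$missing_prior)
```

A FALSE result on the real data would indicate that at least one gap is not explained by the probation rule and needs a separate look.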

5 Exploratory Data Analysis

Theory recap. EDA, formalised by Tukey (1977) and elaborated for business analytics applications by Adi (2026), is the systematic look at the data before any model is fitted: distributional shape, outliers, missingness pattern, and bivariate associations. It is the diagnostic stage that protects every subsequent inferential claim.

Business justification. Before recommending changes to the training budget, promotion criteria, or developmental programmes at the Group, the analyst has to be sure the data can support those claims. EDA reveals the unbalanced department sizes, the structural missingness for probation-period rows, and the level-band imbalance — facts that change how every later result must be qualified.

cat_summary <- bind_rows(lapply(
  c("department","level","period","promotion_status","training_participation_intensity"),
  function(col) {
    df |>
      count(.data[[col]], name = "n") |>
      arrange(desc(n)) |>
      transmute(
        Variable = col,
        Value    = as.character(.data[[col]]),
        n        = n,
        Pct      = sprintf("%.1f%%", 100 * n / nrow(df))
      )
  }
))
kable(cat_summary)

Table 3: Distribution of categorical variables

Variable	Value	n	Pct
department	CS	58	37.9%
department	IB	43	28.1%
department	PIPE	28	18.3%
department	Finance	12	7.8%
department	People & Culture	12	7.8%
level	Executive Driver	29	19.0%
level	Principal Associate	19	12.4%
level	Senior Associate	14	9.2%
level	Analyst II	13	8.5%
level	Analyst III	13	8.5%
level	Administrative Assistant	12	7.8%
level	Analyst I	11	7.2%
level	Associate	9	5.9%
level	Senior Vice President	9	5.9%
level	Graduate Trainee	8	5.2%
level	Associate Vice President	7	4.6%
level	Pool Driver	6	3.9%
level	Vice President	3	2.0%
period	H2	81	52.9%
period	H1	72	47.1%
promotion_status	0	134	87.6%
promotion_status	1	19	12.4%
training_participation_intensity	1	115	75.2%
training_participation_intensity	2	38	24.8%

dep_means <- df |>
  group_by(department) |>
  summarise(mean_score = mean(performance_score, na.rm = TRUE),
            n = n(), .groups = "drop") |>
  arrange(mean_score)

dep_order <- dep_means$department
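# The original palette vector named only four of the five departments, which
# leaves "People & Culture" unmapped. PAL$primary is used here as a stand-in
# colour (an assumption; substitute the intended palette entry).
dep_cols  <- c(CS = PAL$CS, IB = PAL$IB, PIPE = PAL$PIPE, Finance = PAL$Finance,
               "People & Culture" = PAL$primary)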

p_dep <- ggplot(df, aes(x = factor(department, levels = dep_order),
                         y = performance_score,
                         fill = department,
                         text = paste0("<b>", department, "</b>",
                                       "<br>Score: ", round(performance_score, 2),
                                       "<br>Tenure: ", tenure_years, " yrs",
                                       "<br>Level: ", level))) +
  geom_boxplot(width = 0.55, alpha = 0.75, outlier.shape = NA) +
  geom_jitter(width = 0.18, alpha = 0.55, size = 1.3, color = "#404040") +
  scale_fill_manual(values = dep_cols, guide = "none") +
  labs(title = paste0("Performance Score by Department  ·  Overall mean ",
                      round(mean(df$performance_score, na.rm = TRUE), 1)),
       x = "Department (ordered low → high mean)",
       y = "Performance Score")

make_interactive(p_dep, tooltip = "text")

Figure 2: Performance score by department, ordered by mean. CS shows the highest median; Finance sits markedly below the other departments. Hover any box for distribution stats, or any point for the employee-level score, tenure, and level.

period_means <- df |>
  group_by(period_idx, period_label) |>
  summarise(
    mean_score = mean(performance_score, na.rm = TRUE),
    sd_score   = sd(performance_score,   na.rm = TRUE),
    n          = n(),
    .groups    = "drop"
  ) |>
  mutate(se = sd_score / sqrt(n),
         ymin = mean_score - se,
         ymax = mean_score + se)

period_lbls <- c("2023 H1","2023 H2","2024 H1","2024 H2","2025 H1","2025 H2")

p_time <- ggplot() +
  geom_line(data = df,
            aes(x = period_idx, y = performance_score, group = employee_id,
                text = paste0("<b>", employee_id, "</b>",
                              "<br>", department, " · ", level,
                              "<br>", period_label, ": ", round(performance_score, 2))),
            color = "#9CA3AF", alpha = 0.30, linewidth = 0.5) +
  geom_point(data = df,
             aes(x = period_idx, y = performance_score,
                 text = paste0("<b>", employee_id, "</b>",
                               "<br>", department, " · ", level,
                               "<br>", period_label, ": ", round(performance_score, 2))),
             color = "#9CA3AF", alpha = 0.5, size = 1.4) +
  geom_ribbon(data = period_means,
              aes(x = period_idx, ymin = ymin, ymax = ymax),
              fill = PAL$accent, alpha = 0.18) +
  geom_line(data = period_means,
            aes(x = period_idx, y = mean_score,
                text = paste0("<b>", period_label, "</b>",
                              "<br>Cohort mean: ", round(mean_score, 2),
                              "<br>± 1 SE: [", round(ymin, 2), ", ", round(ymax, 2), "]",
                              "<br>n = ", n)),
            color = PAL$accent, linewidth = 1.2, group = 1) +
  geom_point(data = period_means,
             aes(x = period_idx, y = mean_score,
                 text = paste0("<b>", period_label, "</b>",
                               "<br>Cohort mean: ", round(mean_score, 2),
                               "<br>n = ", n)),
             color = PAL$accent, size = 3.5) +
  scale_x_continuous(breaks = 1:6, labels = period_lbls) +
  labs(title = "Performance Trajectory Over Six Half-Yearly Reviews",
       subtitle = "Individual employees (grey) + cohort mean ± 1 SE (red)",
       x = "Review Period", y = "Performance Score")

make_interactive(p_time, tooltip = "text")

Figure 3: Performance trajectory across six review periods. Grey lines trace individual employees; the red line marks the cohort mean ± 1 SE. Hover any employee line to see their department, level, and score for that period.

mgr_means <- df |>
  group_by(manager_id) |>
  summarise(mean_score = mean(performance_score, na.rm = TRUE),
            n = n(), .groups = "drop") |>
  arrange(mean_score)
mgr_order <- mgr_means$manager_id

firm_mean <- mean(df$performance_score, na.rm = TRUE)

p_mgr <- ggplot(df, aes(x = factor(manager_id, levels = mgr_order),
                         y = performance_score,
                         text = paste0("<b>Manager: ", manager_id, "</b>",
                                       "<br>Employee: ", employee_id,
                                       "<br>Score: ", round(performance_score, 2),
                                       "<br>Department: ", department))) +
  geom_boxplot(width = 0.6, alpha = 0.75, fill = PAL$primary,
               outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.55, color = "#404040", size = 1.3) +
  geom_hline(yintercept = firm_mean, color = PAL$accent,
             linetype = "dashed", linewidth = 0.7) +
  labs(title = "Performance by Manager — Between-Team Variation",
       subtitle = sprintf("Firm mean = %.1f (dashed red); team means range %.1f to %.1f",
                          firm_mean, min(mgr_means$mean_score), max(mgr_means$mean_score)),
       x = "Manager (ordered by team mean, low → high)",
       y = "Performance Score") +
  theme(axis.text.x = element_text(angle = 35, hjust = 1, size = 8.5))

make_interactive(p_mgr, tooltip = "text")

Figure 4: Performance score by manager, ordered by team mean. Twelve managers show substantial dispersion — from EMP034 (team mean 79.8) to EMP032 (team mean 93.5). Hover boxes for team statistics. The manager effect is tested formally in the hypothesis section.

Plain-language interpretation. The data are usable but unbalanced: CS supplies roughly 38% of all observations while Finance and People & Culture each have just 12 rows. Performance scores cluster between 82 and 92 with a small left tail. The six-period trajectory shows the cohort average barely moves between 2023 H1 and 2025 H2, but individual employees swing by several points between periods. The manager view is the most striking single chart in the report: team means range from ~80 to ~93 — a gap of more than two standard deviations of the outcome variable. Some of this is genuine performance heterogeneity; some is plausibly rater drift. Either way, it warrants a calibration conversation.
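The "more than two standard deviations" statement follows from figures already reported: the highest and lowest team means in the Figure 4 caption and the approximate SD of the performance score. A short worked calculation, using only those reported numbers:

```r
team_mean_high <- 93.5   # EMP032 team mean, from the Figure 4 caption
team_mean_low  <- 79.8   # EMP034 team mean, from the Figure 4 caption
sd_score       <- 6.3    # approximate SD of performance_score reported earlier

gap_points <- team_mean_high - team_mean_low   # 13.7 points
gap_in_sd  <- gap_points / sd_score            # roughly 2.17 SDs

round(c(gap_points = gap_points, gap_in_sd = gap_in_sd), 2)
```

Because the SD is itself rounded, the ratio should be read as approximate.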

6 Visualisation: Bivariate Patterns

Theory recap. Visualisation translates statistical relationships into shapes a non-statistician can read in seconds (Cleveland 1985; Wickham 2016). Scatter plots, boxplots, and heat maps each have a job: scatter for continuous-by-continuous patterns, boxplots for continuous-by-categorical comparisons, and heat maps for the multivariable correlation structure.

Business justification. When the Human Performance Committee sees a heat map in which prior performance and tenure stand out as the strongest positive cells while training intensity sits near zero, the takeaway is immediate. The same finding in a coefficient table would not survive the first board slide.

num_cols_corr <- c("tenure_years","years_in_role","prior_performance",
                   "performance_score","performance_change",
                   "training_participation_intensity","promotion_status")
corr_mat <- cor(df[, num_cols_corr], use = "pairwise.complete.obs")

# Keep lower triangle; upper as NA for cleaner look
z <- corr_mat
z[upper.tri(z)] <- NA
text_labels <- ifelse(is.na(z), "", sprintf("%.2f", z))

plotly::plot_ly(
  x = num_cols_corr,
  y = num_cols_corr,
  z = z,
  type = "heatmap",
  colorscale = list(
    list(0,   "#C0504D"),
    list(0.5, "white"),
    list(1,   "#2E7D8F")
  ),
  zmid = 0, zmin = -1, zmax = 1,
  text = text_labels,
  texttemplate = "%{text}",
  textfont = list(size = 12, color = "#1F2937"),
  xgap = 2, ygap = 2,
  colorbar = list(title = "Pearson r", thickness = 14, len = 0.7,
                  tickvals = c(-1, -0.5, 0, 0.5, 1)),
  hovertemplate = "<b>%{y}</b> ↔ <b>%{x}</b><br>r = %{z:.3f}<extra></extra>"
) |>
  plotly::layout(
    title = list(text = "<b>Correlation Matrix of Numeric Variables</b>",
                 x = 0.5, font = list(size = 14)),
    xaxis = list(tickangle = -35, side = "bottom", showgrid = FALSE,
                 tickfont = list(size = 11)),
    yaxis = list(autorange = "reversed", showgrid = FALSE,
                 tickfont = list(size = 11)),
    paper_bgcolor = "white", plot_bgcolor = "white",
    margin = list(l = 200, r = 40, t = 60, b = 130)
  ) |>
  plotly::config(displayModeBar = "hover",
                 modeBarButtonsToRemove = c("lasso2d","select2d","autoScale2d"))

Figure 5: Correlation matrix across numeric variables. Teal cells indicate positive correlations, red cells negative, and white cells values near zero. Hover any cell for the correlation coefficient.

p_train <- ggplot(df, aes(x = training_label, y = performance_score, fill = training_label,
                           text = paste0("<b>", training_label, " training</b>",
                                         "<br>Employee: ", employee_id,
                                         "<br>", department, " · ", level,
                                         "<br>Score: ", round(performance_score, 2)))) +
  geom_boxplot(width = 0.45, alpha = 0.75, outlier.shape = NA) +
  geom_jitter(width = 0.18, alpha = 0.55, color = "#404040", size = 1.3) +
  scale_fill_manual(values = c("Low (1)" = PAL$low, "High (2)" = PAL$high),
                    guide = "none") +
  labs(title = "Performance by Training Intensity",
       x = "Training Participation Intensity", y = "Performance Score")

p_promo <- ggplot(df, aes(x = promo_label, y = performance_score, fill = promo_label,
                           text = paste0("<b>", promo_label, "</b>",
                                         "<br>Employee: ", employee_id,
                                         "<br>", department, " · ", level,
                                         "<br>Score: ", round(performance_score, 2)))) +
  geom_boxplot(width = 0.45, alpha = 0.75, outlier.shape = NA) +
  geom_jitter(width = 0.18, alpha = 0.55, color = "#404040", size = 1.3) +
  scale_fill_manual(values = c("Not promoted" = PAL$promo_no, "Promoted" = PAL$promo_yes),
                    guide = "none") +
  labs(title = "Performance by Promotion Status",
       x = "Promotion Status", y = "Performance Score")

plotly::subplot(
  make_interactive(p_train, tooltip = "text"),
  make_interactive(p_promo, tooltip = "text"),
  nrows = 1, margin = 0.06, titleX = TRUE, titleY = TRUE
)

Figure 6: Performance by developmental factors. Training intensity shows no visible lift between low and high. Recently promoted employees sit modestly below the non-promoted group — likely a learning-curve effect that the regression analysis investigates further. Hover any box or point for full detail.

Plain-language interpretation. The heat map confirms two intuitions and refutes one. The confirmations: a person’s score this period mostly resembles their score last period (r = 0.55), and tenure tracks performance gently upward (r = 0.38). The refutation: training intensity does not move with performance (r ≈ 0.04). The developmental boxplots make the same point: in an organisation that invests deliberately in training, the absence of a visible lift between Level 1 and Level 2 participants is itself a finding that demands attention.

7 Hypothesis Testing

Theory recap. A formal hypothesis test pits a null (no effect) against an alternative and asks whether the observed pattern is unlikely under the null (Welch 1947). We use Welch’s two-sample t-test where variances may differ and one-way ANOVA for three-or-more-group comparisons. With n = 153 and a near-normal performance distribution, parametric tests are appropriate.
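For readers less familiar with the two tests, the sketch below runs both on simulated scores. The groups, sizes, means, and seed are hypothetical and chosen only to mimic unequal group sizes and spreads; none of the numbers come from the Argentil data.

```r
set.seed(42)  # arbitrary seed so the illustration is reproducible

# Hypothetical scores for three groups with unequal sizes and variances
toy <- data.frame(
  group = rep(c("A", "B", "C"), times = c(30, 12, 8)),
  score = c(rnorm(30, mean = 88, sd = 5),
            rnorm(12, mean = 86, sd = 8),
            rnorm(8,  mean = 80, sd = 6))
)

# Welch two-sample t-test: does not assume equal variances, so its degrees of
# freedom are estimated from the data and are usually non-integer
two_groups <- subset(toy, group %in% c("A", "B"))
welch <- t.test(score ~ group, data = two_groups, var.equal = FALSE)

# One-way ANOVA: a single F-test of equal means across all three groups
anova_tab <- summary(aov(score ~ group, data = toy))[[1]]

print(c(welch_t  = unname(welch$statistic),
        welch_df = unname(welch$parameter),
        welch_p  = welch$p.value))
print(anova_tab[1, c("F value", "Pr(>F)")])
```

The Welch degrees of freedom never exceed the pooled value (n1 + n2 - 2), which is why the test is the more conservative choice when one group is small.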

Business justification. HR teams routinely make claims of the form “high-training employees outperform” or “the IB department is our top team”. Hypothesis testing is the discipline that separates a real signal from sampling noise — the difference between a defensible recommendation to the Human Performance Committee and a narrative built on hope.

Six pre-specified hypotheses are tested. The first five address the analytical question directly; H6 is added to support the manager-level variance analysis.

H1. Promoted employees have higher performance scores than non-promoted employees.
H2. Employees with higher training intensity (level 2) outperform those with lower intensity (level 1).
H3. Mean performance differs across departments.
H4. Tenure is positively correlated with performance.
H5. Prior performance is positively correlated with current performance.
H6. Mean performance differs across managers (between-team variation is non-random).

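# Decision rule at the 5% significance level
decision <- function(p) ifelse(p < 0.05, "Reject H0", "Do not reject H0")

# H1, H2: Welch two-sample t-tests (var.equal = FALSE). The formula interface
# reports the first group's mean minus the second's (0 minus 1; 1 minus 2).
t1 <- t.test(performance_score ~ promotion_status, data = df, var.equal = FALSE)
t2 <- t.test(performance_score ~ training_participation_intensity,
             data = df, var.equal = FALSE)

# H3: one-way ANOVA across departments
a3 <- summary(aov(performance_score ~ department, data = df))[[1]]

# H4, H5: Pearson correlation tests; cor.test() drops incomplete pairs
c4 <- cor.test(df$tenure_years, df$performance_score)
c5 <- cor.test(df$prior_performance, df$performance_score)

# H6: one-way ANOVA across managers, restricted to managers with at least 3 rows
big_mgrs <- df |> count(manager_id) |> filter(n >= 3) |> pull(manager_id)
a6 <- summary(aov(performance_score ~ manager_id,
                  data = filter(df, manager_id %in% big_mgrs)))[[1]]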

tibble(
  Hypothesis = c(
    "H1: Promoted vs Not promoted",
    "H2: High training vs Low training",
    "H3: Department differences",
    "H4: Tenure correlates with performance",
    "H5: Prior correlates with current performance",
    "H6: Manager differences"
  ),
  Test = c("Welch t","Welch t","One-way ANOVA","Pearson r","Pearson r","One-way ANOVA"),
  Statistic = c(
    sprintf("t = %.3f", t1$statistic),
    sprintf("t = %.3f", t2$statistic),
    sprintf("F = %.3f", a3[1, "F value"]),
    sprintf("r = %+.3f", c4$estimate),
    sprintf("r = %+.3f", c5$estimate),
    sprintf("F = %.3f", a6[1, "F value"])
  ),
  `p-value` = c(
    sprintf("%.4f", t1$p.value),
    sprintf("%.4f", t2$p.value),
    sprintf("%.4f", a3[1, "Pr(>F)"]),
    sprintf("%.4f", c4$p.value),
    sprintf("%.4f", c5$p.value),
    sprintf("%.4f", a6[1, "Pr(>F)"])
  ),
  Decision = c(
    decision(t1$p.value),
    decision(t2$p.value),
    decision(a3[1, "Pr(>F)"]),
    decision(c4$p.value),
    decision(c5$p.value),
    decision(a6[1, "Pr(>F)"])
  )
) |> kable()

Hypothesis	Test	Statistic	p-value	Decision
H1: Promoted vs Not promoted	Welch t	t = 2.350	0.0246	Reject H0
H2: High training vs Low training	Welch t	t = -0.500	0.6184	Do not reject H0
H3: Department differences	One-way ANOVA	F = 8.158	0.0000	Reject H0
H4: Tenure correlates with performance	Pearson r	r = +0.377	0.0000	Reject H0
H5: Prior correlates with current performance	Pearson r	r = +0.554	0.0000	Reject H0
H6: Manager differences	One-way ANOVA	F = 4.942	0.0000	Reject H0

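Plain-language interpretation. Five of the six tests reject the null at p < 0.05; only H2 does not. Four of the five rejections support the stated hypothesis (H3–H6), while H1 rejects in the opposite direction.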

H5 (prior performance) is by far the strongest result: a person’s score this period closely tracks their score last period. H4 (tenure) confirms the gentle upward tenure effect. H3 (department) is highly significant: the F-statistic of 8.16 is driven by Finance trailing CS, IB, and People & Culture by 7–9 points on average. H6 (manager) is the additional finding from the manager-level variance analysis: between-manager variation is substantial and unlikely to be noise.

H1 (promotion) rejects the null but in the opposite direction to the hypothesis, and the wrong one for an “incentive” narrative: promoted employees in this sample score 2.5 points lower than non-promoted employees (p = 0.025). The table reports t = +2.35 because R’s formula interface computes the not-promoted mean minus the promoted mean; expressed as promoted minus not-promoted, the same result is t = –2.35. The most defensible interpretation is a learning-curve effect: newly promoted staff are being assessed in unfamiliar roles. H2 (training intensity) does not reject the null at all (|t| = 0.50, p = 0.62); the high- and low-training groups are statistically indistinguishable on the performance score.
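The sign of a two-sample t statistic depends only on which group is subtracted from which, so the same result can appear as +2.35 in software output and as –2.35 in prose. The sketch below illustrates this with hypothetical scores (not the Argentil data): reversing the factor level order flips the sign and leaves the p-value unchanged.

```r
# Hypothetical scores: six not-promoted (0) and four promoted (1) employees
toy <- data.frame(
  promotion_status  = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  performance_score = c(88, 90, 86, 91, 87, 89, 84, 86, 85, 83)
)

# Default level order is 0 then 1, so the statistic is mean(0) - mean(1)
t_default <- t.test(performance_score ~ promotion_status, data = toy,
                    var.equal = FALSE)

# Reversing the level order gives mean(1) - mean(0): same test, opposite sign
toy$promo_f <- factor(toy$promotion_status, levels = c(1, 0))
t_flipped <- t.test(performance_score ~ promo_f, data = toy, var.equal = FALSE)

print(c(default = unname(t_default$statistic),
        flipped = unname(t_flipped$statistic)))
print(c(default = t_default$p.value, flipped = t_flipped$p.value))
```

When reporting a directional claim, it helps to state the subtraction order explicitly alongside the statistic.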

8 Correlation Analysis

Theory recap. The Pearson correlation coefficient r measures the strength and direction of a linear relationship between two continuous variables, ranging from –1 to +1, with the associated p-value testing whether the population r differs from zero. Correlation is necessary but not sufficient for causation — it answers “do these move together?” not “does one drive the other?”
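As a concrete illustration of the definition, the sketch below computes r and its p-value by hand and compares them with cor.test(). The tenure and score values are hypothetical and serve only to show the arithmetic.

```r
# Hypothetical tenure (years) and performance scores for eight employees
x <- c(0.5, 1, 2, 3, 5, 8, 10, 12)
y <- c(80, 84, 83, 86, 88, 87, 91, 92)
n <- length(x)

# Pearson r: co-variation of x and y scaled by their individual variation
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Test of H0: population r = 0, using a t statistic with n - 2 degrees of freedom
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Built-in equivalent
ct <- cor.test(x, y)

print(c(r_manual = r_manual, r_builtin = unname(ct$estimate)))
print(c(p_manual = p_manual, p_builtin = ct$p.value))
```

The hand calculation and the built-in agree, which makes explicit that the p-value depends only on r and the number of complete pairs.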

Business justification. Correlation is the standard first cut in any scoping exercise — used as a triage tool, it tells the analyst which variables are worth promoting into the regression model and which can be dropped. For development and succession planning at Argentil, knowing which factors actually move alongside performance prevents resources from being spent on interventions that are not connected to outcomes.

target <- "performance_score"
predictors <- c("tenure_years","years_in_role","prior_performance",
                "training_participation_intensity","promotion_status",
                "performance_change")

corr_rows <- lapply(predictors, function(p) {
  sub <- df[, c(p, target)] |> drop_na()
  ct  <- cor.test(sub[[p]], sub[[target]])
  tibble(
    Predictor    = p,
    n            = nrow(sub),
    `Pearson r`  = round(unname(ct$estimate), 3),
    `p-value`    = round(ct$p.value, 4),
    Significance = case_when(
      ct$p.value < 0.001 ~ "***",
      ct$p.value < 0.01  ~ "**",
      ct$p.value < 0.05  ~ "*",
      TRUE               ~ "ns"
    )
  )
})
bind_rows(corr_rows) |>
  arrange(desc(abs(`Pearson r`))) |>
  kable()

Table 4: Pearson correlations of each predictor with performance score

Predictor	n	Pearson r	p-value	Significance
prior_performance	139	0.554	0.0000	***
performance_change	139	0.458	0.0000	***
tenure_years	153	0.377	0.0000	***
years_in_role	153	0.322	0.0000	***
promotion_status	153	-0.129	0.1113	ns
training_participation_intensity	153	0.039	0.6335	ns
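Readers comparing Table 4 with the hypothesis-test table may notice that promotion_status is "ns" here (p = 0.11) yet significant under the Welch t-test (p = 0.025). For a binary predictor, the Pearson test is equivalent to a pooled-variance (equal-variance) t-test, whereas Welch's test allows the two groups to have different variances. The sketch below demonstrates the equivalence on hypothetical data in which the smaller group is much less variable; the numbers are invented.

```r
# Hypothetical data: 12 not-promoted (wide spread) and 4 promoted (narrow spread)
toy <- data.frame(
  promoted = c(rep(0, 12), rep(1, 4)),
  score    = c(95, 78, 90, 84, 99, 80, 92, 86, 97, 82, 88, 91,
               84, 85, 83, 86)
)

pearson <- cor.test(toy$promoted, toy$score)                          # test of r = 0
pooled  <- t.test(score ~ promoted, data = toy, var.equal = TRUE)     # equal variances
welch   <- t.test(score ~ promoted, data = toy, var.equal = FALSE)    # unequal variances

# The Pearson and pooled p-values coincide; the Welch p-value differs
print(c(pearson = pearson$p.value,
        pooled  = pooled$p.value,
        welch   = welch$p.value))
```

Which test is preferable depends on whether the equal-variance assumption is credible for the two groups; with a small promoted group, Welch's version is the more cautious default.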

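# The original palette vector named only four of the five departments, which
# leaves "People & Culture" unmapped. PAL$primary is used here as a stand-in
# colour (an assumption; substitute the intended palette entry).
dep_colors <- c(CS = PAL$CS, IB = PAL$IB, PIPE = PAL$PIPE, Finance = PAL$Finance,
                "People & Culture" = PAL$primary)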

t_test  <- cor.test(df$tenure_years, df$performance_score)
sub_pp  <- df |> drop_na(prior_performance, performance_score)
pp_test <- cor.test(sub_pp$prior_performance, sub_pp$performance_score)

p_ten <- ggplot(df, aes(x = tenure_years, y = performance_score,
                         color = department,
                         text = paste0("<b>", employee_id, "</b>",
                                       "<br>", department, " · ", level,
                                       "<br>Tenure: ", tenure_years, " yrs",
                                       "<br>Score: ", round(performance_score, 2)))) +
  geom_point(size = 2.5, alpha = 0.75) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE,
              color = PAL$accent, linewidth = 1) +
  scale_color_manual(values = dep_colors) +
  labs(title = sprintf("Tenure vs Performance  (r = %.3f, p = %.4f)",
                       t_test$estimate, t_test$p.value),
       x = "Tenure (years)", y = "Performance Score", color = NULL)

p_prior <- ggplot(sub_pp, aes(x = prior_performance, y = performance_score,
                                color = department,
                                text = paste0("<b>", employee_id, "</b>",
                                              "<br>", department, " · ", level,
                                              "<br>Prior: ", round(prior_performance, 2),
                                              "<br>Current: ", round(performance_score, 2)))) +
  geom_point(size = 2.5, alpha = 0.75) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE,
              color = PAL$accent, linewidth = 1) +
  scale_color_manual(values = dep_colors) +
  labs(title = sprintf("Prior vs Current Performance  (r = %.3f, p = %.4f)",
                       pp_test$estimate, pp_test$p.value),
       x = "Prior Performance Score", y = "Current Performance Score", color = NULL)

plotly::subplot(
  make_interactive(p_ten,   tooltip = "text"),
  make_interactive(p_prior, tooltip = "text"),
  nrows = 1, margin = 0.06, titleX = TRUE, titleY = TRUE
)

Figure 7: Bivariate associations with performance for tenure (left) and prior performance (right), the two strongest substantive predictors in Table 4. Points are coloured by department; the red line is the OLS fit. Hover any point for the employee profile; click legend items to filter by department.

Plain-language interpretation. Four predictors move with performance. In descending order of strength they are prior performance (r = +0.55), performance change (r = +0.46), tenure (r = +0.38), and years-in-role (r = +0.32). The performance_change association is largely mechanical rather than behaviourally informative, because the variable shares its definition with the outcome through prior_performance. The takeaway: the individual factors carry the bivariate signal; the developmental variables (training intensity r = +0.04, promotion status r = –0.13) do not.
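
The p-values in Table 4, and the sampling uncertainty around each r, can be recovered from r and n alone. The following is a minimal base-R sketch that uses only the rounded values printed in the table; the helper name r_summary is illustrative and not part of the report's code, and the calculation treats rows as independent, as the table itself does.

```r
# Recover the t statistic, two-sided p-value and a Fisher-z 95% CI
# from a Pearson correlation r and the sample size n.
r_summary <- function(r, n, level = 0.95) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)           # t with n - 2 df
  p_val  <- 2 * pt(-abs(t_stat), df = n - 2)          # two-sided p-value
  z      <- atanh(r)                                  # Fisher z-transform
  half   <- qnorm(1 - (1 - level) / 2) / sqrt(n - 3)  # half-width on the z scale
  c(r = r, t = t_stat, p = p_val,
    ci_low = tanh(z - half), ci_high = tanh(z + half))
}

# Rounded inputs copied from Table 4
tab4 <- rbind(
  prior_performance = r_summary(0.554, 139),
  tenure_years      = r_summary(0.377, 153),
  promotion_status  = r_summary(-0.129, 153)
)
print(round(tab4, 3))
```

Under these inputs the interval for prior performance runs from roughly 0.43 to 0.66, and the promotion p-value lands close to the 0.11 shown in the table; small differences reflect the rounding of r.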

9 Regression Analysis

Theory recap. Multiple linear regression models the conditional mean of an outcome as a linear function of several predictors, allowing each coefficient to be interpreted as the predicted change in the outcome per one-unit change in that predictor holding all others fixed (James et al. 2013; Adi 2026). Statistical inference rests on the classical linear-model assumptions: linearity, independent errors, homoscedasticity, and approximately normal residuals. Normality is not part of the Gauss–Markov conditions; it is an additional assumption that matters for exact small-sample t and F tests.

Business justification. Bivariate correlation cannot answer the question that strategic people decisions hinge on: “after controlling for the things I cannot change quickly — tenure, department, prior score — does training intensity buy me extra performance?” That is a partial-effect question, and regression is the tool that delivers it.
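
The gap between a bivariate association and a partial effect can be seen on simulated data. The sketch below does not use Argentil data: every number in it is an assumption chosen for illustration. Training take-up is made more likely for employees with high prior scores, while its true effect on the current score is set to zero.

```r
# Simulated illustration (NOT Argentil data): training take-up is more likely
# among employees with high prior scores, but has no effect of its own.
set.seed(42)
n        <- 5000
prior    <- rnorm(n, mean = 85, sd = 6)
training <- as.integer(prior + rnorm(n, sd = 6) > 85)   # selection on prior score
score    <- 50 + 0.4 * prior + rnorm(n, sd = 5)         # true training effect = 0

naive   <- coef(lm(score ~ training))[["training"]]          # bivariate gap
partial <- coef(lm(score ~ training + prior))[["training"]]  # holding prior fixed

cat(sprintf("Bivariate gap:  %.2f points\n", naive))
cat(sprintf("Partial effect: %.2f points\n", partial))
```

With these settings the bivariate gap is clearly positive (around 2.7 points in expectation) even though training does nothing, while the partial coefficient sits near zero. This is the distinction the regression in this section is designed to draw.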

9.1 Full model

The full specification regresses current performance_score on individual factors (tenure, years-in-role, prior performance), developmental factors (training-intensity dummy, promotion dummy), and organisational factors (department dummies with CS as the reference, level-band dummies with Associate as the reference). The thirteen job levels were collapsed into eight bands to keep the model identifiable at n = 139.

reg_df <- df |>
  filter(!is.na(prior_performance)) |>
  mutate(
    level_band = case_when(
      str_detect(level, "Senior Vice President")                ~ "Executive",
      str_detect(level, "Vice President")                       ~ "VP",
      str_detect(level, "Principal Associate|Senior Associate") ~ "Senior Manager",
      str_detect(level, "Associate")                            ~ "Associate",
      str_detect(level, "Analyst")                              ~ "Analyst",
      str_detect(level, "Graduate")                             ~ "Graduate",
      str_detect(level, "Driver")                               ~ "Driver",
      str_detect(level, "Administrative")                       ~ "Admin",
      TRUE                                                      ~ "Other"
    ),
    training_high = as.integer(training_participation_intensity == 2),
    department = relevel(factor(department), ref = "CS"),
    level_band = relevel(factor(level_band), ref = "Associate")
  )

fit_full <- lm(
  performance_score ~ tenure_years + years_in_role + prior_performance +
                       training_high + promotion_status +
                       department + level_band,
  data = reg_df
)

broom::tidy(fit_full) |>
  transmute(
    Variable     = term,
    Coefficient  = round(estimate, 3),
    `Std. Error` = round(std.error, 3),
    t            = round(statistic, 2),
    `p-value`    = round(p.value, 4),
    Sig.         = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01  ~ "**",
      p.value < 0.05  ~ "*",
      p.value < 0.10  ~ ".",
      TRUE            ~ ""
    )
  ) |>
  kable()

Variable	Coefficient	Std. Error	t	p-value	Sig.
(Intercept)	53.263	7.403	7.19	0.0000	***
tenure_years	0.455	0.403	1.13	0.2612
years_in_role	-0.139	0.436	-0.32	0.7509
prior_performance	0.345	0.086	4.03	0.0001	***
training_high	0.487	1.200	0.41	0.6858
promotion_status	-0.823	1.566	-0.53	0.6004
departmentFinance	-4.700	2.993	-1.57	0.1190
departmentIB	0.625	2.064	0.30	0.7624
departmentPeople & Culture	-0.476	3.239	-0.15	0.8833
departmentPIPE	-1.665	2.201	-0.76	0.4508
level_bandAdmin	0.130	3.345	0.04	0.9690
level_bandAnalyst	2.481	2.067	1.20	0.2322
level_bandDriver	4.002	3.006	1.33	0.1857
level_bandExecutive	3.346	2.862	1.17	0.2447
level_bandGraduate	1.380	5.932	0.23	0.8165
level_bandSenior Manager	3.075	2.489	1.24	0.2191
level_bandVP	2.121	2.920	0.73	0.4690

s <- summary(fit_full)
cat(sprintf("Observations: %d\n", nobs(fit_full)))

Observations: 139

cat(sprintf("Predictors (incl. intercept): %d\n", length(coef(fit_full))))

Predictors (incl. intercept): 17

cat(sprintf("R-squared: %.4f\n", s$r.squared))

R-squared: 0.4064

cat(sprintf("Adjusted R-squared: %.4f\n", s$adj.r.squared))

Adjusted R-squared: 0.3286

cat(sprintf("F(%d, %d) = %.3f, p = %.6f\n",
            s$fstatistic[2], s$fstatistic[3], s$fstatistic[1],
            pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)))

F(16, 122) = 5.221, p = 0.000000

cat(sprintf("Residual standard error: %.3f\n", s$sigma))

Residual standard error: 5.260

The full model explains roughly 41% of the variance (R² = 0.41, adj-R² = 0.33) and is jointly significant (F(16, 122) = 5.22, p < 0.001). Among the predictors, only prior performance is individually significant (p < 0.001). The Finance dummy has the largest department coefficient (–4.70) but falls short of even marginal significance in this specification (p ≈ 0.12), and tenure, years-in-role, the developmental variables (training, promotion), and all level-band dummies are individually non-significant once the other predictors are controlled for.
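
The summary statistics printed above are linked by simple identities, which gives a quick consistency check on the reported fit. The sketch below rebuilds adjusted R² and the overall F statistic from the printed R², the number of observations, and the number of slope terms; the helper name fit_stats is illustrative.

```r
# Rebuild adjusted R-squared and the overall F statistic from R-squared,
# the number of observations n, and the number of slope terms k.
fit_stats <- function(r2, n, k) {
  df_resid <- n - k - 1
  adj_r2   <- 1 - (1 - r2) * (n - 1) / df_resid
  f_stat   <- (r2 / k) / ((1 - r2) / df_resid)
  p_val    <- pf(f_stat, k, df_resid, lower.tail = FALSE)
  c(adj_r2 = adj_r2, F = f_stat, df1 = k, df2 = df_resid, p = p_val)
}

# Full model: R-squared 0.4064, 139 observations, 16 slopes (17 coefficients)
full_stats <- fit_stats(r2 = 0.4064, n = 139, k = 16)
print(round(full_stats, 4))
```

The result agrees with the printed output to rounding: adjusted R² of about 0.329 and F(16, 122) of about 5.22.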

9.2 Reduced model

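After removing predictors with p > 0.20 to address overfitting (the level-band dummies in particular fragment a modest sample into eight groups), the reduced specification retains tenure, prior performance, and the department block. The rule was not applied mechanically: tenure is retained although its full-model p-value is 0.26, and the Driver level-band dummy (p = 0.19) is dropped along with the rest of its block.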

fit_red <- lm(
  performance_score ~ tenure_years + prior_performance + department,
  data = reg_df
)

broom::tidy(fit_red) |>
  transmute(
    Variable     = term,
    Coefficient  = round(estimate, 3),
    `Std. Error` = round(std.error, 3),
    t            = round(statistic, 2),
    `p-value`    = round(p.value, 4),
    Sig.         = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01  ~ "**",
      p.value < 0.05  ~ "*",
      p.value < 0.10  ~ ".",
      TRUE            ~ ""
    )
  ) |>
  kable()

Variable	Coefficient	Std. Error	t	p-value	Sig.
(Intercept)	48.929	6.621	7.39	0.0000	***
tenure_years	0.317	0.160	1.98	0.0498	*
prior_performance	0.426	0.075	5.65	0.0000	***
departmentFinance	-3.631	1.944	-1.87	0.0640	.
departmentIB	0.360	1.339	0.27	0.7887
departmentPeople & Culture	-0.064	1.678	-0.04	0.9696
departmentPIPE	-1.382	1.523	-0.91	0.3657

s <- summary(fit_red)
cat(sprintf("R-squared: %.4f\n", s$r.squared))

R-squared: 0.3705

cat(sprintf("Adjusted R-squared: %.4f\n", s$adj.r.squared))

Adjusted R-squared: 0.3418

cat(sprintf("F(%d, %d) = %.3f, p = %.6f\n",
            s$fstatistic[2], s$fstatistic[3], s$fstatistic[1],
            pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)))

F(6, 132) = 12.946, p = 0.000000

cat(sprintf("Residual standard error: %.3f\n", s$sigma))

Residual standard error: 5.208

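# Display labels for every non-intercept term in fit_red.
# A term missing from this map gets an NA label and drops out of the plot.
label_map <- c(
  tenure_years        = "Tenure (per year)",
  prior_performance   = "Prior performance (per pt)",
  departmentIB        = "Dept: IB (vs CS)",
  departmentPIPE      = "Dept: PIPE (vs CS)",
  departmentFinance   = "Dept: Finance (vs CS)",
  "departmentPeople & Culture" = "Dept: People & Culture (vs CS)"
)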

coef_df <- broom::tidy(fit_red, conf.int = TRUE) |>
  filter(term != "(Intercept)") |>
  mutate(
    label = label_map[term],
    color_cat = factor(case_when(
      p.value < 0.05 ~ "p < 0.05",
      p.value < 0.10 ~ "p < 0.10",
      TRUE           ~ "ns"
    ), levels = c("p < 0.05", "p < 0.10", "ns")),
    hover = paste0("<b>", label, "</b>",
                   "<br>β = ", round(estimate, 3),
                   "<br>95% CI: [", round(conf.low, 3), ", ", round(conf.high, 3), "]",
                   "<br>p = ", round(p.value, 4))
  )

sig_palette <- c("p < 0.05" = PAL$primary, "p < 0.10" = PAL$gold, "ns" = "#9CA3AF")

p_forest <- ggplot(coef_df,
                   aes(x = estimate,
                       y = fct_rev(factor(label, levels = unname(label_map))),
                       color = color_cat,
                       text = hover)) +
  geom_vline(xintercept = 0, color = "#9CA3AF", linetype = "dashed", linewidth = 0.6) +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high),
                 height = 0.18, linewidth = 0.9, color = "#1F2937") +
  geom_point(size = 4) +
  scale_color_manual(values = sig_palette, name = NULL, drop = FALSE) +
  labs(title = "Regression Coefficients (Reduced Model) with 95% Confidence Intervals",
       x = "Effect on Performance Score (points)", y = NULL)

make_interactive(p_forest, tooltip = "text")

Figure 8: Forest plot of reduced-model coefficients with 95% confidence intervals. Bars not crossing the dashed zero line indicate independent significant effects. Hover each point for full coefficient details.

The reduced model has a higher adjusted R² than the full model (0.342 vs 0.329): fewer predictors, more signal per parameter. Each additional year of tenure is associated with roughly 0.32 more points of performance (p ≈ 0.050), each prior-performance point translates into 0.43 points of current performance (p < 0.001), and Finance employees score about 3.6 points lower than CS employees on average (p ≈ 0.06).
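
Whether the dropped terms carried any joint explanatory power can be checked with a partial (nested-model) F test. The sketch below computes it from the printed R² values alone, which is valid because both models are fitted to the same 139 rows of reg_df; with the fitted objects available, anova(fit_red, fit_full) would be the direct equivalent. The helper name nested_f is illustrative.

```r
# Partial (nested-model) F test computed from the printed summaries:
# does the block of terms dropped from the full model add explanatory power?
nested_f <- function(r2_full, r2_red, n, k_full, k_red) {
  q        <- k_full - k_red            # number of dropped terms
  df_resid <- n - k_full - 1            # residual df of the full model
  f_stat   <- ((r2_full - r2_red) / q) / ((1 - r2_full) / df_resid)
  c(F = f_stat, df1 = q, df2 = df_resid,
    p = pf(f_stat, q, df_resid, lower.tail = FALSE))
}

# Printed values: full R2 = 0.4064 (16 slopes), reduced R2 = 0.3705 (6 slopes)
drop_test <- nested_f(0.4064, 0.3705, n = 139, k_full = 16, k_red = 6)
print(round(drop_test, 3))
```

An F of about 0.74 on (10, 122) degrees of freedom is far from significant, which is consistent with the reduced model's higher adjusted R²: the ten dropped terms cost more in degrees of freedom than they return in fit.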

9.3 Robustness check — employee fixed effects

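To address the concern that repeated observations on only 31 employees (153 person-period rows in the panel, 139 in the regression sample) violate the OLS independence assumption, the model was re-fit with employee fixed effects using fixest::feols. The fixed-effects specification keeps tenure and prior performance from the reduced model, adds back the two developmental dummies, and omits department. It absorbs all time-invariant employee characteristics and identifies effects only from within-employee changes over time.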

fit_fe <- fixest::feols(
  performance_score ~ tenure_years + prior_performance + training_high + promotion_status |
                       employee_id,
  data = reg_df
)

fe_tbl <- tibble(
  Variable     = c("tenure_years (within)","prior_performance (within)",
                   "training_high (within)","promoted (within)"),
  Coefficient  = round(coef(fit_fe), 3),
  `Std. Error` = round(se(fit_fe), 3),
  t            = round(coef(fit_fe) / se(fit_fe), 2),
  `p-value`    = round(pvalue(fit_fe), 4)
) |>
  mutate(Sig. = case_when(
    `p-value` < 0.001 ~ "***",
    `p-value` < 0.01  ~ "**",
    `p-value` < 0.05  ~ "*",
    `p-value` < 0.10  ~ ".",
    TRUE              ~ ""
  ))
kable(fe_tbl)

Variable	Coefficient	Std. Error	t	p-value	Sig.
tenure_years (within)	-0.180	0.507	-0.36	0.7230
prior_performance (within)	0.025	0.095	0.26	0.7925
training_high (within)	0.558	1.119	0.50	0.6189
promoted (within)	-0.913	1.575	-0.58	0.5636

cat(sprintf("\nWithin R-squared: %.4f\n", fitstat(fit_fe, "wr2", verbose = FALSE)$wr2))


Within R-squared: 0.0082

cat(sprintf("Observations: %d\n", nobs(fit_fe)))

Observations: 137

cat(sprintf("Number of employee fixed effects: %d\n",
            length(unique(reg_df$employee_id))))

Number of employee fixed effects: 30

Plain-language interpretation of the FE check. The fixed-effects specification finds none of the predictors significant within-employee, and the within R² collapses to about 0.01. This is informative, not a failure: it tells us that the tenure and prior-performance effects identified by OLS are largely between-employee phenomena. High-tenure employees tend to be high-performance employees, but individual employees do not measurably gain performance points as their own tenure ticks up. The reverse is also useful: training and promotion show no significant within-employee effect either, reinforcing that these levers are not visibly working on the performance score at the individual level during the study window.
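
The between-versus-within distinction can be reproduced on simulated data. The sketch below does not use Argentil data: all parameters are assumptions, including a panel of 200 employees, larger than the real one so that the pattern is stable. Higher-ability employees are given both more tenure and higher scores, but nobody improves as their own tenure grows. Demeaning each variable by employee is the within transformation that a fixed-effects estimator applies.

```r
# Simulated panel (NOT Argentil data): 200 employees x 6 half-yearly reviews.
# Ability raises both starting tenure and scores; the true within slope is 0.
set.seed(1)
n_emp <- 200; n_per <- 6
ability_i <- rnorm(n_emp, sd = 3)
start_i   <- 6 + 0.5 * ability_i + rnorm(n_emp, sd = 1)   # tenure at first review

panel <- data.frame(
  id     = rep(seq_len(n_emp), each = n_per),
  tenure = rep(start_i, each = n_per) +
           rep(seq(0, by = 0.5, length.out = n_per), times = n_emp)
)
panel$score <- 85 + rep(ability_i, each = n_per) + rnorm(nrow(panel), sd = 3)

# Pooled OLS mixes between- and within-employee variation
pooled <- coef(lm(score ~ tenure, data = panel))[["tenure"]]

# Within transformation: subtract each employee's own mean from every variable
demean <- function(x, g) x - ave(x, g)
panel$score_w  <- demean(panel$score,  panel$id)
panel$tenure_w <- demean(panel$tenure, panel$id)
within_slope <- coef(lm(score_w ~ tenure_w - 1, data = panel))[["tenure_w"]]

# The dummy-variable (LSDV) form of the fixed-effects model gives the same slope
lsdv <- coef(lm(score ~ tenure + factor(id), data = panel))[["tenure"]]

cat(sprintf("Pooled slope: %.3f | within slope: %.3f | LSDV slope: %.3f\n",
            pooled, within_slope, lsdv))
```

The pooled slope is clearly positive while the within slope is close to zero, the same qualitative pattern as the OLS and fixed-effects results above. The demeaned and dummy-variable regressions agree to numerical precision.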

9.4 Diagnostics

diag_df <- tibble(
  fitted   = fitted(fit_red),
  resid    = residuals(fit_red),
  obs_idx  = seq_along(fitted)
)

p_rvf <- ggplot(diag_df, aes(x = fitted, y = resid,
                              text = paste0("Obs #", obs_idx,
                                            "<br>Fitted: ", round(fitted, 2),
                                            "<br>Residual: ", round(resid, 3)))) +
  geom_point(color = PAL$primary, size = 2.2, alpha = 0.7) +
  geom_hline(yintercept = 0, color = PAL$accent, linetype = "dashed", linewidth = 0.8) +
  labs(title = "Residuals vs Fitted", x = "Fitted Values", y = "Residuals")

qq_resid <- tibble(sample = sort(diag_df$resid)) |>
  mutate(theoretical = qnorm(ppoints(n())))

p_qq2 <- ggplot(qq_resid, aes(x = theoretical, y = sample,
                              text = paste0("Theoretical: ", round(theoretical, 2),
                                            "<br>Residual: ", round(sample, 3)))) +
  geom_point(color = PAL$primary, alpha = 0.7, size = 1.8) +
  geom_abline(slope = sd(qq_resid$sample), intercept = mean(qq_resid$sample),
              color = PAL$accent, linewidth = 0.9) +
  labs(title = "Q-Q Plot of Residuals",
       x = "Theoretical Quantiles", y = "Sample Quantiles")

plotly::subplot(
  make_interactive(p_rvf, tooltip = "text"),
  make_interactive(p_qq2, tooltip = "text"),
  nrows = 1, margin = 0.06, titleX = TRUE, titleY = TRUE
)

Figure 9: Regression diagnostics for the reduced model. Residuals vs fitted (left) shows no clear funnel or curvature; the Q-Q plot (right) tracks the reference line through the central distribution with mild departures in the tails — acceptable for inference at n = 139. Hover any point for residual values.

The residuals vs fitted plot shows no obvious curvature or funnel shape, supporting the linearity and constant-variance assumptions. The Q-Q plot tracks the reference line through the middle 90% of the distribution with small departures at the extremes, the same outliers visible in the original score distribution. With n = 139, the central limit theorem makes the inferential statements reasonably robust to this mild non-normality.
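
A visual reading of the residual plot can be backed by a formal test. The sketch below implements a studentised Breusch-Pagan test in base R and runs it on simulated data (not Argentil data) with and without a funnel pattern. The helper name bp_test is illustrative; lmtest::bptest offers a packaged version. Applying bp_test(fit_red) would be the analogous check on the reduced model.

```r
# Studentised Breusch-Pagan test (Koenker's version) in base R:
# regress squared residuals on the model's predictors; LM = n * R-squared.
bp_test <- function(fit) {
  X   <- model.matrix(fit)                 # design matrix, intercept included
  e2  <- residuals(fit)^2
  aux <- lm.fit(X, e2)                     # auxiliary regression
  r2  <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  df  <- ncol(X) - 1
  lm_stat <- length(e2) * r2
  c(LM = lm_stat, df = df, p = pchisq(lm_stat, df, lower.tail = FALSE))
}

# Simulated data (NOT Argentil data)
set.seed(7)
x        <- runif(500, 0, 10)
y_const  <- 1 + 2 * x + rnorm(500, sd = 3)          # constant error variance
y_funnel <- 1 + 2 * x + rnorm(500, sd = 0.5 + x)    # variance grows with x

res_const  <- bp_test(lm(y_const  ~ x))
res_funnel <- bp_test(lm(y_funnel ~ x))
print(round(rbind(constant = res_const, funnel = res_funnel), 4))
```

A small p-value signals that the residual variance changes with the predictors. The funnel-shaped series is rejected decisively, whereas a homoscedastic series is rejected only at the nominal error rate.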

Plain-language interpretation. In business terms: hold two employees side by side, in the same department, with the same prior-period score. The one with five extra years of tenure is predicted to score about 1.6 points higher today (5 × 0.32). Hold tenure constant: each prior-period point flows through to roughly 0.43 of a point in the current period. The model explains 37% of the variance in performance — a meaningful but not overwhelming amount, which is appropriate given that performance scores at Argentil are tightly clustered and shaped by factors (managerial judgement, project context) that are not captured in any structured variable.
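
The five-year comparison carries its own uncertainty, which can be attached to the point estimate directly. A minimal sketch, using the rounded tenure coefficient and standard error from the reduced-model table and the residual degrees of freedom from F(6, 132):

```r
# Reduced-model tenure coefficient and standard error as printed in the table
beta_tenure <- 0.317
se_tenure   <- 0.160
df_resid    <- 132          # from F(6, 132)
years       <- 5

diff_est <- years * beta_tenure                 # predicted gap in points
diff_se  <- years * se_tenure                   # SE scales linearly with years
t_crit   <- qt(0.975, df_resid)
ci       <- diff_est + c(-1, 1) * t_crit * diff_se

cat(sprintf("Predicted gap: %.2f points, 95%% CI [%.2f, %.2f]\n",
            diff_est, ci[1], ci[2]))
```

The interval runs from almost exactly zero to a little over three points, which mirrors a tenure p-value sitting right at the 0.05 threshold: the direction of the association is supported, but its size is imprecisely estimated.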

10 Integrated Findings & Recommendation

The five techniques plus the two upgrade analyses tell a single, coherent story about performance at Argentil.

EDA revealed a panel that is larger and more time-rich than expected (31 employees, 153 person-period rows, six review periods), with structural missingness only on probation rows and the workforce spread across five departments. Visualisation flagged that within-person variation across periods is comparable to between-person variation, that Finance trails the other departments markedly, that the developmental factors look flat, and — the upgrade finding — that manager-level dispersion is striking, with twelve managers’ team means ranging from 79.8 to 93.5. Correlation analysis identified prior performance (r = 0.55) and tenure (r = 0.38) as the most strongly associated continuous predictors, while training intensity (r = 0.04) and promotion (r = –0.13) were negligible or negative. Hypothesis testing rejected the null for department differences (F = 8.16, p < 0.001), manager differences (F = 4.94, p < 0.001), tenure (p < 0.001), prior performance (p < 0.001), and — in the wrong direction for the policy narrative — promotion (t = –2.35, p = 0.025). Regression analysis confirmed tenure and prior performance as the surviving independent predictors with the Finance gap marginal. The fixed-effects robustness check confirmed that these effects operate largely between employees, not within.

Integrated answer to the research question. Five driver categories emerge from the evidence:

Driver category	Key drivers	Direction & strength
Individual	Prior performance, tenure	Positive (prior performance strong, tenure moderate); reflects stable between-employee differences
Organisational (Dept)	Department	Significant; Finance trails by ~7–9 points unadjusted, ~3.6 adjusted
Organisational (Manager)	Manager (rater)	Between-team dispersion is real (ANOVA p < 0.001) — calibration risk
Developmental	Training intensity, promotion status	No positive effect; promotion shows a negative short-run effect
Temporal	Review period (year)	Stable; no significant year-over-year drift (p = 0.44)

Recommendation. The findings support three focused, evidence-aligned investments:

Prioritise continuity, especially early-tenure support. Tenure’s measurable association with performance (β = +0.32 points per year, p ≈ 0.050) makes structured onboarding, mentoring, and role clarity in years one to three the best-evidenced investment. Retention compounds; volatility erodes the persistence advantage the data shows.
Investigate the Finance gap and the manager-level dispersion together. Finance is small (n = 12) but the gap is large and replicated across multiple periods. The manager-level ANOVA (F = 4.94, p < 0.001) suggests at least part of the cross-department story may be a rater-calibration issue rather than a true skill gap. A targeted calibration session with line managers — combined with a workload and role-clarity review in Finance — is the single highest-leverage next action.
Re-examine the developmental levers before defending them or cutting them. Training intensity shows no measurable effect on performance, and within-period promotion shows a negative short-run association (the learning-curve interpretation). Three explanations are possible: (a) the training-intensity proxy is too coarse and a richer L&D dataset would reveal the effect; (b) the time-lag is longer than 18 months and a deferred-outcome study is needed; or (c) the design genuinely needs revision. The next step is a diagnostic with Learning & Development, not a budget decision.

The objective is not to correct a performance problem — Argentil’s workforce is performing in the high-effective band — but to deliberately strengthen the conditions that sustain that performance and to make sure each unit of developmental investment is connected to a measurable outcome.

11 Limitations & Further Work

The dataset is small, observational, and tightly clustered. Five caveats matter most:

Use of proxy variables. Several development variables, particularly training_participation_intensity, are captured as proxies because more granular records (training hours, programme type, certification outcomes) were not available in structured form during the study window. A more precise assessment of training’s impact would require ingestion of detailed L&D records.
Sample size and unbalanced groups. With 31 unique employees, only 12 observations each in Finance and People & Culture, 19 promotion events and 38 high-training rows, several tests are underpowered. A null result for training is consistent with a real but moderate effect this sample cannot detect.
Limited variability in performance scores. Argentil’s strong performance culture compresses the outcome into a narrow band (mean 86.6, SD 6.3), which reduces statistical power to detect drivers. A broader outcome distribution — or a redesigned rating scale — would allow deeper differentiation.
Repeated observations within employees. Up to six observations per employee violate the OLS independence assumption. The fixed-effects robustness check addresses this through the feols within-transformation; the natural full extension is a mixed-effects (random-intercepts) model using lme4::lmer that attributes within-employee variation properly.
Rater effects are detected but not modelled. The manager-level ANOVA establishes that between-manager variation is significant, but the report does not separate “genuine team performance differences” from “rater calibration differences.” That separation requires a multi-rater design (the same employee evaluated by multiple managers), which is not available in this dataset.
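
The power concern in the second caveat can be made concrete with stats::power.t.test. The sketch below takes the score SD of 6.3 and the 38 high-training rows from the text above. The 2-point effect size is a hypothetical choice, and equal group sizes and independent rows are simplifying assumptions: the comparison group is in fact larger, and rows are clustered by employee.

```r
# Power of a two-sample t test to detect a hypothetical 2-point training effect.
# sd = 6.3 and n = 38 come from the report; delta = 2 is an ASSUMPTION.
# Equal group sizes and independent rows are simplifications.
pw <- power.t.test(n = 38, delta = 2, sd = 6.3, sig.level = 0.05)
cat(sprintf("Power to detect a 2-point effect: %.2f\n", pw$power))

# Smallest effect detectable with 80% power at the same group size
need <- power.t.test(n = 38, power = 0.80, sd = 6.3, sig.level = 0.05)
cat(sprintf("Detectable effect at 80%% power: %.1f points\n", need$delta))
```

Under these assumptions the chance of detecting a 2-point effect is below one in three, and an effect of roughly 4 points would be needed for 80% power, so the null result for training cannot be read as evidence of no effect.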

With more data, time, and computing resources, four extensions would strengthen the analysis: (i) a mixed-effects model with employee and manager random intercepts to separate the two sources of clustering; (ii) a difference-in-differences design comparing high- and low-training employees before and after their training period; (iii) a survival model for promotion timing; and (iv) a longer time window to capture deferred training effects, which typically emerge 12–24 months after participation.
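
Before fitting the mixed-effects model in extension (i), the degree of clustering can be gauged with an intraclass correlation (ICC) and the implied effective sample size. The sketch below uses a simulated balanced panel (not Argentil data) with the same shape as the study, 31 employees by 6 reviews; the variance components are assumptions, and the one-way ANOVA estimator shown here presumes a balanced design.

```r
# Simulated balanced panel (NOT Argentil data): 31 employees x 6 reviews.
# Between-employee SD = 4 and within-employee SD = 5 are ASSUMPTIONS.
set.seed(123)
n_emp <- 31; n_per <- 6
id    <- factor(rep(seq_len(n_emp), each = n_per))
score <- 86 + rep(rnorm(n_emp, sd = 4), each = n_per) +
         rnorm(n_emp * n_per, sd = 5)

# One-way ANOVA estimate of the intraclass correlation
aov_tab <- anova(lm(score ~ id))
msb <- aov_tab["id", "Mean Sq"]          # between-employee mean square
msw <- aov_tab["Residuals", "Mean Sq"]   # within-employee mean square
icc <- (msb - msw) / (msb + (n_per - 1) * msw)

# Design effect and effective sample size for clustered observations
deff  <- 1 + (n_per - 1) * icc
n_eff <- (n_emp * n_per) / deff

cat(sprintf("ICC = %.2f, design effect = %.2f, effective n = %.0f of %d rows\n",
            icc, deff, n_eff, n_emp * n_per))
```

The larger the ICC, the fewer independent pieces of information the panel contains. A random-intercepts model such as lme4::lmer accounts for this directly rather than through an after-the-fact adjustment.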

References

Adi, B. 2026. AI-Powered Business Analytics: A Practical Textbook for Data-Driven Decision Making — from Data Fundamentals to Machine Learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online.

Cleveland, William S. 1985. The Elements of Graphing Data. Wadsworth.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer.

Tukey, John W. 1977. Exploratory Data Analysis. Addison-Wesley.

Welch, B. L. 1947. “The Generalization of ‘Student’s’ Problem When Several Different Population Variances Are Involved.” Biometrika 34 (1-2): 28–35. https://doi.org/10.1093/biomet/34.1-2.28.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4.

Appendix: AI Usage Statement

AI tools — including Claude and ChatGPT — were used to support structuring, drafting, and coding guidance for this submission. The ggplot2/plotly chart styling, the level_band re-coding logic, the helper functions for hypothesis-test display, and prose-editing of the executive summary and integrated findings were assisted by AI. All data preparation, variable construction, the choice of which hypotheses to pre-specify, the decision to collapse thirteen job levels into eight bands, the model-reduction rule, the addition of the manager-variance analysis and the fixed-effects robustness check (using fixest::feols), the diagnostic interpretation, and every substantive interpretation of the findings — including the decision to flag the promotion result as a likely learning-curve effect rather than a counter-incentive finding — were independently undertaken and validated using my professional judgement and seven years of institutional knowledge of Argentil’s people function. The dataset was sourced from internal HR records (Power BI HPC report and HRIS) and assessed accordingly. All numeric outputs in this document are produced by the embedded code chunks and are reproducible from employee_data.csv using base R together with the tidyverse, broom, fixest, plotly, knitr, and scales packages.