Executive Summary

We applied five Exploratory & Inferential Analytics techniques — EDA, Visualisation, Hypothesis Testing, Correlation Analysis and Regression — to a 2025/26 performance-appraisal dataset for a digital technology company (n = 103 employees, 8 departments). The headline finding is that department membership explains far more performance variance than training, attendance or tenure combined: a model on the three candidate covariates alone explains only 5 % of variance (F-test p = 0.18), but adding department dummies lifts R² to 0.35, a sevenfold improvement. Two ANOVA-supported facts drive the recommendation: (1) performance differs significantly across departments (F = 5.77, p < 0.001), with Legal (mean 3.70 / 5) and Finance (3.52) at the top and Administration (2.66) and Sales & Marketing (3.02) at the bottom; (2) the training-hours-versus-performance correlation is negative (r = −0.20, p = 0.05) — a counter-intuitive sign that almost certainly reflects training being assigned reactively to lower-performing staff rather than indicating that training reduces performance. The recommendation is to redesign training-allocation rules around a pro-active development cohort and to invest in first-line-manager coaching where the departmental gap is widest.

Professional Disclosure

I am Yewande Amund, Head of Human Capital at a privately-held digital technology services company in Nigeria that provides digital and technical solutions to businesses across various sectors. The five techniques in this paper map directly to live operational decisions on my desk.

Data Collection & Sampling

Source: The company’s HRIS / People system. Performance appraisal scores are entered by line managers, calibrated by HR Business Partners, and locked at the end of each fiscal year.
Collection method: Direct workbook export from the HRIS — sheet Performance Appraisal.
Sampling frame: All active permanent staff at the cut-off date (FY 2025/26 mid-year window) across the company’s eight departments.
Sample size: n = 103 employees (a full census, not a sample).
Time period: Performance scores cover FY 2022/23 → FY 2025/26 (FY 25/26 is the focal outcome). Tenure, training and attendance are point-in-time snapshots at the cut-off.
Ethics & consent: All employee identifiers are pseudonymised at source (e.g. NGA5745). The dataset is held under the company’s data-protection policy, aligned with the Nigeria Data Protection Act (NDPA, 2023); analysis was performed in a controlled environment with no row-level data leaving company systems. The HR Business-Partner team approved the use of the dataset for analytics development.

Data Description

# DATA_PATH, SHEET_NAME and the pandas import (pd) come from the setup chunk (not shown here).
df = pd.read_excel(DATA_PATH, sheet_name=SHEET_NAME, header=1).dropna(how="all", axis=1)
df.columns = [c.strip() for c in df.columns]  # trim stray whitespace in headers
df = df.rename(columns={
    "Employee number":          "EmployeeID",
    "Departments":              "Department",
    "Biannual VS End of Year":  "Biannual",
    "Tenure":                   "TenureYrs",
    "FY 25/26 Overall Score":   "Score_2526",
    "FY 24/25 Overall Score":   "Score_2425",
    "FY23/24 Overall Score":    "Score_2324",
    "FY 22/23 Overall Score":   "Score_2223",
    "YOY Comparism":            "YOY",
    "Attendance Rate":          "Attendance",
    "Training Hours":           "TrainingHrs",
})
if "S/N" in df.columns:  # drop the serial-number column if the export includes one
    df = df.drop(columns="S/N")
# Coerce '-' placeholders in historic scores to numeric (later joiners have no prior-year score)
for c in ["Score_2425", "Score_2324", "Score_2223"]:
    df[c] = pd.to_numeric(df[c], errors="coerce")

print(f"Rows: {df.shape[0]}    Columns: {df.shape[1]}")
## Rows: 103    Columns: 11
print(df.dtypes.to_string())
## EmployeeID         str
## Department         str
## Biannual           str
## TenureYrs        int64
## Score_2526     float64
## Score_2425     float64
## Score_2324     float64
## Score_2223     float64
## YOY            float64
## Attendance     float64
## TrainingHrs      int64
df.head()
##   EmployeeID      Department Biannual  ...       YOY  Attendance  TrainingHrs
## 0    NGA5745  Administration      NaN  ...  0.071429       0.940           12
## 1    NGA4526  Administration      NaN  ... -0.041667       0.891           15
## 2    NGA5640  Administration      NaN  ... -0.083333       0.870           14
## 3    NGA4578  Administration      NaN  ...  0.000000       0.861           18
## 4    NGA4586  Administration      NaN  ...  0.000000       0.991           22
## 
## [5 rows x 11 columns]
df.isna().sum().to_frame("missing").T
##          EmployeeID  Department  Biannual  ...  YOY  Attendance  TrainingHrs
## missing           0           0       102  ...   27           0            0
## 
## [1 rows x 11 columns]
df[["TenureYrs","TrainingHrs","Attendance","Score_2526","Score_2425","YOY"]].describe().round(3).T
##              count    mean    std     min     25%     50%     75%     max
## TenureYrs    103.0   8.738  8.251   1.000   1.000   4.000  16.000  26.000
## TrainingHrs  103.0  20.845  8.508  12.000  15.000  18.000  23.000  55.000
## Attendance   103.0   0.926  0.042   0.852   0.893   0.930   0.962   0.998
## Score_2526   103.0   3.210  0.496   1.375   3.000   3.175   3.450   4.570
## Score_2425    77.0   3.327  0.573   0.000   3.020   3.380   3.680   4.200
## YOY           76.0  -0.039  0.158  -0.500  -0.128  -0.046   0.044   0.513
df["Department"].value_counts().to_frame("n").assign(share=lambda d: (d["n"]/d["n"].sum()).round(3))
##                                n  share
## Department                             
## Operations/Technology/IT      29  0.282
## Finance                       18  0.175
## Sales & Marketing             18  0.175
## Administration                12  0.117
## CSO&E                         12  0.117
## Innovations and Partnerships   7  0.068
## Legal                          4  0.039
## Human Capital                  3  0.029

The dataset contains 103 permanent staff across 8 departments, with the focal outcome Score_2526 (FY 25/26 Overall Performance Score, 0–5 scale) plus three candidate predictors — TenureYrs, TrainingHrs, Attendance — and historical scores. There are no missing values for the focal outcome or the predictors. Historic-year scores carry some missingness (employees who joined later have no prior-year score), and the Biannual column is essentially empty (populated for only 1 of 103 rows) — we drop it, as shown in the one-line sketch below.
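The load chunk above leaves Biannual in place; a minimal sketch of the drop referred to here, assuming df as loaded above:

# Biannual is populated for only 1 of 103 rows, so it carries no usable signal.
df = df.drop(columns=["Biannual"])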

Analytical Question

Which workforce factors most strongly influence employee performance in a digital technology company?

Each technique below contributes one piece of evidence towards the answer.

Analysis 1 — Exploratory Data Analysis (EDA)

Theory recap

EDA is the disciplined first look: summarise, visualise, find outliers, flag missingness — before any modelling. The classical descriptive statistics (mean, median, SD, IQR, skew) plus the simple visual probes (histogram, boxplot) are the right tools.

Business justification

Before recommending HR investments, the People Committee needs to know how the workforce is distributed: are most people clustered around the average score, or is there a long tail? Are training hours uniform or concentrated in a few power-users? Are there outlier scores that need attention? EDA answers these questions in one page.

Code & output

nums = ["TenureYrs","TrainingHrs","Attendance","Score_2526"]
eda = pd.DataFrame({
    "mean":      [df[c].mean() for c in nums],
    "median":    [df[c].median() for c in nums],
    "sd":        [df[c].std() for c in nums],
    "min":       [df[c].min() for c in nums],
    "Q1":        [df[c].quantile(0.25) for c in nums],
    "Q3":        [df[c].quantile(0.75) for c in nums],
    "max":       [df[c].max() for c in nums],
    "skew":      [stats.skew(df[c].dropna()) for c in nums],
    "kurtosis":  [stats.kurtosis(df[c].dropna()) for c in nums],
}, index=nums).round(3)
eda
##                mean  median     sd     min  ...      Q3     max   skew  kurtosis
## TenureYrs     8.738   4.000  8.251   1.000  ...  16.000  26.000  0.557    -1.219
## TrainingHrs  20.845  18.000  8.508  12.000  ...  23.000  55.000  1.798     3.714
## Attendance    0.926   0.930  0.042   0.852  ...   0.962   0.998 -0.129    -1.033
## Score_2526    3.210   3.175  0.496   1.375  ...   3.450   4.570 -0.529     2.006
## 
## [4 rows x 9 columns]
def tukey_outliers(s):
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    lo, hi = q1 - 1.5*(q3-q1), q3 + 1.5*(q3-q1)
    return ((s < lo) | (s > hi)).sum(), round(lo, 3), round(hi, 3)
print("Tukey 1.5×IQR outliers:")
## Tukey 1.5×IQR outliers:
for c in nums:
    n, lo, hi = tukey_outliers(df[c].dropna())
    print(f"  {c:14s} fences=[{lo}, {hi}]  outliers={n}")
##   TenureYrs      fences=[-21.5, 38.5]  outliers=0
##   TrainingHrs    fences=[3.0, 35.0]  outliers=6
##   Attendance     fences=[0.79, 1.066]  outliers=0
##   Score_2526     fences=[2.325, 4.125]  outliers=7

Interpretation

Score_2526 is mildly left-skewed (g₁ = −0.53) — most staff cluster between 3 and 4 with a smaller tail at the lower end. TenureYrs is right-skewed (g₁ ≈ 0.56, median 4 vs mean 8.7, max 26) — a small group of long-tenured staff sits well above the typical employee. TrainingHrs is more strongly right-skewed (g₁ ≈ 1.8; mean 20.8, max 55) — a few employees consume far more training than typical, and the Tukey fences flag 6 of them as outliers. Attendance is tightly clustered (sd 0.04) — variation in attendance is small.
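A natural, optional follow-up to the skew figures (not run above) is a formal normality check on the focal outcome before relying on parametric tests later; a minimal Shapiro–Wilk sketch:

from scipy import stats

# Shapiro–Wilk on Score_2526: a small p flags departure from normality,
# though at n = 103 ANOVA and OLS are fairly robust to mild skew.
W, p_sw = stats.shapiro(df["Score_2526"])
print(f"Shapiro-Wilk W = {W:.3f}, p = {p_sw:.4f}")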

Analysis 2 — Data Visualisation

Theory recap

A statistic summarises; a chart shows. Five visuals are sufficient to tell the HR story coherently: the distribution of the outcome, the distribution of attendance, training across departments, and the two key bivariate relationships (training vs performance, attendance vs performance).

Business justification

The People Committee meets monthly. Charts are how findings travel from analytics to the boardroom. The five visuals below are the minimum useful set to show how performance is distributed, where investments concentrate, and whether the candidate predictors visibly move with the outcome.

Code & output

mean_s = df["Score_2526"].mean()
med_s  = df["Score_2526"].median()
fig = px.histogram(
    df, x="Score_2526", nbins=14,
    labels={"Score_2526": "FY 25/26 Overall Score (0–5)", "count": "Count"},
    title="Distribution of FY 25/26 Performance Scores",
    color_discrete_sequence=[PRIMARY],
    template=TEMPLATE,
)
fig.update_traces(marker_line_width=1, marker_line_color="white")
fig.add_vline(x=mean_s, line_dash="dash", line_color=ACCENT, line_width=2,
              annotation_text=f"Mean = {mean_s:.2f}", annotation_position="top right")
fig.add_vline(x=med_s,  line_dash="dot",  line_color=GREEN,  line_width=2,
              annotation_text=f"Median = {med_s:.2f}", annotation_position="top left")
fig.update_layout(bargap=0.05, yaxis_title="Count")
fig.show()
order = df.groupby("Department")["TrainingHrs"].median().sort_values().index.tolist()
fig = px.box(
    df, x="TrainingHrs", y="Department",
    category_orders={"Department": order},
    color="Department",
    color_discrete_sequence=PALETTE,
    labels={"TrainingHrs": "Training Hours", "Department": ""},
    title="Training Hours by Department",
    template=TEMPLATE,
    points="outliers",
    hover_data={"EmployeeID": True, "Score_2526": True},
)
fig.update_layout(showlegend=False, height=420)
fig.show()
mean_a = df["Attendance"].mean()
fig = px.histogram(
    df, x="Attendance", nbins=14,
    labels={"Attendance": "Attendance Rate (proportion)", "count": "Count"},
    title="Attendance Rate Distribution",
    color_discrete_sequence=[GREEN],
    template=TEMPLATE,
)
fig.update_traces(marker_line_width=1, marker_line_color="white")
fig.add_vline(x=mean_a, line_dash="dash", line_color=ACCENT, line_width=2,
              annotation_text=f"Mean = {mean_a:.3f}", annotation_position="top left")
fig.update_layout(bargap=0.05, yaxis_title="Count")
fig.show()
r4, p4 = stats.pearsonr(df["TrainingHrs"], df["Score_2526"])
b1, b0 = np.polyfit(df["TrainingHrs"], df["Score_2526"], 1)
xs4 = np.linspace(df["TrainingHrs"].min(), df["TrainingHrs"].max(), 100)
fig = px.scatter(
    df, x="TrainingHrs", y="Score_2526",
    color="Department",
    color_discrete_sequence=PALETTE,
    labels={"TrainingHrs": "Training Hours", "Score_2526": "FY 25/26 Score"},
    title=f"Training Hours vs FY 25/26 Performance  (r = {r4:.3f}, p = {p4:.3f})",
    template=TEMPLATE,
    hover_data={"EmployeeID": True, "TenureYrs": True},
)
fig.add_scatter(
    x=xs4, y=b0 + b1*xs4, mode="lines",
    line=dict(color=ACCENT, width=2.5),
    name=f"OLS: y = {b0:.2f} + ({b1:.4f})·x",
    showlegend=True,
)
fig.update_traces(selector=dict(mode="markers"), marker=dict(size=8, opacity=0.8, line=dict(width=0.5, color="white")))
fig.show()
r5, p5 = stats.pearsonr(df["Attendance"], df["Score_2526"])
b1, b0 = np.polyfit(df["Attendance"], df["Score_2526"], 1)
xs5 = np.linspace(df["Attendance"].min(), df["Attendance"].max(), 100)
fig = px.scatter(
    df, x="Attendance", y="Score_2526",
    color="Department",
    color_discrete_sequence=PALETTE,
    labels={"Attendance": "Attendance Rate", "Score_2526": "FY 25/26 Score"},
    title=f"Attendance Rate vs FY 25/26 Performance  (r = {r5:.3f}, p = {p5:.3f})",
    template=TEMPLATE,
    hover_data={"EmployeeID": True, "TrainingHrs": True},
)
fig.add_scatter(
    x=xs5, y=b0 + b1*xs5, mode="lines",
    line=dict(color=ACCENT, width=2.5),
    name=f"OLS: y = {b0:.2f} + ({b1:.2f})·x",
    showlegend=True,
)
fig.update_traces(selector=dict(mode="markers"), marker=dict(size=8, opacity=0.8, line=dict(width=0.5, color="white")))
fig.show()

Interpretation

Performance scores cluster around 3.2 with a left tail; attendance is remarkably uniform; training hours vary by department, with Sales & Marketing and Operations/Technology/IT consuming the most. The two bivariate scatters tell different stories: training shows a slight negative slope (more training, slightly lower performance); attendance shows a slight positive slope (better attendance, slightly higher performance). Both relationships are weak; we test them formally in §7.

Analysis 3 — Hypothesis Testing

Theory recap

A hypothesis test starts with a null (H₀ — usually “no effect”), an alternative (H₁), an α (typically 0.05) and an appropriate test statistic. For continuous data we use Pearson’s correlation test (for bivariate strength) and one-way ANOVA (for differences across ≥ 3 groups).

Business justification

The People Committee needs binary “is this real or chance?” answers on two specific questions: (1) does training relate to performance? and (2) does performance differ across departments? Testing them formally (rather than eyeballing the visuals) is what justifies any follow-on investment recommendation.

Code & output

m = df.dropna(subset=["TrainingHrs","Score_2526"])
r, p = stats.pearsonr(m["TrainingHrs"], m["Score_2526"])
n = len(m)
t_stat = r * np.sqrt((n - 2) / (1 - r**2))
print(f"H0: rho = 0  vs  H1: rho != 0")
## H0: rho = 0  vs  H1: rho != 0
print(f"n = {n}")
## n = 103
print(f"Pearson r       = {r:.4f}")
## Pearson r       = -0.1951
print(f"Test statistic  = {t_stat:.3f}  (t with df = {n-2})")
## Test statistic  = -1.999  (t with df = 101)
print(f"p-value         = {p:.4f}")
## p-value         = 0.0483
print(f"Decision at α=0.05: {'Reject H0' if p < 0.05 else 'Fail to reject H0'}")
## Decision at α=0.05: Reject H0
g = df.dropna(subset=["Score_2526","Department"]).groupby("Department")["Score_2526"]
groups = [grp.values for _, grp in g if len(grp) >= 2]
F, p = stats.f_oneway(*groups)
k = len(groups); n_total = sum(len(x) for x in groups)
print(f"H0: mean performance is equal across all departments")
## H0: mean performance is equal across all departments
print(f"H1: at least one department differs")
## H1: at least one department differs
print(f"k groups = {k},  n total = {n_total}")
## k groups = 8,  n total = 103
print(f"F-statistic = {F:.3f}  (df1 = {k-1}, df2 = {n_total-k})")
## F-statistic = 5.767  (df1 = 7, df2 = 95)
print(f"p-value     = {p:.6f}")
## p-value     = 0.000014
print(f"Decision at α=0.05: {'Reject H0' if p < 0.05 else 'Fail to reject H0'}")
## Decision at α=0.05: Reject H0
print("\nDepartment means (sorted):")
## 
## Department means (sorted):
print(df.groupby("Department")["Score_2526"].agg(["count","mean","std"])
        .round(3).sort_values("mean", ascending=False))
##                               count   mean    std
## Department                                       
## Legal                             4  3.700  0.300
## Human Capital                     3  3.525  0.541
## Finance                          18  3.523  0.457
## Operations/Technology/IT         29  3.274  0.221
## CSO&E                            12  3.227  0.315
## Innovations and Partnerships      7  3.121  0.392
## Sales & Marketing                18  3.022  0.656
## Administration                   12  2.658  0.478

Interpretation

H1 (training ↔ performance): the Pearson correlation is −0.20 with p = 0.048 — just significant at the 5 % level, and with a counter-intuitive negative sign. The most plausible operational explanation is selection bias: training is assigned reactively to staff who scored low last cycle, so the correlation reflects “weak performers get sent on training” rather than “training reduces performance”. This finding alone is enough to justify a Performance-Operations review of how training is allocated.

H2 (performance differs across departments): F = 5.77, p < 0.001 — strongly significant. Legal (mean 3.70) and Finance (3.52) are at the top; Administration (2.66) is well below the rest. This is the strongest single signal in the dataset and points to department-level practices — calibration, role design, line-management quality — as the primary lever. A post-hoc sketch below indicates which pairwise gaps would survive a multiple-comparison correction.
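The ANOVA only says that at least one department differs. A post-hoc pairwise test with a multiple-comparison correction identifies which gaps are reliable; a minimal sketch using statsmodels’ Tukey HSD (not run above, and small cells such as Legal at n = 4 make individual comparisons imprecise):

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Pairwise department comparisons with the family-wise error rate held at 5 %.
posthoc = pairwise_tukeyhsd(endog=df["Score_2526"],
                            groups=df["Department"], alpha=0.05)
print(posthoc.summary())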

Analysis 4 — Correlation Analysis

Theory recap

Pearson’s correlation matrix summarises pairwise linear relationships among numeric variables, with values in [−1, +1]. A heatmap renders the matrix at a glance. Significance for each pair can be tested with the t-statistic t = r√(n−2) / √(1−r²).
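The conversion from r to t can be checked against the library p-value directly; a minimal sketch using the §7 numbers (numpy and scipy as in the chunks above):

import numpy as np
from scipy import stats

r, n = -0.1951, 103                    # Pearson r and sample size from §7
t = r * np.sqrt((n - 2) / (1 - r**2))  # t = r·sqrt(n−2)/sqrt(1−r²)
p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p on t with n−2 df
print(f"t = {t:.3f}, p = {p:.4f}")     # ≈ −1.999 and 0.0483, matching §7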

Business justification

Before fitting a regression we want a ranked list of candidate predictors and a check for redundancy among them. The matrix tells me which pairs to keep, which to drop and which signal is mechanical (e.g. YOY is computed from Score_2526 and Score_2425).

Code & output

nums = ["TenureYrs","TrainingHrs","Attendance","Score_2526","Score_2425","YOY"]
cm = df[nums].corr().round(3)
cm
##              TenureYrs  TrainingHrs  Attendance  Score_2526  Score_2425    YOY
## TenureYrs        1.000       -0.082      -0.163       0.017       0.157 -0.105
## TrainingHrs     -0.082        1.000      -0.009      -0.195       0.100 -0.100
## Attendance      -0.163       -0.009       1.000       0.102       0.147  0.085
## Score_2526       0.017       -0.195       0.102       1.000       0.229  0.640
## Score_2425       0.157        0.100       0.147       0.229       1.000 -0.501
## YOY             -0.105       -0.100       0.085       0.640      -0.501  1.000
fig = px.imshow(
    cm,
    color_continuous_scale="RdBu_r",
    zmin=-1, zmax=1,
    text_auto=".2f",
    title="Pearson Correlation Matrix",
    template=TEMPLATE,
    aspect="auto",
)
fig.update_traces(textfont_size=11)
fig.update_coloraxes(colorbar_title="r")
fig.update_layout(height=480)
fig.show()
def pval(a, b):
    m = df[[a,b]].dropna()
    return round(float(stats.pearsonr(m.iloc[:,0].values, m.iloc[:,1].values)[1]), 4)
pmat = pd.DataFrame(
    {a: [pval(a, b) for b in nums] for a in nums}, index=nums)
pmat
##              TenureYrs  TrainingHrs  Attendance  Score_2526  Score_2425     YOY
## TenureYrs       0.0000       0.4126      0.1002      0.8670      0.1730  0.3649
## TrainingHrs     0.4126       0.0000      0.9308      0.0483      0.3867  0.3904
## Attendance      0.1002       0.9308      0.0000      0.3059      0.2008  0.4650
## Score_2526      0.8670       0.0483      0.3059      0.0000      0.0449  0.0000
## Score_2425      0.1730       0.3867      0.2008      0.0449      0.0000  0.0000
## YOY             0.3649       0.3904      0.4650      0.0000      0.0000  0.0000
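The near-zero pairwise correlations above already suggest the three candidate predictors are not redundant. A complementary check (not run above) is the variance inflation factor; a minimal sketch, where values near 1 would confirm the independence seen in the matrix:

from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

# VIF per predictor: how much each coefficient's variance is inflated by
# collinearity with the other predictors (1.0 means no inflation at all).
Xv = sm.add_constant(df[["TenureYrs", "TrainingHrs", "Attendance"]])
for i, col in enumerate(Xv.columns):
    if col != "const":
        print(f"{col:12s} VIF = {variance_inflation_factor(Xv.values, i):.2f}")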

Interpretation

  • Strongest non-mechanical relationship: Score_2526 ↔ Score_2425 at r = +0.23 (p ≈ 0.045) — last year’s score is a modestly useful predictor of this year’s. The very strong Score_2526 ↔ YOY and Score_2425 ↔ YOY values are mechanical (YOY is computed from the two scores) and should be ignored.
  • Weakest relationship: Attendance ↔ TrainingHrs at r = −0.01 — essentially zero. Training and attendance are uncorrelated in this workforce.
  • The training–performance pair (r = −0.19, p = 0.048) is the surprising one and is discussed at length in §7; a rank-based robustness sketch follows this list.
  • Managerial implication: none of the three candidate predictors — tenure, training, attendance — exceeds |r| ≈ 0.2 with the focal outcome. We expect a regression on these alone to explain very little. The signal is more likely to sit in department / role effects, which §9 confirms.
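TrainingHrs is heavily right-skewed (skew ≈ 1.8, max 55), so the Pearson estimate could be leveraged by the few heavy training consumers. A minimal rank-based robustness sketch (not run above; same df as the chunks above):

from scipy import stats

# Spearman works on ranks, so the six employees above the 35-hour Tukey fence
# cannot dominate the estimate the way extreme values can under Pearson.
rho, p_rho = stats.spearmanr(df["TrainingHrs"], df["Score_2526"])
print(f"Spearman rho = {rho:.3f}, p = {p_rho:.4f}")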

Analysis 5 — Regression

Theory recap

OLS fits y = β₀ + β₁x₁ + β₂x₂ + … + ε by minimising Σ(yᵢ − ŷᵢ)². β̂ⱼ is the partial effect of xⱼ on y holding all other predictors constant. R² is the share of variance explained; the F-test asks whether the model explains more variance than a mean-only model.
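As an illustration of the estimator (not the statsmodels fit used below), the normal-equations solution and R² can be written in a few lines of numpy directly from the definitions above:

import numpy as np

def ols_fit(X, y):
    """OLS via the normal equations: beta = (X'X)^(-1) X'y."""
    Xd = np.column_stack([np.ones(len(X)), np.asarray(X, float)])  # prepend intercept
    beta = np.linalg.solve(Xd.T @ Xd, Xd.T @ np.asarray(y, float)) # solve for beta-hat
    resid = np.asarray(y, float) - Xd @ beta                       # y minus fitted values
    r2 = 1.0 - resid @ resid / np.sum((y - np.mean(y)) ** 2)       # share of variance explained
    return beta, r2

# e.g. ols_fit(df[["TrainingHrs", "Attendance", "TenureYrs"]], df["Score_2526"])
# reproduces m1's coefficients and R² below, up to floating-point noise.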

Business justification

The brief specifies a simple, defendable regression: “the model tests whether training investment, attendance and employee tenure predict employee performance.” This is the core specification. We also fit a second model that adds department as a categorical control to quantify how much of the variance is driven by departmental effects.

Code & output

reg = df.dropna(subset=["Score_2526","TrainingHrs","Attendance","TenureYrs"])
X = sm.add_constant(reg[["TrainingHrs","Attendance","TenureYrs"]])
y = reg["Score_2526"]
m1 = sm.OLS(y, X).fit()
print(f"n = {len(reg)}")
## n = 103
print(f"R²        = {m1.rsquared:.4f}")
## R²        = 0.0484
print(f"Adj R²    = {m1.rsquared_adj:.4f}")
## Adj R²    = 0.0196
print(f"F-stat    = {m1.fvalue:.3f}  (df = {int(m1.df_model)}, {int(m1.df_resid)})")
## F-stat    = 1.679  (df = 3, 99)
print(f"F p-value = {m1.f_pvalue:.4f}")
## F p-value = 0.1765
m1.summary().tables[1]
                 coef   std err       t   P>|t|    [0.025    0.975]
const          2.2957     1.119   2.052   0.043     0.076     4.515
TrainingHrs   -0.0112     0.006  -1.959   0.053    -0.023     0.000
Attendance     1.2300     1.186   1.037   0.302    -1.123     3.583
TenureYrs      0.0011     0.006   0.178   0.859    -0.011     0.013
reg2 = df.dropna(subset=["Score_2526","TrainingHrs","Attendance","TenureYrs","Department"])
X2 = pd.get_dummies(reg2[["TrainingHrs","Attendance","TenureYrs","Department"]],
                    drop_first=True).astype(float)
X2 = sm.add_constant(X2)
m2 = sm.OLS(reg2["Score_2526"], X2).fit()
print(f"n = {len(reg2)}")
## n = 103
print(f"R²        = {m2.rsquared:.4f}")
## R²        = 0.3482
print(f"Adj R²    = {m2.rsquared_adj:.4f}")
## Adj R²    = 0.2774
print(f"F-stat    = {m2.fvalue:.3f}")
## F-stat    = 4.915
print(f"F p-value = {m2.f_pvalue:.4e}")
## F p-value = 1.1234e-05
m2.summary().tables[1]
                                            coef   std err       t   P>|t|    [0.025    0.975]
const                                     1.5977     0.975   1.638   0.105    -0.339     3.535
TrainingHrs                              -0.0110     0.005  -2.093   0.039    -0.021    -0.001
Attendance                                1.3067     1.037   1.260   0.211    -0.752     3.366
TenureYrs                                 0.0052     0.006   0.863   0.390    -0.007     0.017
Department_CSO&E                          0.5556     0.182   3.053   0.003     0.194     0.917
Department_Finance                        0.9128     0.159   5.726   0.000     0.596     1.229
Department_Human Capital                  0.9177     0.283   3.247   0.002     0.356     1.479
Department_Innovations and Partnerships   0.4997     0.204   2.455   0.016     0.095     0.904
Department_Legal                          1.0486     0.248   4.236   0.000     0.557     1.540
Department_Operations/Technology/IT       0.6371     0.146   4.356   0.000     0.347     0.928
Department_Sales & Marketing              0.4628     0.165   2.813   0.006     0.136     0.790
resid_df = pd.DataFrame({
    "Fitted": m2.fittedvalues,
    "Residual": m2.resid,
    "Department": reg2["Department"].values,
    "EmployeeID": reg2["EmployeeID"].values,
})
fig = px.scatter(
    resid_df, x="Fitted", y="Residual",
    color="Department",
    color_discrete_sequence=PALETTE,
    labels={"Fitted": "Fitted Score_2526", "Residual": "Residual"},
    title="Residuals vs Fitted Values (Extended Model)",
    template=TEMPLATE,
    hover_data={"EmployeeID": True},
)
fig.update_traces(marker=dict(size=8, opacity=0.8, line=dict(width=0.5, color="white")))
fig.add_hline(y=0, line_dash="dash", line_color=ACCENT, line_width=2)
fig.show()

Interpretation

  • Simple model (training + attendance + tenure): R² = 0.048, adjusted R² = 0.020, F-test p = 0.18. The three covariates do not jointly explain a meaningful share of performance variance at the α = 0.05 level. Training carries the only marginally significant coefficient (p = 0.053), again with a negative sign — consistent with reactive allocation rather than a causal effect.
  • Extended model (adds department): R² jumps to 0.348 — roughly a sevenfold improvement purely from controlling for departmental membership. Adjusted R² (0.28) confirms the lift is not just from extra parameters, and the F-test p ≈ 1 × 10⁻⁵ says the joint model is highly significant. All department dummies are positive relative to the omitted baseline (Administration, the lowest-scoring department); a nested-model test sketch follows this list.
  • Direct answer to the research question: in this dataset, department is by far the most influential workforce factor on performance. Training hours and tenure individually contribute little; attendance contributes a small positive but non-significant signal (p = 0.30 in the simple model). The single managerial recommendation is therefore to investigate departmental practices (calibration rules, line-management quality, role design) where the gap is widest — not to reflexively spend more on training across the board.
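Because m1’s predictors are a subset of m2’s and both models fit the same 103 rows, the R² lift can be tested formally with a nested-model F-test; a minimal sketch using statsmodels (not run above), with m1 and m2 as fitted earlier:

import statsmodels.api as sm

# Nested-model comparison. H0: the seven department dummies add no explanatory
# power beyond training, attendance and tenure. A small p-value rejects H0.
print(sm.stats.anova_lm(m1, m2))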

Integrated Findings

Step  Technique             What it produced
1     EDA                   n = 103 across 8 departments; outcome left-skewed, tenure right-skewed; no missingness on focal variables
2     Visualisation         Five charts make the headline visible: scores cluster mid-range, training is concentrated, bivariate slopes are weak
3     Hypothesis testing    H1: training ↔ performance r = −0.20, p = 0.048 (significant, counter-intuitive). H2: department differences F = 5.77, p < 0.001
4     Correlation analysis  No candidate predictor exceeds |r| ≈ 0.2 with the outcome; the strong YOY correlations are mechanical artefacts
5     Regression            Simple model R² = 0.05 (n.s.); adding Department lifts R² to 0.35 — the single biggest analytical signal in the dataset

The five techniques converge on one recommendation: department-level practices, not employee-level training, are the dominant lever on performance. Reallocate the next-quarter investment from generic training to (a) a pro-active development cohort that breaks the “training-as-remediation” pattern, and (b) targeted first-line-management coaching in Administration and Sales & Marketing, where the performance gap is widest.

Limitations & Further Work

This is a census of one company at one point in time, so findings describe this workforce rather than the sector. The data are observational: the negative training–performance correlation is read as an allocation artefact, not a causal effect, and nothing here can prove that claim either way. Several department cells are small (Legal n = 4, Human Capital n = 3), so individual department means are imprecise even though the overall ANOVA is strongly significant. Historic scores are missing for later joiners (26 of 103 lack an FY 24/25 score), which limits longitudinal analysis. Further work: re-run the models once FY 26/27 scores are locked, and pair the training-allocation review recommended in §7 with before/after measurement of the pro-active development cohort.

References

  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.

  • Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.

  • McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 56–61.

  • Pedregosa, F. et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

  • Seabold, S., & Perktold, J. (2010). statsmodels: econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference, 92–96.

  • Virtanen, P. et al. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272.

  • Federal Republic of Nigeria. (2023). Nigeria Data Protection Act. National Assembly.

Appendix — AI Usage Statement

I used Claude (Anthropic) for two specific tasks: (1) drafting the boilerplate scaffold of the Quarto YAML and section headings to match the required submission rubric, and (2) double-checking pandas / statsmodels / scipy syntax for the EDA and regression code. The research question, the choice of techniques, the interpretation of the negative training–performance correlation as a selection artefact (rather than a causal claim), the recommendation to investigate department-level practices over generic training spend, and the narrative throughout are my independent professional judgement. Every numerical result is computed live in this document on the 103-row Yewande HR appraisal dataset (hr_team_survey_data-Yewande.xlsx, sheet Performance Appraisal).