HSE Case Study 1: Impact of Safety Measures on Hazard Reporting

Author: Onome Chinonso-Oriuwa

Published: May 23, 2026

1. Executive Summary

In the industrial and corporate sectors, unrecorded “near-misses” often precede severe workplace accidents. While companies invest heavily in Health, Safety, and Environment (HSE) training, the effectiveness of these programs depends heavily on employees’ willingness to report dangers to management. The objective of this study was to identify the primary drivers of employee hazard reporting behavior. Primary survey data was collected from 123 working professionals, measuring variables such as safety training frequency, perceived management enforcement, and overall reporting confidence.

Our statistical analysis revealed several operational insights. Neither physical work environment (Field vs. Office) nor industry tenure showed a statistically significant difference in reporting confidence. Formal safety training showed a weak but statistically significant positive correlation with reporting confidence (r = 0.186, p = 0.038). However, in a multiple linear regression that included both factors, perceived management enforcement of safety rules was the only significant predictor of reporting confidence (p < 0.001), while training frequency was no longer significant (p = 0.088). Consequently, our primary recommendation is that the organization reallocate a portion of the general employee training budget toward specialized leadership training, equipping frontline managers to enforce safety protocols strictly and uniformly.

2. Professional Disclosure

  • Job Title: Health, Safety, and Environment (HSE) Data Analyst
  • Sector: Industrial & Corporate Safety

Operational Relevance of Techniques:

  • Exploratory Data Analysis (EDA): Audits incoming safety data for entry errors and establishes baseline behavioral metrics.
  • Two-Sample T-Test: Helps the HSE department decide whether resources and messaging need to differ between Field and Office environments.
  • ANOVA: Helps determine whether safety messaging needs to be tailored by seniority.
  • Correlation Analysis: Quantifies whether attending more formal safety training is associated with higher reporting confidence, a first check on the return on training investment.
  • Multiple Linear Regression: Lets leadership weigh competing initiatives against each other when predicting reporting confidence.

3. Data Collection & Sampling

  • Source: Primary data collected via an online questionnaire (Google Forms).
  • Methodology: Convenience and snowball sampling through professional networks.
  • Sampling Frame: Currently employed professionals working in either Field/Operations or Office/Administrative environments.
  • Sample Size: 123 valid respondents.
  • Time Period Covered: May 2026.
  • Ethical Notes & Consent: Participation was strictly voluntary and anonymous. No Personally Identifiable Information (PII) or sensitive corporate data was collected. Respondents were informed that the data was for academic/analytical purposes prior to submission.

4. Data Description

The dataset comprises 123 rows and 7 variables:

  • Timestamp: Datetime (record of submission).
  • Work_Environment: Categorical / Binary (Field/Operations vs. Office/Admin).
  • Experience: Categorical / Ordinal (Less than 2 years, 2 to 5 years, More than 5 years).
  • Training_Sessions: Discrete Numeric (count of formal sessions attended in the past 12 months).
  • Management_Enforcement: Numeric rating on a 1–10 scale of how strictly safety rules are enforced, treated as continuous.
  • Reporting_Confidence: Numeric rating on a 1–10 scale of trust in management, treated as continuous.
  • Hid_Incident: Categorical / Binary (Yes/No response regarding hiding a known hazard).
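Because Google Forms exports most answers as text, it may help to cast each column to the type listed above before analysis. The sketch below is a minimal illustration, assuming the response labels match the category names in this list; the helper `encode_survey` and the demo rows are hypothetical, not taken from the survey. Encoding Experience as an ordered categorical keeps the three tenure bands in their natural order rather than alphabetical order.

```python
import pandas as pd

# Tenure bands in their natural order (labels assumed to match the list above)
EXPERIENCE_ORDER = ["Less than 2 years", "2 to 5 years", "More than 5 years"]

def encode_survey(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with each column cast to its described type."""
    out = df.copy()
    out["Timestamp"] = pd.to_datetime(out["Timestamp"])
    out["Work_Environment"] = out["Work_Environment"].astype("category")
    # Ordered categorical, so sorting and groupby follow tenure, not the alphabet
    out["Experience"] = pd.Categorical(
        out["Experience"], categories=EXPERIENCE_ORDER, ordered=True
    )
    for col in ["Training_Sessions", "Management_Enforcement", "Reporting_Confidence"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Yes/No answers become booleans; any other text becomes missing
    out["Hid_Incident"] = out["Hid_Incident"].map({"Yes": True, "No": False})
    return out

# Invented rows for illustration only (not survey data)
demo = pd.DataFrame({
    "Timestamp": ["2026-05-01 09:15:00", "2026-05-02 14:30:00"],
    "Work_Environment": ["Field/Operations", "Office/Admin"],
    "Experience": ["More than 5 years", "Less than 2 years"],
    "Training_Sessions": ["4", "1"],
    "Management_Enforcement": [8, 5],
    "Reporting_Confidence": [9, 6],
    "Hid_Incident": ["No", "Yes"],
})

encoded = encode_survey(demo)
print(encoded.dtypes)
print(encoded.sort_values("Experience")["Experience"].tolist())
```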


5. Technique 1: Exploratory Data Analysis (EDA)

Theory Recap: EDA summarizes the main characteristics of a dataset, often visually, to understand distributions and uncover data quality issues before formal hypothesis testing.

Business Justification: We must ensure there are no impossible values (e.g., negative training session counts) that could skew our safety metrics, and we want to profile the scope of unreported hazards visually.

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm

# 1. Load the Data
df = pd.read_csv("Workplace Safety Culture Assessment.csv")

# 2. Rename the columns to short, code-friendly names
df.columns = [
    'Timestamp', 
    'Work_Environment', 
    'Experience', 
    'Training_Sessions', 
    'Management_Enforcement', 
    'Reporting_Confidence', 
    'Hid_Incident'
]

# 3. Data quality check: coerce non-numeric training counts to NaN, then drop missing and negative values
df['Training_Sessions'] = pd.to_numeric(df['Training_Sessions'], errors='coerce')
df = df.dropna(subset=['Training_Sessions'])
df = df[df['Training_Sessions'] >= 0]

print(f"Cleaned dataset contains {len(df)} valid responses.\n")

# 4. Visualizations
plt.figure(figsize=(12, 5))

# Plot A: Work Environment Breakdown
plt.subplot(1, 2, 1)
sns.countplot(data=df, x='Work_Environment', palette='Blues_r')
plt.title("Respondents by Work Environment")
plt.xticks(rotation=15)
plt.ylabel("Number of Employees")

# Plot B: Did they hide an incident?
plt.subplot(1, 2, 2)
df['Hid_Incident'].value_counts().plot.pie(autopct='%1.1f%%', colors=['#ff9999','#66b3ff'])
plt.title("Percentage of Employees Who Hid a Hazard")
plt.ylabel("")

plt.tight_layout()
plt.show()
Cleaned dataset contains 124 valid responses.

Interpretation: The cleaning step removes any non-numeric or negative training counts. Note that the script printed 124 valid responses, one more than the 123 reported in Section 3; this one-row discrepancy should be reconciled before the counts are reported. Crucially, the pie chart visualizes a severe business risk: a substantial share of employees admit to witnessing a hazard and choosing not to report it. This validates the need for our inferential models.
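The cleaning step above only checks training counts. A similar check could be applied to the two 1–10 rating columns, which should never fall outside that range. This is a minimal sketch: `flag_invalid_rows` is a hypothetical helper, and the demo rows are invented to exercise each rule.

```python
import pandas as pd

# Valid ranges assumed from the data description: ratings 1-10, sessions non-negative
RATING_COLS = ["Management_Enforcement", "Reporting_Confidence"]

def flag_invalid_rows(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True for rows with a missing or out-of-range value."""
    invalid = pd.Series(False, index=df.index)
    for col in RATING_COLS:
        values = pd.to_numeric(df[col], errors="coerce")
        # between() is inclusive; NaN gives False, so negating flags it as invalid
        invalid |= ~values.between(1, 10)
    sessions = pd.to_numeric(df["Training_Sessions"], errors="coerce")
    # NaN >= 0 is False, so missing counts are flagged too
    invalid |= ~(sessions >= 0)
    return invalid

# Invented rows: valid, negative sessions, rating of 11, non-numeric sessions
demo = pd.DataFrame({
    "Training_Sessions": [3, -1, 2, "n/a"],
    "Management_Enforcement": [7, 8, 11, 5],
    "Reporting_Confidence": [9, 6, 4, 8],
})

mask = flag_invalid_rows(demo)
print(mask.tolist())
print(demo[RATING_COLS].describe())
```

Rows flagged by the mask could then be inspected and dropped with `df[~flag_invalid_rows(df)]` before the inferential tests.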


6. Technique 2: Two-Sample T-Test

Theory Recap: A two-sample t-test compares the means of two independent groups to determine whether there is statistical evidence that the corresponding population means differ.

Business Justification: We need to know whether the physical work environment (Field vs. Office) is associated with a difference in how confident employees feel about reporting hazards.

  • Null Hypothesis (H0): Mean Reporting Confidence is the same for Field/Operations workers and Office/Admin workers.
  • Alternative Hypothesis (H1): Mean Reporting Confidence differs between Field/Operations workers and Office/Admin workers.
Code
# Separate the data into our two groups (na=False treats missing labels as non-matches)
field_workers = df[df['Work_Environment'].str.contains('Field', na=False)]['Reporting_Confidence']
office_workers = df[df['Work_Environment'].str.contains('Office', na=False)]['Reporting_Confidence']

# Run Student's two-sample t-test (SciPy's default, equal_var=True)
t_stat, p_val = stats.ttest_ind(field_workers, office_workers)

print(f"Average Confidence (Field): {field_workers.mean():.2f} / 10")
print(f"Average Confidence (Office): {office_workers.mean():.2f} / 10")
print(f"T-Statistic: {t_stat:.3f}")
print(f"P-Value: {p_val:.3f}")
Average Confidence (Field): 7.87 / 10
Average Confidence (Office): 7.42 / 10
T-Statistic: 0.832
P-Value: 0.407

Interpretation: The resulting p-value is 0.407. Because this is well above our 0.05 threshold, we fail to reject the null hypothesis. The gap in average confidence (7.87 for Field vs. 7.42 for Office) is not statistically significant, so we find no evidence that an employee’s physical work location affects their confidence in reporting hazards.
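To make the t statistic less of a black box, the sketch below computes Student's pooled-variance t by hand and checks it against `scipy.stats.ttest_ind`, which uses this form by default (`equal_var=True`). It also shows Welch's variant (`equal_var=False`), which does not assume equal group variances and may be a safer choice when group sizes or spreads differ. The helper names and ratings are hypothetical, not the survey responses.

```python
import numpy as np
from scipy import stats

def pooled_t(a, b):
    """Student's t: mean difference divided by its pooled-variance standard error."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

def welch_t(a, b):
    """Welch's t: each group keeps its own variance in the standard error."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

# Hypothetical 1-10 confidence ratings
field = [8, 9, 7, 10, 6, 8, 9]
office = [7, 6, 8, 7, 5, 9]

t_manual = pooled_t(field, office)
t_scipy, p_scipy = stats.ttest_ind(field, office)
t_welch, p_welch = stats.ttest_ind(field, office, equal_var=False)

print(f"Pooled t (manual) = {t_manual:.3f}, SciPy t = {t_scipy:.3f}, p = {p_scipy:.3f}")
print(f"Welch t = {t_welch:.3f}, p = {p_welch:.3f}")
```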


7. Technique 3: Analysis of Variance (ANOVA)

Theory Recap: One-way ANOVA tests whether the means of three or more independent groups differ, comparing all groups in a single test rather than through multiple pairwise t-tests.

Business Justification: It is critical to know whether new hires are more intimidated about reporting hazards than industry veterans.

  • Null Hypothesis (H0): Mean Reporting Confidence is the same across all three experience levels.
  • Alternative Hypothesis (H1): At least one experience level has a different mean Reporting Confidence.
Code
# Group data by Experience Level (groupby sorts the text labels alphabetically)
exp_groups = [group["Reporting_Confidence"].values for name, group in df.groupby("Experience")]

# Run the ANOVA test
f_stat, p_val_anova = stats.f_oneway(*exp_groups)

print("--- Average Confidence by Experience Level ---")
print(df.groupby("Experience")['Reporting_Confidence'].mean())
print(f"\nF-Statistic: {f_stat:.3f}")
print(f"P-Value: {p_val_anova:.3f}")
--- Average Confidence by Experience Level ---
Experience
2 to 5 years         7.638889
Less than 2 years    8.055556
More than 5 years    7.771429
Name: Reporting_Confidence, dtype: float64

F-Statistic: 0.176
P-Value: 0.839

Interpretation: The p-value is 0.839. Since this is greater than 0.05, we fail to reject the null hypothesis. Although new hires (under 2 years) had the highest average confidence (8.06) and the 2-to-5-year group the lowest (7.64), these differences are not statistically significant: we find no evidence that tenure affects an employee’s confidence in reporting safety hazards.
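The F statistic compares how far the group means sit from the overall mean (between-group variation) with how much responses vary inside each group (within-group variation). The sketch below computes it from sums of squares and checks it against `scipy.stats.f_oneway`; the helper `one_way_f` and the ratings are hypothetical.

```python
import numpy as np
from scipy import stats

def one_way_f(*groups):
    """Hypothetical helper: F = between-group mean square / within-group mean square."""
    groups = [np.asarray(g, float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    # Between: spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within: spread of each response around its own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    # p-value: upper tail of the F distribution with (k - 1, n - k) degrees of freedom
    p = stats.f.sf(f, k - 1, n - k)
    return f, p

# Hypothetical confidence ratings per tenure band
less_than_2 = [8, 9, 7, 8]
two_to_5 = [7, 8, 6, 9, 7]
more_than_5 = [8, 7, 9, 8]

f_manual, p_manual = one_way_f(less_than_2, two_to_5, more_than_5)
f_scipy, p_scipy = stats.f_oneway(less_than_2, two_to_5, more_than_5)

print(f"Manual F = {f_manual:.3f}, p = {p_manual:.3f}")
print(f"SciPy  F = {f_scipy:.3f}, p = {p_scipy:.3f}")
```

Because the 1–10 ratings are ordinal, the rank-based `scipy.stats.kruskal` test is a reasonable robustness check to run alongside the ANOVA.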


8. Technique 4: Correlation Analysis

Theory Recap: Pearson correlation measures the strength and direction of the linear relationship between two continuous variables, producing a coefficient r between -1 and 1.

Business Justification: To justify our HSE training budget, we need to test whether employees who attend more training sessions also report higher confidence in reporting hazards.

  • Null Hypothesis (H0): There is no linear correlation between the number of Safety Training sessions an employee attends and their Confidence in Reporting (population r = 0).
  • Alternative Hypothesis (H1): There is a nonzero linear correlation between the number of Safety Training sessions an employee attends and their Confidence in Reporting.
Code
# Run Pearson Correlation
corr, p_val_corr = stats.pearsonr(df['Training_Sessions'], df['Reporting_Confidence'])

print(f"Pearson Correlation Coefficient (r): {corr:.3f}")
print(f"P-Value: {p_val_corr:.3f}")

# Plot the relationship
plt.figure(figsize=(6, 4))
sns.regplot(data=df, x='Training_Sessions', y='Reporting_Confidence', scatter_kws={'alpha':0.5}, line_kws={'color':'red'})
plt.title("Training Sessions vs. Reporting Confidence")
plt.show()
Pearson Correlation Coefficient (r): 0.186
P-Value: 0.038

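Interpretation: The p-value is 0.038, which is below 0.05, so we reject the null hypothesis. There is a statistically significant, positive correlation (r = 0.186) between attending more training sessions and higher confidence in reporting hazards. The relationship is weak, however: r² ≈ 0.035, so training frequency accounts for only about 3.5% of the variation in reporting confidence. This is consistent with training yielding a positive return, but a correlation alone does not show that training causes the higher confidence.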
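The sketch below computes Pearson's r from centered deviations, checks it against `scipy.stats.pearsonr`, and squares the reported r = 0.186 to show how little variance it explains on its own. The helper `pearson_r` and the demo data are hypothetical.

```python
import numpy as np
from scipy import stats

def pearson_r(x, y):
    """Hypothetical helper: covariance of x and y scaled by both standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical sessions attended and 1-10 confidence ratings
sessions = [0, 1, 2, 2, 3, 4, 5, 6]
confidence = [6, 7, 6, 8, 7, 9, 8, 9]

r_manual = pearson_r(sessions, confidence)
r_scipy, p_scipy = stats.pearsonr(sessions, confidence)
print(f"Manual r = {r_manual:.3f}, SciPy r = {r_scipy:.3f}, p = {p_scipy:.3f}")

# r squared = share of variance in confidence linearly associated with training
r_reported = 0.186
print(f"r^2 for the reported r: {r_reported ** 2:.3f}")
```

Because the confidence ratings are ordinal, `scipy.stats.spearmanr` offers a rank-based check on the same relationship.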


9. Technique 5: Multiple Linear Regression

Theory Recap: Multiple linear regression models the relationship between a continuous dependent variable and two or more independent variables, estimating each variable’s effect while holding the others constant.

Business Justification: By weighing two proactive safety measures simultaneously, we can identify which one is more strongly associated with reporting confidence.

  • Null Hypothesis (H0): Neither Safety Training frequency nor Management Enforcement predicts an employee’s Confidence in Reporting (both slope coefficients equal zero).
  • Alternative Hypothesis (H1): At least one of Safety Training frequency and Management Enforcement predicts an employee’s Confidence in Reporting (at least one slope coefficient is nonzero).
Code
# Define independent variables (Inputs) and dependent variable (Outcome)
X = df[['Training_Sessions', 'Management_Enforcement']]
X = sm.add_constant(X)  # statsmodels OLS does not add an intercept automatically
y = df['Reporting_Confidence']

# Fit the regression model
model = sm.OLS(y, X).fit()

# Print the formal statistical summary
print(model.summary().tables[1])
==========================================================================================
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                      4.9897      0.646      7.718      0.000       3.710       6.270
Training_Sessions          0.1229      0.071      1.720      0.088      -0.019       0.264
Management_Enforcement     0.2922      0.074      3.958      0.000       0.146       0.438
==========================================================================================

Interpretation: This is the most critical finding of the study. Management Enforcement is a highly significant predictor of reporting confidence (p < 0.001): for every 1-point increase in how strictly a manager enforces safety rules, predicted reporting confidence rises by about 0.29 points (95% CI 0.15 to 0.44), holding training frequency constant. By contrast, once enforcement is accounted for, Training Sessions is no longer statistically significant (p = 0.088; its 95% interval of -0.019 to 0.264 includes zero).


10. Integrated Findings

The combination of these five analyses provides a consistent operational narrative. Our group comparison tests (T-Test and ANOVA) found no significant differences in reporting confidence by where an employee works or how long they have been in the industry. Instead, reporting confidence is associated with proactive inputs. While correlation analysis showed that generic employee training has a weak positive association with confidence, the multiple regression model indicated that frontline management enforcement is the factor most strongly associated with safety confidence, and training added no significant explanatory power once enforcement was controlled for.

Single Recommendation: The organization should pivot its strategy. Rather than spending capital on specialized safety campaigns for different departments or tenure levels, which our tests do not support, the company should mandate a unified Leadership Safety Training program. If training direct managers to enforce safety protocols strictly and fairly on the floor raises perceived enforcement, our model suggests employee reporting confidence will rise with it, which should help reduce hidden hazards.
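To make the recommendation concrete, the fitted coefficients from Section 9 can translate a change in enforcement rating into a predicted change in confidence. This is an illustrative sketch: the scenario values (2 training sessions, enforcement moving from 6 to 8) and the helper `predicted_confidence` are hypothetical, and because the data are cross-sectional the result describes an association, not a guaranteed effect of training managers.

```python
# Coefficients copied from the Section 9 regression table (as printed, 4 decimals)
COEF = {
    "const": 4.9897,
    "Training_Sessions": 0.1229,  # not significant (p = 0.088)
    "Management_Enforcement": 0.2922,
}

def predicted_confidence(training_sessions: float, enforcement: float) -> float:
    """Hypothetical helper: plug values into the fitted linear equation."""
    return (
        COEF["const"]
        + COEF["Training_Sessions"] * training_sessions
        + COEF["Management_Enforcement"] * enforcement
    )

# Hypothetical scenario: training held at 2 sessions, enforcement rating rises from 6 to 8
before = predicted_confidence(2, 6)
after = predicted_confidence(2, 8)
print(f"Predicted confidence: {before:.2f} -> {after:.2f} (change {after - before:+.2f})")
```

Because training is held fixed, the predicted change is simply two times the enforcement coefficient, about 0.58 points on the 10-point scale.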

11. Limitations & Further Work

The primary limitation of this study is its cross-sectional design: the survey captures a single point in time, so we can establish association and predictive value, but causation cannot be established without long-term tracking. Additionally, self-reported survey data carries an inherent risk of response bias, as employees may overstate their reporting confidence. Finally, because respondents were recruited through convenience and snowball sampling, the results may not generalize beyond the professional networks that were sampled.

With more time and organizational access, future work should include a longitudinal study. I would track actual logged safety incident reports (quantitative company data) before, and for six months after, implementing the recommended managerial leadership training.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online.

McKinney, W. (2010). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, 51-56.

Seabold, S., & Perktold, J. (2010). statsmodels: Econometric and statistical modeling with python. Proceedings of the 9th Python in Science Conference.

Appendix: AI Usage Statement

Generative AI tools (Google Gemini) were utilized strictly as a technical assistant to structure the Quarto document layout, debug Python package environments, and generate boilerplate syntax for the statistical models (Pandas, SciPy, Statsmodels). I exercised independent analytical judgement in defining the business problem, determining the variables and hypotheses, designing the survey instrument, gathering the primary data, and translating the raw statistical outputs into actionable, non-technical business recommendations.