Performance Analysis of Workforce Metrics in Nigerian Oil & Gas

Author

Teece Ogbu-Nnachi

Published

May 11, 2026

1 Executive Summary

Employee performance management remains a critical challenge within the oil and gas servicing industry due to increasing operational demands, workforce productivity expectations, and rising training investments. This study investigates the relationship between employee performance and selected HR factors including tenure, training hours, attendance rate, and departmental structure.

The dataset used for this analysis consisted of 500 anonymised employee records collected from internal HR operational reports covering the 2024 business year. Exploratory and inferential analytical techniques were applied to identify patterns, relationships, and statistically significant drivers of performance outcomes.

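The findings did not support the expected relationships. Performance scores were tightly clustered around 40 (standard deviation 0.53), departmental differences were not statistically significant (ANOVA, p = 0.366), and neither training hours (r = -0.03, p = 0.454) nor attendance rate (r = 0.01) showed a meaningful linear association with performance. A multiple regression on tenure, training hours, and attendance rate explained less than 1% of the variance in performance scores (R² = 0.002).

Based on these findings, increased investment in employee development programmes, stronger attendance monitoring systems, and department-specific HR interventions cannot be justified from this dataset alone. Because the performance scores vary so little, the study recommends reviewing how performance is scored before such interventions are evaluated for their effect on organisational performance and workforce effectiveness.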

2 Professional Disclosure

I currently work within the HR and administrative function of an oil and gas servicing organisation. My responsibilities involve workforce coordination, employee documentation, training administration, and operational HR support activities.

2.1 Exploratory Data Analysis (EDA)

EDA is operationally relevant because HR datasets often contain missing values, inconsistent records, and outliers that can affect workforce reporting accuracy and management decisions.

2.2 Data Visualisation

Visualisation supports HR reporting by transforming workforce data into understandable insights for managers and executives during performance review discussions and workforce planning meetings.

2.3 Hypothesis Testing

Hypothesis testing assists management in determining whether observed differences in employee performance across departments are statistically meaningful before implementing policy decisions.

2.4 Correlation Analysis

Correlation analysis helps HR teams understand relationships between variables such as training investment, attendance consistency, and employee performance outcomes.

2.5 Regression Analysis

Regression analysis supports predictive workforce planning by estimating how HR variables collectively influence employee performance levels.

3 Data Collection & Sampling

The dataset used in this study was extracted from internal HR operational records within the organisation.

  • Organisation Type: Oil and Gas Servicing Firm
  • Data Source: Internal HR reports
  • Sampling Technique: Convenience sampling
  • Sample Size: 500 employees
  • Time Period Covered: January 2024 to December 2024

Employee identities were anonymised to maintain confidentiality and ethical compliance.

4 Data Description

4.1 Load Libraries

Code
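library(tidyverse)  # data wrangling and plotting: mutate(), group_by(), summarise()
library(readr)      # delimited-file import
library(readxl)     # Excel import: read_xlsx()
library(lubridate)  # date helpers: floor_date()
library(ggplot2)    # plotting: ggplot()
library(corrplot)   # correlation heatmap: corrplot()
library(car)        # regression helper functions (not called explicitly below)
library(psych)      # descriptive statistics: describe()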

4.2 Load Dataset

Code
hr_data <- read_xlsx("HR_Data.xlsx")

head(hr_data)
# A tibble: 6 × 7
  Employee_ID Department Tenure_Years Training_Hours Attendance_Rate
  <chr>       <chr>             <dbl>          <dbl>           <dbl>
1 EMP001      HSE                 5.9           23.9            93.6
2 EMP002      Operations         10.8           45.1            89.7
3 EMP003      HR                  3.1           29.3            87.4
4 EMP004      HR                  4.7           NA              86.9
5 EMP005      HSE                 4.7           28              81.3
6 EMP006      Operations          8             41              89.3
# ℹ 2 more variables: Performance_Score <dbl>, Observation_Date <chr>

4.3 Dataset Structure

Code
str(hr_data)
tibble [500 × 7] (S3: tbl_df/tbl/data.frame)
 $ Employee_ID      : chr [1:500] "EMP001" "EMP002" "EMP003" "EMP004" ...
 $ Department       : chr [1:500] "HSE" "Operations" "HR" "HR" ...
 $ Tenure_Years     : num [1:500] 5.9 10.8 3.1 4.7 4.7 8 9.3 6.9 1 8 ...
 $ Training_Hours   : num [1:500] 23.9 45.1 29.3 NA 28 41 29 28.6 32.7 45.9 ...
 $ Attendance_Rate  : num [1:500] 93.6 89.7 87.4 86.9 81.3 89.3 96.7 91.9 93.9 85.4 ...
 $ Performance_Score: num [1:500] 40 40 40 40 40 40 40 40 40 40 ...
 $ Observation_Date : chr [1:500] "2024-02-27" "2024-05-20" "2024-04-24" "2024-02-22" ...
Code
summary(hr_data)
    Employee_ID      Department   Tenure_Years    Training_Hours 
 Length   :500   Length   :500   Min.   : 0.500   Min.   : 8.40  
 N.unique :500   N.unique :  6   1st Qu.: 4.300   1st Qu.:29.00  
 N.blank  :  0   N.blank  :  0   Median : 7.550   Median :35.60  
 Min.nchar:  6   Min.nchar:  2   Mean   : 7.477   Mean   :35.92  
 Max.nchar:  6   Max.nchar: 11   3rd Qu.:10.600   3rd Qu.:42.85  
                                 Max.   :20.500   Max.   :65.20  
                                                  NAs    :5      
 Attendance_Rate  Performance_Score  Observation_Date
 Min.   : 79.60   Min.   :38.4      Length   :500    
 1st Qu.: 89.55   1st Qu.:39.9      N.unique :276    
 Median : 92.40   Median :40.0      N.blank  :  0    
 Mean   : 92.56   Mean   :40.1      Min.nchar: 10    
 3rd Qu.: 96.40   3rd Qu.:40.3      Max.nchar: 10    
 Max.   :100.00   Max.   :44.6                       
 NAs    :5                                           

4.4 Variable Description

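  • Employee_ID: anonymised employee identifier (character; 500 unique values)
  • Department: employee's department (character; 6 categories, including HSE, Operations, and HR)
  • Tenure_Years: length of service in years (numeric; 0.5 to 20.5)
  • Training_Hours: training hours recorded (numeric; 8.4 to 65.2; 5 missing)
  • Attendance_Rate: attendance rate, apparently on a 0 to 100 percentage scale (numeric; 79.6 to 100.0; 5 missing)
  • Performance_Score: performance score (numeric; 38.4 to 44.6)
  • Observation_Date: date of the record (stored as character in YYYY-MM-DD format; 276 unique dates)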
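
Two of the columns are read in as character but are used later as a grouping variable and as a date. The sketch below shows one way to set those types explicitly. It uses a three-row stand-in built from the first rows printed in Section 4.2; the object name `toy` is an assumption for illustration, not part of the original analysis.

```r
# Three-row stand-in using values from the head(hr_data) output above
toy <- data.frame(
  Employee_ID      = c("EMP001", "EMP002", "EMP003"),
  Department       = c("HSE", "Operations", "HR"),
  Observation_Date = c("2024-02-27", "2024-05-20", "2024-04-24"),
  stringsAsFactors = FALSE
)

# Department is a grouping variable, so store it as a factor
toy$Department <- factor(toy$Department)

# Observation_Date arrives as text in YYYY-MM-DD format; convert it to Date
toy$Observation_Date <- as.Date(toy$Observation_Date, format = "%Y-%m-%d")

str(toy)
```

Converting up front means a malformed date becomes NA at load time rather than surfacing later in the monthly trend step.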

5 Analysis Technique 1: Exploratory Data Analysis (EDA)

5.1 Theory Recap

Exploratory Data Analysis (EDA) is used to understand the structure, quality, and distribution of data before conducting advanced statistical analysis.

5.2 Business Justification

EDA helps HR managers identify inconsistencies, missing records, and unusual employee performance patterns that may influence organisational decisions.

5.3 Missing Value Analysis

Code
colSums(is.na(hr_data))
      Employee_ID        Department      Tenure_Years    Training_Hours 
                0                 0                 0                 5 
  Attendance_Rate Performance_Score  Observation_Date 
                5                 0                 0 

5.4 Handling Missing Values

Code
hr_data$Training_Hours[is.na(hr_data$Training_Hours)] <- median(
  hr_data$Training_Hours,
  na.rm = TRUE
)

hr_data$Attendance_Rate[is.na(hr_data$Attendance_Rate)] <- median(
  hr_data$Attendance_Rate,
  na.rm = TRUE
)
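
Median imputation overwrites the missing entries, so it is no longer possible to tell which records were imputed. A minimal sketch of one way to keep that information follows. The data are synthetic, and the helper `impute_median()` and the `_was_missing` column suffix are hypothetical names introduced here for illustration.

```r
# Synthetic stand-in for the HR data (values are made up for illustration)
set.seed(42)
toy <- data.frame(
  Training_Hours  = round(rnorm(20, mean = 36, sd = 10), 1),
  Attendance_Rate = round(runif(20, min = 80, max = 100), 1)
)
toy$Training_Hours[c(3, 11)] <- NA
toy$Attendance_Rate[7] <- NA

# Hypothetical helper: flag missing entries, then fill them with the column median
impute_median <- function(df, col) {
  flag <- paste0(col, "_was_missing")
  df[[flag]] <- is.na(df[[col]])
  df[[col]][df[[flag]]] <- median(df[[col]], na.rm = TRUE)
  df
}

toy <- impute_median(toy, "Training_Hours")
toy <- impute_median(toy, "Attendance_Rate")

colSums(is.na(toy))                   # no missing values remain
sum(toy$Training_Hours_was_missing)   # number of imputed training records
```

With the flag columns in place, later models can be re-run with and without the imputed rows as a sensitivity check.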

5.5 Descriptive Statistics

Code
describe(hr_data[, c(
  "Tenure_Years",
  "Training_Hours",
  "Attendance_Rate",
  "Performance_Score"
)])
                  vars   n  mean    sd median trimmed   mad  min   max range
Tenure_Years         1 500  7.48  4.13   7.55    7.41  4.67  0.5  20.5  20.0
Training_Hours       2 500 35.91 10.06  35.60   35.80 10.16  8.4  65.2  56.8
Attendance_Rate      3 500 92.56  4.68  92.40   92.71  5.04 79.6 100.0  20.4
Performance_Score    4 500 40.10  0.53  40.00   40.08  0.30 38.4  44.6   6.2
                   skew kurtosis   se
Tenure_Years       0.15    -0.79 0.18
Training_Hours     0.10    -0.13 0.45
Attendance_Rate   -0.25    -0.50 0.21
Performance_Score  2.78    21.50 0.02

5.6 Outlier Detection

Code
boxplot(
  hr_data$Attendance_Rate,
  main = "Attendance Rate Outliers",
  col = "#A8DADC",
  border = "#1D3557"
)

5.7 Interpretation

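EDA identified five missing values in each of Training_Hours and Attendance_Rate (1% of the 500 records per variable); no other variable had missing data. Both were imputed with the column median, which preserves the sample size but slightly understates variability. Given the quartiles reported in Section 4.3, the observed attendance range of 79.6 to 100.0 lies within the 1.5 × IQR fences, so attendance shows few, if any, outliers. The more notable feature is Performance_Score, which is concentrated around 40 (standard deviation 0.53) with strong right skew (2.78) and heavy tails (kurtosis 21.50).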
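
The boxplot's outlier rule can be checked numerically. The sketch below applies the usual 1.5 × IQR fences to the Attendance_Rate quartiles printed by `summary(hr_data)` in Section 4.3. Those quartiles were computed before imputation, and `boxplot()` uses hinges rather than quantiles, so the plotted whiskers may differ slightly from this approximation.

```r
# Quartiles and range of Attendance_Rate as printed by summary(hr_data)
q1 <- 89.55
q3 <- 96.40
obs_min <- 79.6
obs_max <- 100

iqr <- q3 - q1                     # interquartile range
lower_fence <- q1 - 1.5 * iqr      # values below this would be flagged
upper_fence <- q3 + 1.5 * iqr      # values above this would be flagged

c(lower = lower_fence, upper = upper_fence)

# TRUE when the whole observed range sits inside the fences
inside <- obs_min >= lower_fence && obs_max <= upper_fence
inside
```

The fences come out at about 79.3 and 106.7, so the reported minimum and maximum both fall inside them.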

6 Analysis Technique 2: Data Visualisation

6.1 Theory Recap

Data visualisation transforms numerical information into visual patterns that support decision-making and business storytelling.

6.2 Business Justification

HR management requires visual performance dashboards to identify trends, department-level variations, and workforce productivity patterns.

6.3 Distribution of Performance Scores

Code
ggplot(hr_data, aes(x = Performance_Score)) +
  geom_histogram(
    fill = "#2C3E50",
    color = "white",
    bins = 15
  ) +
  theme_classic(base_size = 14) +
  labs(
    title = "Distribution of Employee Performance Scores",
    subtitle = "Employee performance across the organisation",
    x = "Performance Score",
    y = "Frequency"
  )

6.4 Performance by Department

Code
ggplot(
  hr_data,
  aes(
    x = Department,
    y = Performance_Score,
    fill = Department
  )
) +
  geom_boxplot(alpha = 0.8) +
  theme_classic(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) +
  labs(
    title = "Performance Score by Department",
    subtitle = "Departmental comparison of employee performance",
    x = "Department",
    y = "Performance Score"
  )

6.5 Training Hours vs Performance

Code
ggplot(
  hr_data,
  aes(
    x = Training_Hours,
    y = Performance_Score
  )
) +
  geom_point(
    color = "#1B9E77",
    alpha = 0.7,
    size = 3
  ) +
  geom_smooth(
    method = "lm",
    se = TRUE,
    color = "black"
  ) +
  theme_classic(base_size = 14) +
  labs(
    title = "Training Hours vs Employee Performance",
    subtitle = "Relationship between training investment and productivity",
    x = "Training Hours",
    y = "Performance Score"
  )

6.6 Attendance Rate vs Performance

Code
ggplot(
  hr_data,
  aes(
    x = Attendance_Rate,
    y = Performance_Score
  )
) +
  geom_point(
    color = "#6A4C93",
    alpha = 0.7,
    size = 3
  ) +
  geom_smooth(
    method = "lm",
    se = TRUE,
    color = "black"
  ) +
  theme_classic(base_size = 14) +
  labs(
    title = "Attendance Rate vs Employee Performance",
    subtitle = "Relationship between attendance consistency and productivity",
    x = "Attendance Rate",
    y = "Performance Score"
  )

6.7 Monthly Performance Trend

Code
hr_data$Observation_Date <- as.Date(hr_data$Observation_Date)

monthly_perf <- hr_data %>%
  mutate(Month = floor_date(Observation_Date, "month")) %>%
  group_by(Month) %>%
  summarise(
    Avg_Performance = mean(Performance_Score)
  )

ggplot(
  monthly_perf,
  aes(
    x = Month,
    y = Avg_Performance
  )
) +
  geom_line(
    color = "#D62828",
    linewidth = 1.5
  ) +
  geom_point(
    color = "#003049",
    size = 3
  ) +
  theme_classic(base_size = 14) +
  labs(
    title = "Monthly Average Performance Trend",
    subtitle = "Average employee performance over time",
    x = "Month",
    y = "Average Performance"
  )
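
A monthly average is easier to interpret when the number of records behind each point is known. The sketch below computes the monthly mean together with its count and standard error in base R. The data are synthetic, and the output column names `N` and `SE` are assumptions added for illustration.

```r
# Synthetic stand-in: 200 records spread across 2024 (values are made up)
set.seed(1)
toy <- data.frame(
  Observation_Date  = as.Date("2024-01-01") + sample(0:365, 200, replace = TRUE),
  Performance_Score = rnorm(200, mean = 40, sd = 0.5)
)

# Truncate each date to the first day of its month
toy$Month <- as.Date(format(toy$Observation_Date, "%Y-%m-01"))

# Mean, record count, and standard error of the mean per month
monthly <- aggregate(
  Performance_Score ~ Month,
  data = toy,
  FUN = function(x) c(mean = mean(x), n = length(x), se = sd(x) / sqrt(length(x)))
)
monthly <- do.call(data.frame, monthly)   # flatten the matrix column
names(monthly) <- c("Month", "Avg_Performance", "N", "SE")

monthly
```

Months with few records have larger standard errors, so apparent peaks or dips in the trend line may reflect sample size rather than a real change.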

6.8 Interpretation

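The scatter plots should be read alongside the correlation coefficients reported in Section 8.3. With r = -0.03 for training hours and r = 0.01 for attendance rate, the fitted regression lines are nearly flat, so the visualisations do not show that higher attendance or more training is associated with stronger performance. The histogram reflects the narrow spread of performance scores summarised in Section 5.5 (range 38.4 to 44.6, median 40.0).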

7 Analysis Technique 3: Hypothesis Testing

7.1 Theory Recap

Hypothesis testing evaluates whether observed differences or relationships within a dataset are statistically significant.

7.2 Business Justification

Management requires evidence-based validation before implementing HR interventions or departmental policy changes.

7.3 Hypothesis 1

7.3.1 Null Hypothesis (H₀)

Employee performance does not differ significantly across departments.

7.3.2 Alternative Hypothesis (H₁)

Employee performance differs significantly across departments.

Code
anova_model <- aov(
  Performance_Score ~ Department,
  data = hr_data
)

summary(anova_model)
             Df Sum Sq Mean Sq F value Pr(>F)
Department    5   1.54  0.3088   1.087  0.366
Residuals   494 140.28  0.2840               
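
One-way ANOVA assumes roughly normal residuals, while the performance scores in Section 5.5 are strongly skewed. A rank-based Kruskal-Wallis test is a common robustness check in that situation. The sketch below runs both tests side by side on synthetic heavy-tailed data; it is an illustrative addition and was not part of the original analysis.

```r
# Synthetic stand-in: three departments, heavy-tailed scores around 40
set.seed(7)
toy <- data.frame(
  Department        = factor(rep(c("HSE", "Operations", "HR"), each = 30)),
  Performance_Score = 40 + rt(90, df = 3) * 0.3
)

# Parametric test: one-way ANOVA
anova_p <- summary(aov(Performance_Score ~ Department, data = toy))[[1]][["Pr(>F)"]][1]

# Rank-based alternative: Kruskal-Wallis test
kw_p <- kruskal.test(Performance_Score ~ Department, data = toy)$p.value

c(anova = anova_p, kruskal_wallis = kw_p)
```

If the two p-values lead to the same decision, the ANOVA conclusion is less likely to be an artefact of the skewed score distribution.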

7.4 Hypothesis 2

7.4.1 Null Hypothesis (H₀)

Training hours do not significantly affect employee performance.

7.4.2 Alternative Hypothesis (H₁)

Training hours significantly affect employee performance.

Code
cor.test(
  hr_data$Training_Hours,
  hr_data$Performance_Score,
  method = "pearson"
)

    Pearson's product-moment correlation

data:  hr_data$Training_Hours and hr_data$Performance_Score
t = -0.75006, df = 498, p-value = 0.4536
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1209263  0.0542585
sample estimates:
        cor 
-0.03359191 

7.5 Interpretation

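Neither null hypothesis was rejected at the 5% significance level. For Hypothesis 1, the one-way ANOVA gave F(5, 494) = 1.087, p = 0.366, so the data provide no evidence that mean performance differs across the six departments. For Hypothesis 2, the Pearson correlation between training hours and performance was r = -0.034 (t = -0.750, df = 498, p = 0.454), with a 95% confidence interval of -0.121 to 0.054 that includes zero. The correlation test assesses linear association, not a causal effect of training on performance.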
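
A p-value says nothing about the size of an effect, so it helps to report an effect size next to the ANOVA result. The sketch below recomputes the F statistic and derives eta-squared from the sums of squares printed in the ANOVA table in Section 7.3. Because the inputs are rounded to two decimals, the results match the printed output only approximately.

```r
# Values copied from the ANOVA table (rounded as printed)
ss_between <- 1.54
df_between <- 5
ss_within  <- 140.28
df_within  <- 494

# F statistic: ratio of between-group to within-group mean squares
f_value <- (ss_between / df_between) / (ss_within / df_within)

# Upper-tail p-value from the F distribution
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Eta-squared: share of total variance attributable to Department
eta_sq <- ss_between / (ss_between + ss_within)

round(c(F = f_value, p = p_value, eta_sq = eta_sq), 4)
```

Eta-squared comes out at about 0.011, meaning department membership accounts for roughly 1% of the variance in performance scores.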

8 Analysis Technique 4: Correlation Analysis

8.1 Theory Recap

Correlation analysis measures the strength and direction of relationships between variables.

8.2 Business Justification

Correlation analysis helps HR teams identify workforce factors most strongly associated with employee productivity.

8.3 Correlation Matrix

Code
cor_matrix <- cor(
  hr_data[, c(
    "Tenure_Years",
    "Training_Hours",
    "Attendance_Rate",
    "Performance_Score"
  )]
)

cor_matrix
                  Tenure_Years Training_Hours Attendance_Rate Performance_Score
Tenure_Years       1.000000000   -0.007424008    -0.026488349       0.031437668
Training_Hours    -0.007424008    1.000000000     0.003063037      -0.033591912
Attendance_Rate   -0.026488349    0.003063037     1.000000000       0.006015039
Performance_Score  0.031437668   -0.033591912     0.006015039       1.000000000

8.4 Correlation Heatmap

Code
corrplot(
  cor_matrix,
  method = "color",
  type = "upper",
  addCoef.col = "black",
  tl.col = "black",
  number.cex = 0.7
)

8.5 Interpretation

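No strong relationships were identified. Every pairwise correlation has an absolute value below 0.04: performance correlates at 0.031 with tenure, -0.034 with training hours, and 0.006 with attendance rate. The predictors are also essentially uncorrelated with one another, so the matrix gives no indication that training participation or attendance consistency is linearly associated with performance in this dataset.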
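
The significance test and confidence interval for a Pearson correlation follow directly from r and the sample size. The sketch below reproduces them for the training hours and performance correlation, using the standard t transformation and the Fisher z interval. The inputs are the coefficient from the matrix above and n = 500 from the dataset structure.

```r
r <- -0.03359191   # Training_Hours vs Performance_Score
n <- 500           # number of employee records

# t statistic with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# 95% confidence interval via the Fisher z transformation
z    <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci   <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

# Share of variance in performance shared with training hours
r_squared <- r^2

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2], r2 = r_squared), 4)
```

The squared correlation is about 0.001, so training hours share roughly 0.1% of their variance with performance scores.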

9 Analysis Technique 5: Regression Analysis

9.1 Theory Recap

Regression analysis estimates the impact of predictor variables on an outcome variable.

9.2 Business Justification

Regression modelling helps management understand which workforce factors most strongly influence employee performance outcomes.

9.3 Multiple Linear Regression

Code
reg_model <- lm(
  Performance_Score ~
    Tenure_Years +
    Training_Hours +
    Attendance_Rate,
  data = hr_data
)

summary(reg_model)

Call:
lm(formula = Performance_Score ~ Tenure_Years + Training_Hours + 
    Attendance_Rate, data = hr_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6528 -0.1622 -0.0848  0.2217  4.5107 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     40.0573421  0.4840524  82.754   <2e-16 ***
Tenure_Years     0.0040547  0.0057989   0.699    0.485    
Training_Hours  -0.0017681  0.0023759  -0.744    0.457    
Attendance_Rate  0.0007912  0.0051093   0.155    0.877    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5342 on 496 degrees of freedom
Multiple R-squared:  0.002149,  Adjusted R-squared:  -0.003886 
F-statistic: 0.3561 on 3 and 496 DF,  p-value: 0.7847

9.4 Regression Diagnostics

Code
par(mfrow = c(2,2))
plot(reg_model)
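
The diagnostic plots can be supplemented with numeric checks. The sketch below computes variance inflation factors by hand (1 / (1 - R²) from regressing each predictor on the others) and the largest Cook's distance, using base R only. The data are synthetic, and the object names `toy`, `vif_manual`, and `max_cook` are assumptions for illustration.

```r
# Synthetic stand-in with the same column names as hr_data (values are made up)
set.seed(123)
n <- 200
toy <- data.frame(
  Tenure_Years    = runif(n, 0.5, 20),
  Training_Hours  = rnorm(n, 36, 10),
  Attendance_Rate = runif(n, 80, 100)
)
toy$Performance_Score <- 40 + rnorm(n, sd = 0.5)

predictors <- c("Tenure_Years", "Training_Hours", "Attendance_Rate")

# VIF for each predictor: regress it on the remaining predictors
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit <- lm(reformulate(others, response = p), data = toy)
  1 / (1 - summary(fit)$r.squared)
})
vif_manual

# Largest Cook's distance: a quick screen for influential observations
fit <- lm(Performance_Score ~ Tenure_Years + Training_Hours + Attendance_Rate, data = toy)
max_cook <- max(cooks.distance(fit))
max_cook
```

VIF values close to 1 indicate that the predictors carry little overlapping information, which is what the near-zero correlations among predictors in Section 8.3 would lead one to expect.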

9.5 Interpretation

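The regression model does not explain employee performance in this dataset. None of the three predictors is statistically significant (tenure p = 0.485, training hours p = 0.457, attendance rate p = 0.877), and the overall F-test is non-significant (F = 0.356 on 3 and 496 degrees of freedom, p = 0.785). R² is 0.002 and the adjusted R² is negative, so the fitted values stay close to the overall mean score of about 40.1 regardless of tenure, training, or attendance. The residual range (-1.65 to 4.51) is asymmetric, consistent with the right-skewed performance scores reported in Section 5.5.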
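
The overall F statistic and the adjusted R² are both functions of R², the sample size, and the number of predictors. The sketch below recomputes them from the values printed in the regression summary, which makes it clear why the adjusted R² is negative: the penalty for three predictors exceeds the variance they explain.

```r
r2 <- 0.002149   # Multiple R-squared from summary(reg_model)
n  <- 500        # number of observations
k  <- 3          # number of predictors

# Overall F statistic on k and n - k - 1 degrees of freedom
f_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_value <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

round(c(F = f_stat, p = p_value, adj_r2 = adj_r2), 6)
```

These values agree with the printed summary up to rounding of the R² input.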

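10 Integrated Findings

Taken together, the analyses do not show that workforce performance is associated with attendance consistency or employee training investment in this dataset.

EDA identified five missing values each in Training_Hours and Attendance_Rate, which were imputed with the median, and showed that performance scores vary very little (mean 40.10, standard deviation 0.53). The ANOVA found no significant departmental differences (p = 0.366), the correlations between performance and each HR factor were all below 0.04 in absolute value, and the regression model explained about 0.2% of the variance in performance (R² = 0.002). The narrow spread of the performance scores limits what any predictor could explain, so the way performance is measured should be examined before the importance of training investment and attendance management is assessed.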

11 Limitations & Further Work

11.1 Limitations

  • Limited sample size
  • Potential subjectivity in performance evaluation
  • Limited benchmarking data

11.2 Further Work

Future studies could incorporate:

  • Predictive HR analytics
  • Employee attrition modelling
  • Multi-year workforce datasets
  • Machine learning techniques

12 References

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning. Springer.

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.

R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.

Code
citation("ggplot2")
To cite ggplot2 in publications, please use

  H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
  Springer-Verlag New York, 2016.

A BibTeX entry for LaTeX users is

  @Book{,
    author = {Hadley Wickham},
    title = {ggplot2: Elegant Graphics for Data Analysis},
    publisher = {Springer-Verlag New York},
    year = {2016},
    isbn = {978-3-319-24277-4},
    url = {https://ggplot2.tidyverse.org},
  }
Code
citation("corrplot")
To cite corrplot in publications use:

  Taiyun Wei and Viliam Simko (2024). R package 'corrplot':
  Visualization of a Correlation Matrix (Version 0.95). Available from
  https://github.com/taiyun/corrplot

A BibTeX entry for LaTeX users is

  @Manual{corrplot2024,
    title = {R package 'corrplot': Visualization of a Correlation Matrix},
    author = {Taiyun Wei and Viliam Simko},
    year = {2024},
    note = {(Version 0.95)},
    url = {https://github.com/taiyun/corrplot},
  }
Code
citation("psych")
To cite package 'psych' in publications use:

  William Revelle (2026). _psych: Procedures for Psychological,
  Psychometric, and Personality Research_. Northwestern University,
  Evanston, Illinois. R package version 2.6.3,
  <https://CRAN.R-project.org/package=psych>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {psych: Procedures for Psychological, Psychometric, and Personality Research},
    author = {{William Revelle}},
    organization = {Northwestern University},
    address = {Evanston, Illinois},
    year = {2026},
    note = {R package version 2.6.3},
    url = {https://CRAN.R-project.org/package=psych},
  }

13 Appendix: AI Usage Statement

AI-assisted tools were used to support code generation, formatting, and interpretation drafting. Independent analytical judgement was applied throughout the analytical process.