An Exploratory and Inferential Analysis of Factors Affecting Procurement Delivery Performance in an Oil and Gas Servicing Company

Author

Aizehinomon Oniha

Published

May 5, 2026

1. Data Loading and Cleaning

Code

library(readxl)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

library(janitor)


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

Code

library(skimr)
library(corrplot)

corrplot 0.95 loaded

Code

library(broom)
library(effectsize)

data <- read_excel("Data for DA Exam.xlsx")

New names:
• `` -> `...1`

Code

data_clean <- data %>%
  clean_names() %>%
  rename(
    quantity = quantiy,
    order_value = revenue,
    delivery_status = delievry_status_on_time_late
  ) %>%
  mutate(
    po_date = as.Date(po_date),
    item_category = as.factor(item_category),
    vendor = as.factor(vendor),
    source = as.factor(source),
    delivery_status = as.factor(delivery_status),
    quantity = as.numeric(quantity),
    unit_price = as.numeric(unit_price),
    order_value = as.numeric(order_value),
    planned_delivery_days = as.numeric(planned_delivery_days)
  )

glimpse(data_clean)

Rows: 100
Columns: 11
$ x1                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
$ po_date               <date> 2025-12-12, 2025-12-12, 2025-07-15, 2025-07-15,…
$ product_item_id       <chr> "Towel (Bath size)", "Towel (Hand towel)", "Sewi…
$ item_category         <fct> Textile/Linen, Textile/Linen, Skill Equipment, A…
$ vendor                <fct> "Idumota", "Idumota", "Idumota", "Idumota", "Idu…
$ source                <fct> Local Supplier, Local Supplier, Local Supplier, …
$ quantity              <dbl> 3500, 3500, 100, 3, 2, 5, 4, 9, 8, 5, 5, 15, 1, …
$ unit_price            <dbl> 4500, 2500, 64700, 170000, 168000, 168500, 26000…
$ order_value           <dbl> 15750000, 8750000, 6470000, 510000, 336000, 8425…
$ planned_delivery_days <dbl> 5, 5, 4, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 21, 21…
$ delivery_status       <fct> Late, Late, On-time, On-time, On-time, On-time, …

2. Exploratory Data Analysis

Exploratory Data Analysis was used to understand the structure and quality of the procurement dataset. This is important because procurement managers need to understand order size, order value, delivery timelines, and delivery status before making supplier or sourcing decisions.

Code

dim(data_clean)

[1] 100  11

Code

summary(data_clean)

       x1            po_date           product_item_id   
 Min.   :  1.00   Min.   :2021-01-29   Length:100        
 1st Qu.: 25.75   1st Qu.:2024-09-19   Class :character  
 Median : 50.50   Median :2024-12-20   Mode  :character  
 Mean   : 50.50   Mean   :2025-03-08                     
 3rd Qu.: 75.25   3rd Qu.:2025-10-11                     
 Max.   :100.00   Max.   :2026-02-09                     
                                                         
              item_category              vendor                      source  
 Medical Equipments  :38    Idumota         :26   International Supplier:39  
 ICT                 :15    Computer Village:11   Local Supplier        :61  
 Appliances          :11    Turkey          : 8                              
 Furniture & Fixtures:11    Alaba           : 7                              
 Textile/Linen       : 8    Idumagbo 1      : 7                              
 HVAC                : 5    US              : 7                              
 (Other)             :12    (Other)         :34                              
    quantity        unit_price        order_value        planned_delivery_days
 Min.   :   1.0   Min.   :    2500   Min.   :    17980   Min.   : 2.00        
 1st Qu.:   2.0   1st Qu.:   86045   1st Qu.:   415899   1st Qu.: 3.00        
 Median :   7.0   Median :  277128   Median :  3861148   Median : 5.00        
 Mean   : 251.1   Mean   : 2403641   Mean   : 13814207   Mean   :10.54        
 3rd Qu.:  15.0   3rd Qu.:  875900   3rd Qu.:  9805851   3rd Qu.:21.00        
 Max.   :4500.0   Max.   :70391117   Max.   :136436170   Max.   :21.00        
                                                                              
 delivery_status
 Late   :28     
 On-time:72

Code

colSums(is.na(data_clean))

                   x1               po_date       product_item_id 
                    0                     0                     0 
        item_category                vendor                source 
                    0                     0                     0 
             quantity            unit_price           order_value 
                    0                     0                     0 
planned_delivery_days       delivery_status 
                    0                     0

Code

data_clean %>%
  select(quantity, unit_price, order_value, planned_delivery_days) %>%
  skim()

Data summary
Name	Piped data
Number of rows	100
Number of columns	4
_______________________
Column type frequency:
numeric	4
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
quantity	1	251.13	863.43	1	2.00	7.0	15	4500	▇▁▁▁▁
unit_price	1	2403641.43	7922023.52	2500	86045.05	277127.7	875900	70391117	▇▁▁▁▁
order_value	1	13814207.45	25441459.64	17980	415898.94	3861148.1	9805851	136436170	▇▁▁▁▁
planned_delivery_days	1	10.54	8.62	2	3.00	5.0	21	21	▇▁▁▁▅

Code

data_clean %>%
  count(delivery_status)

# A tibble: 2 × 2
  delivery_status     n
  <fct>           <int>
1 Late               28
2 On-time            72

Code

data_clean %>%
  count(item_category, sort = TRUE)

# A tibble: 9 × 2
  item_category            n
  <fct>                <int>
1 Medical Equipments      38
2 ICT                     15
3 Appliances              11
4 Furniture & Fixtures    11
5 Textile/Linen            8
6 HVAC                     5
7 Skill Equipment          5
8 Industrial Parts         4
9 Packaging                3

3. Data Visualisation

The visualisations below show the delivery performance story in the procurement data. The charts focus on delivery status, procurement source, item category, planned delivery days, and order value.

Code

ggplot(data_clean, aes(x = delivery_status)) +
  geom_bar() +
  labs(
    title = "Distribution of Procurement Delivery Status",
    x = "Delivery Status",
    y = "Number of Purchase Orders"
  )

Code

ggplot(data_clean, aes(x = source, fill = delivery_status)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Delivery Status by Procurement Source",
    x = "Source",
    y = "Number of Purchase Orders",
    fill = "Delivery Status"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Code

ggplot(data_clean, aes(x = item_category, fill = delivery_status)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Delivery Status by Item Category",
    x = "Item Category",
    y = "Number of Purchase Orders",
    fill = "Delivery Status"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Code

ggplot(data_clean, aes(x = delivery_status, y = planned_delivery_days)) +
  geom_boxplot() +
  labs(
    title = "Planned Delivery Days by Delivery Status",
    x = "Delivery Status",
    y = "Planned Delivery Days"
  )

Code

ggplot(data_clean, aes(x = delivery_status, y = order_value)) +
  geom_boxplot() +
  labs(
    title = "Order Value by Delivery Status",
    x = "Delivery Status",
    y = "Order Value"
  )

4. Hypothesis Testing

Hypothesis Test 1: Procurement Source and Delivery Status

H0: Procurement source and delivery status are independent.

H1: Procurement source and delivery status are associated.

This test checks whether delivery performance differs by procurement source.

Code

source_delivery_table <- table(data_clean$source, data_clean$delivery_status)

source_delivery_table

                        
                         Late On-time
  International Supplier   18      21
  Local Supplier           10      51

Code

chisq.test(source_delivery_table)


    Pearson's Chi-squared test with Yates' continuity correction

data:  source_delivery_table
X-squared = 9.0275, df = 1, p-value = 0.00266

Code

cramers_v(source_delivery_table)

Cramer's V (adj.) |       95% CI
--------------------------------
0.31              | [0.12, 1.00]

- One-sided CIs: upper bound fixed at [1.00].

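Business interpretation: The p-value (0.0027) is below 0.05, so the null hypothesis of independence is rejected: procurement source is significantly associated with delivery status. In this sample, 18 of 39 international orders (46%) were late, compared with 10 of 61 local orders (16%), and the adjusted Cramér's V of 0.31 points to a roughly moderate association. International orders may therefore require closer follow-up, stronger supplier monitoring, or longer planning lead time.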
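To make the size of the association concrete, the sketch below rebuilds the contingency table from the counts printed above and computes the late-delivery rate for each source. It also checks the expected cell counts and runs Fisher's exact test as a cross-check. These checks are added for illustration and were not part of the original analysis, and the object names `source_delivery`, `late_rate`, and `expected` are chosen for this example only.

```r
# Counts copied from the source-by-status table printed above.
source_delivery <- matrix(
  c(18, 21,
    10, 51),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    source = c("International Supplier", "Local Supplier"),
    delivery_status = c("Late", "On-time")
  )
)

# Row-wise proportions: the share of each source's orders that arrived late.
late_rate <- prop.table(source_delivery, margin = 1)[, "Late"]
round(late_rate, 3)

# Same test as in the report (Yates' correction is the default for 2 x 2 tables).
chi_result <- stats::chisq.test(source_delivery)
chi_result$statistic

# Expected counts under independence; the chi-squared approximation is
# usually considered adequate when every expected count is at least 5.
expected <- chi_result$expected
min(expected)

# Fisher's exact test does not rely on the large-sample approximation.
stats::fisher.test(source_delivery)$p.value
```

If the exact test and the chi-squared test lead to the same conclusion, the finding does not hinge on the choice of test.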

Hypothesis Test 2: Planned Delivery Days and Delivery Status

H0: The average planned delivery days are the same for on-time and late deliveries.

H1: The average planned delivery days are different for on-time and late deliveries.

Code

t.test(planned_delivery_days ~ delivery_status, data = data_clean)


    Welch Two Sample t-test

data:  planned_delivery_days by delivery_status
t = 4.0345, df = 50.441, p-value = 0.0001857
alternative hypothesis: true difference in means between group Late and group On-time is not equal to 0
95 percent confidence interval:
  3.584557 10.689252
sample estimates:
   mean in group Late mean in group On-time 
            15.678571              8.541667

Code

cohens_d(planned_delivery_days ~ delivery_status, data = data_clean)

Cohen's d |       95% CI
------------------------
0.89      | [0.43, 1.34]

- Estimated using pooled SD.
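As a quick arithmetic check, the 95% confidence interval and p-value can be recovered from the statistics printed in the Welch output above. This is a minimal sketch that uses only the reported, rounded values, so it should agree with the output to about two decimal places. The variable names are illustrative.

```r
# Values copied from the Welch two-sample t-test output above.
mean_late    <- 15.678571
mean_on_time <- 8.541667
t_stat       <- 4.0345
df_welch     <- 50.441

# Difference in mean planned delivery days (Late minus On-time).
diff_means <- mean_late - mean_on_time

# The t statistic is the difference divided by its standard error,
# so the standard error can be backed out from the two printed values.
se_diff <- diff_means / t_stat

# 95% confidence interval: difference +/- critical t value * standard error.
margin <- qt(0.975, df = df_welch) * se_diff
ci <- c(lower = diff_means - margin, upper = diff_means + margin)
round(ci, 2)

# Two-sided p-value from the t distribution with the Welch degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
p_value
```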

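Business interpretation: The p-value (0.0002) is below 0.05, so there is a statistically significant difference in planned delivery days between on-time and late orders. Late orders had a mean planned lead time of about 15.7 days, compared with 8.5 days for on-time orders, and Cohen's d of 0.89 indicates a large effect. This suggests that orders with longer planned lead times carry a higher risk of delay, so delivery planning is an important factor in procurement delivery performance.

5. Correlation Analysis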

Correlation analysis was used to examine the relationship among numeric procurement variables such as quantity, unit price, order value, and planned delivery days.

Code

numeric_data <- data_clean %>%
  select(quantity, unit_price, order_value, planned_delivery_days)

correlation_matrix <- cor(numeric_data, use = "complete.obs")

correlation_matrix

                         quantity  unit_price order_value planned_delivery_days
quantity               1.00000000 -0.08709929   0.0787973            -0.1108786
unit_price            -0.08709929  1.00000000   0.4884794             0.2366074
order_value            0.07879730  0.48847941   1.0000000             0.4727980
planned_delivery_days -0.11087855  0.23660744   0.4727980             1.0000000

Code

corrplot(
  correlation_matrix,
  method = "color",
  type = "upper",
  addCoef.col = "black",
  tl.cex = 0.8,
  number.cex = 0.7
)

Business interpretation: A positive correlation means that two variables tend to move in the same direction. In this dataset, the strongest relationships are between unit price and order value (r = 0.49) and between order value and planned delivery days (r = 0.47): higher-value orders tend to involve more expensive items and longer planned lead times. Quantity, by contrast, is only weakly correlated with order value (r = 0.08), which indicates that order value here is driven more by unit price than by the number of units purchased.
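One way to read the matrix systematically is to rank the variable pairs by the absolute size of their correlation and attach an approximate significance test. The sketch below re-enters the coefficients printed above and uses the standard t-test for a Pearson correlation with n = 100 orders. It assumes the usual conditions for that test, which the heavy skew in these variables may strain, so the p-values are indicative only. The object names are illustrative.

```r
vars <- c("quantity", "unit_price", "order_value", "planned_delivery_days")

# Correlation coefficients copied from the matrix printed above.
r <- matrix(
  c( 1.00000000, -0.08709929, 0.07879730, -0.11087855,
    -0.08709929,  1.00000000, 0.48847941,  0.23660744,
     0.07879730,  0.48847941, 1.00000000,  0.47279800,
    -0.11087855,  0.23660744, 0.47279800,  1.00000000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

n <- 100  # number of purchase orders in the dataset

# Keep each pair once by taking the upper triangle only.
pairs <- which(upper.tri(r), arr.ind = TRUE)
ranked <- data.frame(
  var1 = vars[pairs[, "row"]],
  var2 = vars[pairs[, "col"]],
  r    = r[pairs]
)

# t statistic and two-sided p-value for H0: the true correlation is zero.
ranked$t_stat  <- ranked$r * sqrt((n - 2) / (1 - ranked$r^2))
ranked$p_value <- 2 * pt(-abs(ranked$t_stat), df = n - 2)

# Strongest relationships first.
ranked <- ranked[order(-abs(ranked$r)), ]
ranked
```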

6. Logistic Regression Analysis

Logistic regression was used because the main outcome variable, delivery status, has two outcomes: On-time and Late. The model estimates the likelihood of late delivery based on procurement factors.

Code

data_clean <- data_clean %>%
  mutate(
    late_delivery = ifelse(delivery_status == "Late", 1, 0)
  )

table(data_clean$late_delivery)


 0  1 
72 28

Code

logistic_model <- glm(
  late_delivery ~ quantity + unit_price + order_value + planned_delivery_days + source + item_category,
  data = data_clean,
  family = binomial
)

summary(logistic_model)


Call:
glm(formula = late_delivery ~ quantity + unit_price + order_value + 
    planned_delivery_days + source + item_category, family = binomial, 
    data = data_clean)

Coefficients:
                                    Estimate Std. Error z value Pr(>|z|)
(Intercept)                       -1.617e+01  1.563e+01  -1.035    0.301
quantity                          -3.169e-04  5.297e-04  -0.598    0.550
unit_price                         7.766e-08  8.277e-08   0.938    0.348
order_value                        2.805e-08  1.734e-08   1.617    0.106
planned_delivery_days              7.617e-01  7.721e-01   0.987    0.324
sourceLocal Supplier               1.212e+01  1.327e+01   0.913    0.361
item_categoryFurniture & Fixtures -1.843e+01  3.223e+03  -0.006    0.995
item_categoryHVAC                 -1.955e+01  4.801e+03  -0.004    0.997
item_categoryICT                  -1.796e+01  2.744e+03  -0.007    0.995
item_categoryIndustrial Parts     -1.941e+01  5.377e+03  -0.004    0.997
item_categoryMedical Equipments   -1.002e+00  1.391e+00  -0.720    0.471
item_categoryPackaging             2.041e+01  6.187e+03   0.003    0.997
item_categorySkill Equipment      -1.841e+01  4.656e+03  -0.004    0.997
item_categoryTextile/Linen         1.616e+00  1.808e+00   0.894    0.371

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 118.591  on 99  degrees of freedom
Residual deviance:  55.825  on 86  degrees of freedom
AIC: 83.825

Number of Fisher Scoring iterations: 18

Code

tidy(logistic_model, exponentiate = TRUE, conf.int = TRUE)

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Warning: glm.fit: algorithm did not converge

# A tibble: 14 × 7
   term              estimate std.error statistic p.value    conf.low  conf.high
   <chr>                <dbl>     <dbl>     <dbl>   <dbl>       <dbl>      <dbl>
 1 (Intercept)       9.48 e-8   1.56e+1  -1.03      0.301  9.89 e- 24  1.93e-  1
 2 quantity          1.000e+0   5.30e-4  -0.598     0.550  9.99 e-  1  1.00e+  0
 3 unit_price        1.00 e+0   8.28e-8   0.938     0.348  1.000e+  0  1.00e+  0
 4 order_value       1.00 e+0   1.73e-8   1.62      0.106  1.000e+  0  1.00e+  0
 5 planned_delivery… 2.14 e+0   7.72e-1   0.987     0.324  1.04 e+  0  1.29e+  1
 6 sourceLocal Supp… 1.83 e+5   1.33e+1   0.913     0.361  8.57 e-  1  4.03e+ 18
 7 item_categoryFur… 9.87 e-9   3.22e+3  -0.00572   0.995 NA           3.01e+ 76
 8 item_categoryHVAC 3.24 e-9   4.80e+3  -0.00407   0.997 NA           2.13e+123
 9 item_categoryICT  1.59 e-8   2.74e+3  -0.00655   0.995 NA           3.11e+ 69
10 item_categoryInd… 3.71 e-9   5.38e+3  -0.00361   0.997 NA           1.82e+147
11 item_categoryMed… 3.67 e-1   1.39e+0  -0.720     0.471  1.58 e-  2  5.39e+  0
12 item_categoryPac… 7.29 e+8   6.19e+3   0.00330   0.997  6.83 e-142 NA        
13 item_categorySki… 1.02 e-8   4.66e+3  -0.00395   0.997 NA           1.12e+114
14 item_categoryTex… 5.04 e+0   1.81e+0   0.894     0.371  1.46 e-  1  2.93e+  2
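The extreme odds ratios and missing confidence limits in the table above are typical of a model with more parameters than a sample of 100 orders can support. As a point of comparison, the sketch below fits a much smaller model with procurement source as the only predictor, rebuilding the data from the source-by-status counts reported in Hypothesis Test 1. It is an unadjusted illustration rather than a replacement for the full model, and the names `orders` and `simple_model` are chosen for this example.

```r
# Rebuild one row per purchase order from the reported counts:
# International: 18 late, 21 on time; Local: 10 late, 51 on time.
orders <- data.frame(
  source = factor(rep(c("International Supplier", "Local Supplier"),
                      times = c(39, 61))),
  late_delivery = c(rep(c(1, 0), times = c(18, 21)),
                    rep(c(1, 0), times = c(10, 51)))
)

# Single-predictor logistic regression; International Supplier is the
# reference level because factor levels are sorted alphabetically.
simple_model <- glm(late_delivery ~ source, data = orders, family = binomial)

# TRUE when the fitting algorithm converged.
simple_model$converged

# Odds ratio for Local vs International suppliers.
odds_ratio <- exp(coef(simple_model))
odds_ratio

# Wald 95% confidence intervals on the odds-ratio scale.
exp(confint.default(simple_model))
```

In this reduced model the odds ratio for local suppliers equals the cross-product ratio of the 2 x 2 table, (10/51) / (18/21), which is below 1 and agrees in direction with the chi-squared result.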

Code

data_clean$predicted_probability <- predict(logistic_model, type = "response")

ggplot(data_clean, aes(x = predicted_probability, fill = delivery_status)) +
  geom_histogram(position = "identity", alpha = 0.6, bins = 20) +
  labs(
    title = "Predicted Probability of Late Delivery",
    x = "Predicted Probability of Late Delivery",
    y = "Number of Purchase Orders",
    fill = "Delivery Status"
  )

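Business interpretation: The logistic regression estimates which procurement factors increase or reduce the likelihood of late delivery. An odds ratio above 1 suggests a higher likelihood of late delivery, while an odds ratio below 1 suggests a lower likelihood. In this model, however, no predictor is statistically significant at the 0.05 level. The warnings ("fitted probabilities numerically 0 or 1 occurred" and "algorithm did not converge"), together with the very large standard errors for several item categories, suggest that some categories almost perfectly predict delivery status in a sample of only 100 orders. The odds ratios for those categories, and for procurement source, should therefore not be interpreted at face value.

7. Integrated Findings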

The five analyses provide a combined view of procurement delivery performance at Petrohawk Centrum Limited. The exploratory analysis shows the structure of the procurement data and highlights the main variables affecting delivery performance. The visualisations show delivery patterns across source, item category, order value, and planned delivery days.

The hypothesis tests provide statistical evidence that delivery status differs by procurement source (p = 0.003) and by planned delivery timeline (p < 0.001). The correlation analysis shows how procurement cost and order-size variables relate to each other. The logistic regression brings the analysis together by estimating how selected procurement variables influence the likelihood of late delivery, although its estimates are unstable at this sample size.

Overall, the findings support a data-driven supplier and category monitoring system. Petrohawk should pay closer attention to item categories, sources, and order characteristics that show higher risk of delay.

8. Recommendation

Petrohawk Centrum Limited should implement a procurement delivery performance dashboard that tracks delivery status by vendor, source, item category, order value, and planned delivery days.

Suppliers and sources with repeated late deliveries should be flagged for management review. The procurement team should also introduce lead-time risk classification. High-value or delay-prone orders should receive earlier follow-up, stronger supplier confirmation, and alternative sourcing options where possible.

Management should use the findings from the analysis to support supplier evaluation, procurement planning, and delivery follow-up decisions. This will improve delivery reliability, reduce operational disruption, and strengthen client satisfaction.

9. Limitations and Further Work

The analysis is limited by the available variables in the procurement dataset. The data does not include actual delivery date, supplier capacity, payment timing, logistics disruptions, or client urgency level. These additional variables could improve the strength of future analysis.

Future work should include more procurement periods, actual delivery duration, supplier rating, logistics method, and reason for delay. This would allow Petrohawk Centrum Limited to build stronger predictive models and improve procurement risk management.

10. Appendix: AI Usage Statement

AI tools were used to support coding structure, explanation of R commands, and organisation of the Quarto report. The dataset, business context, analytical judgement, interpretation of results, and final recommendations were reviewed and adapted independently by the author based on procurement experience at Petrohawk Centrum Limited.
