Exploratory Data Analysis was used to understand the structure and quality of the procurement dataset. This is important because procurement managers need to understand order size, order value, delivery timelines, and delivery status before making supplier or sourcing decisions.
dim(data_clean)
[1] 100 11
summary(data_clean)
       x1            po_date           product_item_id
 Min.   :  1.00   Min.   :2021-01-29   Length:100
 1st Qu.: 25.75   1st Qu.:2024-09-19   Class :character
 Median : 50.50   Median :2024-12-20   Mode  :character
 Mean   : 50.50   Mean   :2025-03-08
 3rd Qu.: 75.25   3rd Qu.:2025-10-11
 Max.   :100.00   Max.   :2026-02-09

      item_category               vendor                      source
 Medical Equipments  :38   Idumota         :26   International Supplier:39
 ICT                 :15   Computer Village:11   Local Supplier        :61
 Appliances          :11   Turkey          : 8
 Furniture & Fixtures:11   Alaba           : 7
 Textile/Linen       : 8   Idumagbo 1      : 7
 HVAC                : 5   US              : 7
 (Other)             :12   (Other)         :34

    quantity        unit_price        order_value        planned_delivery_days
 Min.   :   1.0   Min.   :    2500   Min.   :    17980   Min.   : 2.00
 1st Qu.:   2.0   1st Qu.:   86045   1st Qu.:   415899   1st Qu.: 3.00
 Median :   7.0   Median :  277128   Median :  3861148   Median : 5.00
 Mean   : 251.1   Mean   : 2403641   Mean   : 13814207   Mean   :10.54
 3rd Qu.:  15.0   3rd Qu.:  875900   3rd Qu.:  9805851   3rd Qu.:21.00
 Max.   :4500.0   Max.   :70391117   Max.   :136436170   Max.   :21.00

 delivery_status
 Late   :28
 On-time:72
The visualisations below show the delivery performance story in the procurement data. The charts focus on delivery status, procurement source, item category, planned delivery days, and order value.
ggplot(data_clean, aes(x = delivery_status)) +
  geom_bar() +
  labs(
    title = "Distribution of Procurement Delivery Status",
    x = "Delivery Status",
    y = "Number of Purchase Orders"
  )
ggplot(data_clean, aes(x = source, fill = delivery_status)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Delivery Status by Procurement Source",
    x = "Source",
    y = "Number of Purchase Orders",
    fill = "Delivery Status"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data_clean, aes(x = item_category, fill = delivery_status)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Delivery Status by Item Category",
    x = "Item Category",
    y = "Number of Purchase Orders",
    fill = "Delivery Status"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data_clean, aes(x = delivery_status, y = planned_delivery_days)) +
  geom_boxplot() +
  labs(
    title = "Planned Delivery Days by Delivery Status",
    x = "Delivery Status",
    y = "Planned Delivery Days"
  )
ggplot(data_clean, aes(x = delivery_status, y = order_value)) +
  geom_boxplot() +
  labs(
    title = "Order Value by Delivery Status",
    x = "Delivery Status",
    y = "Order Value"
  )
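The summary output earlier shows that order_value is strongly right-skewed (a median of about 3.9 million against a maximum of about 136 million), so a boxplot on a linear axis can compress most orders near the bottom of the chart. One possible variant, sketched here on illustrative toy data rather than the procurement dataset, puts the y-axis on a log10 scale. The data frame `toy_orders` and its values are hypothetical.

```r
library(ggplot2)

# Illustrative toy data only; these values are NOT from the procurement dataset.
toy_orders <- data.frame(
  delivery_status = c("Late", "Late", "Late", "On-time", "On-time", "On-time"),
  order_value     = c(250000, 4000000, 90000000, 20000, 900000, 12000000)
)

# Same kind of boxplot, but with a log10 y-axis so that a few very large
# orders do not flatten the rest of the distribution.
p_log <- ggplot(toy_orders, aes(x = delivery_status, y = order_value)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(
    title = "Order Value by Delivery Status (log scale)",
    x = "Delivery Status",
    y = "Order Value (log10 scale)"
  )

p_log
```

To try this on the real data, `toy_orders` would be swapped for `data_clean`. Whether the log scale reads better for a business audience is a judgement call.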
4. Hypothesis Testing
Hypothesis Test 1: Procurement Source and Delivery Status
H0: Procurement source and delivery status are independent.
H1: Procurement source and delivery status are associated.
This test checks whether delivery performance differs by procurement source.
                         Late On-time
  International Supplier   18      21
  Local Supplier           10      51
chisq.test(source_delivery_table)
Pearson's Chi-squared test with Yates' continuity correction
data: source_delivery_table
X-squared = 9.0275, df = 1, p-value = 0.00266
cramers_v(source_delivery_table)
Cramer's V (adj.) | 95% CI
--------------------------------
0.31 | [0.12, 1.00]
- One-sided CIs: upper bound fixed at [1.00].
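Business interpretation: The p-value (0.0027) is below 0.05, so the null hypothesis of independence is rejected: procurement source is significantly associated with delivery status. In this sample, 18 of 39 international orders (46%) were late, compared with 10 of 61 local orders (16%). The adjusted Cramer's V of 0.31 points to a moderate association. This suggests that international sources may require closer follow-up, stronger supplier monitoring, or longer planning lead time.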
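The chunk that builds `source_delivery_table` is not shown in the report. With the raw data it would presumably be something like `table(data_clean$source, data_clean$delivery_status)`, but that is an assumption. As a check that needs only the counts printed in the contingency table, the sketch below rebuilds the table by hand, computes the late share per source, and reruns the test. The variable names other than `source_delivery_table` are illustrative.

```r
# Counts copied from the contingency table reported in this section.
source_delivery_table <- matrix(
  c(18, 21,
    10, 51),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    source = c("International Supplier", "Local Supplier"),
    delivery_status = c("Late", "On-time")
  )
)

# Share of late and on-time orders within each source (each row sums to 1).
late_rates <- prop.table(source_delivery_table, margin = 1)
round(late_rates, 3)

# Chi-squared test; Yates' continuity correction is the default for 2x2 tables.
chi_result <- chisq.test(source_delivery_table)
chi_result

# Unadjusted Cramer's V, computed by hand from the uncorrected statistic.
# It differs slightly from the bias-adjusted value printed by cramers_v().
chi_uncorrected <- unname(chisq.test(source_delivery_table, correct = FALSE)$statistic)
n_orders <- sum(source_delivery_table)
cramers_v_unadjusted <- sqrt(
  chi_uncorrected / (n_orders * (min(dim(source_delivery_table)) - 1))
)
cramers_v_unadjusted
```

The row proportions are usually the easiest figure to report to management, because they state the late rate for each source directly.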
Hypothesis Test 2: Planned Delivery Days and Delivery Status
H0: The average planned delivery days are the same for on-time and late deliveries.
H1: The average planned delivery days are different for on-time and late deliveries.
t.test(planned_delivery_days ~ delivery_status, data = data_clean)
Welch Two Sample t-test
data: planned_delivery_days by delivery_status
t = 4.0345, df = 50.441, p-value = 0.0001857
alternative hypothesis: true difference in means between group Late and group On-time is not equal to 0
95 percent confidence interval:
3.584557 10.689252
sample estimates:
mean in group Late mean in group On-time
15.678571 8.541667
cohens_d(planned_delivery_days ~ delivery_status, data = data_clean)
Cohen's d | 95% CI
------------------------
0.89 | [0.43, 1.34]
- Estimated using pooled SD.
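Business interpretation: The p-value (0.0002) is below 0.05, so there is a statistically significant difference in planned delivery days between on-time and late orders. Late orders had a longer average planned delivery window (15.7 days) than on-time orders (8.5 days), a difference of about 7.1 days (95% CI: 3.6 to 10.7), and a Cohen's d of 0.89 indicates a large effect. In this sample, orders planned with longer lead times were more often late, which suggests that delivery planning is an important factor in procurement delivery performance.
5. Correlation Analysis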
Correlation analysis was used to examine the relationship among numeric procurement variables such as quantity, unit price, order value, and planned delivery days.
numeric_data <- data_clean %>%
  select(quantity, unit_price, order_value, planned_delivery_days)

correlation_matrix <- cor(numeric_data, use = "complete.obs")
correlation_matrix
Business interpretation: A positive correlation means that two variables move in the same direction. For example, if quantity and order value are strongly positively correlated, it means larger procurement quantities are linked with higher order values. This is expected in procurement because buying more units usually increases the total purchase order value.
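The summary statistics show that quantity, unit price, and order value are heavily right-skewed (for example, a mean quantity of 251.1 against a median of 7). Pearson correlations can be dominated by a few very large orders in that situation, so a rank-based Spearman correlation is a reasonable cross-check. The sketch below uses illustrative toy data, not the procurement dataset, and it assumes that order value equals quantity times unit price, which the report does not confirm.

```r
# Illustrative toy data only; these values are NOT from the procurement dataset.
toy_numeric <- data.frame(
  quantity              = c(1, 2, 7, 15, 40, 4500),
  unit_price            = c(900000, 300000, 80000, 50000, 9000, 2500),
  planned_delivery_days = c(21, 21, 5, 3, 3, 2)
)

# Assumption for this sketch: order value is quantity times unit price.
toy_numeric$order_value <- toy_numeric$quantity * toy_numeric$unit_price

# Pearson measures linear association and is sensitive to extreme values.
pearson_matrix <- cor(toy_numeric, use = "complete.obs", method = "pearson")

# Spearman works on ranks, so one very large order has less influence.
spearman_matrix <- cor(toy_numeric, use = "complete.obs", method = "spearman")

round(pearson_matrix, 2)
round(spearman_matrix, 2)
```

If the two matrices disagree noticeably on the real data, that is a hint that a handful of extreme orders are driving the Pearson values.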
6. Logistic Regression Analysis
Logistic regression was used because the main outcome variable, delivery status, has two outcomes: On-time and Late. The model estimates the likelihood of late delivery based on procurement factors.
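The chunk that fits `logistic_model` is not shown in the report, so the exact predictors are unknown. The sketch below illustrates how a model of this kind might be fit with `glm()`. The predictor set (source, planned delivery days, and log order value), the simulated data, and the names `sim_orders` and `logistic_sketch` are all assumptions made for illustration, not the author's specification.

```r
set.seed(123)  # reproducible simulated data

# Simulated stand-in for data_clean; NOT the procurement dataset.
n <- 200
sim_orders <- data.frame(
  source = factor(sample(c("International Supplier", "Local Supplier"),
                         n, replace = TRUE)),
  planned_delivery_days = sample(c(2, 3, 5, 14, 21), n, replace = TRUE),
  order_value = round(exp(rnorm(n, mean = 15, sd = 1.5)))
)

# Simulate lateness so that longer planned lead times and international
# sourcing raise the chance of a late delivery (an arbitrary choice for this demo).
p_late <- plogis(-2 + 0.08 * sim_orders$planned_delivery_days +
                   0.8 * (sim_orders$source == "International Supplier"))
sim_orders$delivery_status <- factor(
  ifelse(runif(n) < p_late, "Late", "On-time"),
  levels = c("On-time", "Late")  # "Late" is the second level, so glm() models P(Late)
)

# Hypothetical model specification; the predictors are assumptions.
logistic_sketch <- glm(
  delivery_status ~ source + planned_delivery_days + log(order_value),
  data = sim_orders,
  family = binomial
)

# Odds ratios with Wald 95% confidence intervals.
odds_ratios <- exp(cbind(OR = coef(logistic_sketch),
                         confint.default(logistic_sketch)))
round(odds_ratios, 2)
```

One detail worth checking on the real data is the factor level order. With a factor response, `glm()` treats the first level as failure, and R orders levels alphabetically by default, which would put "Late" first and make the fitted probabilities refer to on-time delivery. Setting the levels explicitly, as above, keeps the predicted probability aligned with late delivery.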
data_clean$predicted_probability <- predict(logistic_model, type = "response")

ggplot(data_clean, aes(x = predicted_probability, fill = delivery_status)) +
  geom_histogram(position = "identity", alpha = 0.6, bins = 20) +
  labs(
    title = "Predicted Probability of Late Delivery",
    x = "Predicted Probability of Late Delivery",
    y = "Number of Purchase Orders",
    fill = "Delivery Status"
  )
Business interpretation: The logistic regression estimates which procurement factors increase or reduce the likelihood of late delivery. An odds ratio above 1 means the factor is associated with higher odds of late delivery, while an odds ratio below 1 means it is associated with lower odds of late delivery.
7. Integrated Findings
The five analyses provide a combined view of procurement delivery performance at Petrohawk Centrum Limited. The exploratory analysis shows the structure of the procurement data and highlights the main variables affecting delivery performance. The visualisations show delivery patterns across source, item category, order value, and planned delivery days.
The hypothesis tests provide statistical evidence on whether delivery status differs by procurement source and planned delivery timeline. The correlation analysis shows how procurement cost and order-size variables relate to each other. The logistic regression brings the analysis together by estimating how selected procurement variables influence the likelihood of late delivery.
Overall, the findings support a data-driven supplier and category monitoring system. Petrohawk should pay closer attention to item categories, sources, and order characteristics that show higher risk of delay.
8. Recommendation
Petrohawk Centrum Limited should implement a procurement delivery performance dashboard that tracks delivery status by vendor, source, item category, order value, and planned delivery days.
Suppliers and sources with repeated late deliveries should be flagged for management review. The procurement team should also introduce lead-time risk classification. High-value or delay-prone orders should receive earlier follow-up, stronger supplier confirmation, and alternative sourcing options where possible.
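The flagging rule described above could be prototyped in a few lines before it is built into a dashboard. The sketch below uses illustrative toy data with hypothetical vendor names, and the thresholds (`min_late_orders`, `min_late_rate`) are placeholders that management would need to set.

```r
# Illustrative toy data only; vendor names and values are hypothetical.
toy_orders <- data.frame(
  vendor = c("Vendor A", "Vendor A", "Vendor A", "Vendor B", "Vendor B",
             "Vendor B", "Vendor C", "Vendor C", "Vendor C", "Vendor C"),
  delivery_status = c("Late", "Late", "On-time", "On-time", "On-time",
                      "Late", "On-time", "On-time", "On-time", "On-time")
)
toy_orders$is_late <- toy_orders$delivery_status == "Late"

# Count total orders and late orders per vendor.
orders_per_vendor <- tapply(toy_orders$is_late, toy_orders$vendor, length)
late_per_vendor   <- tapply(toy_orders$is_late, toy_orders$vendor, sum)

vendor_summary <- data.frame(
  vendor      = names(orders_per_vendor),
  orders      = as.vector(orders_per_vendor),
  late_orders = as.vector(late_per_vendor)
)
vendor_summary$late_rate <- vendor_summary$late_orders / vendor_summary$orders

# Hypothetical review rule: at least 2 late orders AND a late rate of 30% or more.
min_late_orders <- 2
min_late_rate   <- 0.30
vendor_summary$flag_for_review <-
  vendor_summary$late_orders >= min_late_orders &
  vendor_summary$late_rate >= min_late_rate

vendor_summary
```

Requiring both a minimum count and a minimum rate avoids flagging a vendor on the basis of a single late order, which matters when many vendors have only a handful of purchase orders.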
Management should use the findings from the analysis to support supplier evaluation, procurement planning, and delivery follow-up decisions. This will improve delivery reliability, reduce operational disruption, and strengthen client satisfaction.
9. Limitations and Further Work
The analysis is limited by the available variables in the procurement dataset. The data does not include actual delivery date, supplier capacity, payment timing, logistics disruptions, or client urgency level. These additional variables could improve the strength of future analysis.
Future work should include more procurement periods, actual delivery duration, supplier rating, logistics method, and reason for delay. This would allow Petrohawk Centrum Limited to build stronger predictive models and improve procurement risk management.
10. Appendix: AI Usage Statement
AI tools were used to support coding structure, explanation of R commands, and organisation of the Quarto report. The dataset, business context, analytical judgement, interpretation of results, and final recommendations were reviewed and adapted independently by the author based on procurement experience at Petrohawk Centrum Limited.