Abstract

This study uses inventory data analysis to investigate possible hazards and shortages in hospital supply chains. Using a publicly available Kaggle dataset, we examined how restock lead time and current stock levels relate to daily usage, item category, minimum required stock, and vendor. Our aim was to find trends that could help hospitals anticipate shortages and improve their inventory decisions. To find out which characteristics predict current stock and restock lead time, we used linear regression models, summary statistics, and visualizations. The results indicated that usage, item type, and vendor did not significantly predict restock lead time (R-squared of about 0.002), suggesting that outside variables such as supplier problems or delivery delays might be at work. Current stock was also poorly explained by usage, minimum required stock, and item type (R-squared of about 0.011); only item type showed a marginal association (p = 0.0875). Descriptive checks, such as flagging records where current stock falls below the minimum required level, can still help hospital supply chain managers prioritize items that are running low and set proactive stock thresholds. It is crucial to remember that the data is observational, which restricts the ability to draw inferences about causality. Furthermore, the dataset might not include all pertinent factors.

Data Preparation

# Load packages and data
library(readr)
library(ggplot2)
library(dplyr)
library(skimr)

inventory_data <- read_csv("hospital_supply_chain/inventory_data.csv")
inventory_data$Date <- as.Date(inventory_data$Date)

Each case is an inventory record for a specific hospital supply item. The dataset includes 500 observations (inventory records) and 11 variables. The dataset is publicly available and was uploaded to Posit Cloud for analysis.

Source: https://www.kaggle.com/datasets/vanpatangan/hospital-supply-chain

Dependent and Independent Variables

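Dependent Variables: Restock_Lead_Time (Model 1) and Current_Stock (Model 2)

Independent Variables: Avg_Usage_Per_Day, Item_Type, Vendor_ID, and Min_Required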

Type of Study

This is an observational study. It examines existing hospital inventory records without any experimental manipulation.

Research Questions

What strategies should hospitals implement to mitigate product shortages caused by global supply chain delays? Can variables like item usage rate, vendor, and item type help predict restock lead time or low stock?

Summary Statistics

skim(inventory_data)
Data summary
Name inventory_data
Number of rows 500
Number of columns 11
_______________________
Column type frequency:
character 3
Date 1
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Item_Type 0 1 9 10 0 2 0
Item_Name 0 1 6 13 0 5 0
Vendor_ID 0 1 4 4 0 3 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
Date 0 1 2024-10-01 2026-02-12 2025-06-07 500

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Item_ID 0 1 104.51 2.87 100.00 102.00 104.00 107.00 109.00 ▆▇▆▆▇
Current_Stock 0 1 2458.64 1390.08 69.00 1307.75 2411.50 3719.00 4976.00 ▇▇▇▇▆
Min_Required 0 1 485.98 292.05 10.00 215.75 496.50 734.25 995.00 ▇▆▇▇▆
Max_Capacity 0 1 3288.83 1602.65 500.00 1847.75 3311.00 4696.00 5992.00 ▇▆▇▇▇
Unit_Cost 0 1 10277.33 5728.68 4.23 5422.46 10129.96 15206.32 19984.16 ▇▇▇▇▇
Avg_Usage_Per_Day 0 1 261.80 143.98 2.00 150.50 257.00 392.00 499.00 ▆▇▇▆▇
Restock_Lead_Time 0 1 15.12 8.61 1.00 7.00 16.00 23.00 29.00 ▇▆▅▆▇
table(inventory_data$Item_Type)
## 
## Consumable  Equipment 
##        266        234
table(inventory_data$Vendor_ID)
## 
## V001 V002 V003 
##  188  156  156

Visualizations

Stock Levels by Item Type

ggplot(inventory_data, aes(x = Item_Type, y = Current_Stock, fill = Item_Type)) +
  geom_boxplot() +
  labs(title = "Stock Levels of Equipment vs Consumables",
       x = "Item Type", y = "Current Stock") +
  theme_minimal()

Restock Lead Time Distribution

ggplot(inventory_data, aes(x = Restock_Lead_Time)) +
  geom_histogram(fill = "lightblue", bins = 20) +
  labs(title = "Distribution of Restock Lead Times",
       x = "Lead Time (Days)", y = "Frequency") +
  theme_minimal()

Items at Risk of Shortage

# Identify items whose current stock is below the minimum required level
shortages <- inventory_data %>%
  filter(Current_Stock < Min_Required)

ggplot(shortages, aes(x = Item_Name, y = Current_Stock, fill = Item_Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Items Below Minimum Stock Requirement",
       x = "Medical Supply", y = "Current Stock") +
  theme_minimal()
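
As a complement to the chart, the same below-minimum rule can be summarised as counts and shares per item type. The following is a minimal sketch, not part of the original analysis: the helper name summarise_shortages and the toy data frame are hypothetical, and only the column names (Item_Type, Current_Stock, Min_Required) come from the dataset.

```r
library(dplyr)

# Hypothetical helper: for each Item_Type, count the records whose
# Current_Stock is below Min_Required and the share they represent.
summarise_shortages <- function(data) {
  data %>%
    mutate(below_min = Current_Stock < Min_Required) %>%
    group_by(Item_Type) %>%
    summarise(
      n_records   = n(),
      n_below_min = sum(below_min),
      share_below = mean(below_min),
      .groups = "drop"
    )
}

# Toy data for illustration only; these values are NOT from the Kaggle dataset.
toy <- data.frame(
  Item_Type     = c("Consumable", "Consumable", "Equipment", "Equipment"),
  Current_Stock = c(50, 900, 120, 300),
  Min_Required  = c(100, 400, 100, 500)
)

summarise_shortages(toy)
```

Calling summarise_shortages(inventory_data) would apply the same rule to the full dataset. Note that the bar chart stacks Current_Stock across all records that share an Item_Name, so a per-type count like this may be easier to read than the bar heights.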

Statistical Output

Model 1: Predicting Lead Time for Restock

model1 <- lm(Restock_Lead_Time ~ Avg_Usage_Per_Day + Item_Type + Vendor_ID, data = inventory_data)
summary(model1)
## 
## Call:
## lm(formula = Restock_Lead_Time ~ Avg_Usage_Per_Day + Item_Type + 
##     Vendor_ID, data = inventory_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.7975  -7.8486   0.4507   8.1475  14.3864 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        15.6620386  0.9768519  16.033   <2e-16 ***
## Avg_Usage_Per_Day  -0.0007501  0.0027028  -0.278    0.781    
## Item_TypeEquipment  0.1541679  0.7746150   0.199    0.842    
## Vendor_IDV002      -0.6770310  0.9411207  -0.719    0.472    
## Vendor_IDV003      -0.6749326  0.9374055  -0.720    0.472    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.638 on 495 degrees of freedom
## Multiple R-squared:  0.001786,   Adjusted R-squared:  -0.00628 
## F-statistic: 0.2214 on 4 and 495 DF,  p-value: 0.9265

Interpretation

The Model 1 regression for restock lead time has an R-squared of about 0.0018, meaning that only 0.18% of the variation in lead time is explained by usage, item type, and vendor. None of the predictors is statistically significant (all p-values exceed 0.47), and the overall F-test is also non-significant (p = 0.9265). This suggests that restock delays may be driven by factors that are not captured in the dataset.
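
The overall-fit figures quoted in an interpretation can be read straight from the fitted model rather than copied from the printed summary. The sketch below assumes a standard lm object; the helper name overall_fit is hypothetical, and the built-in mtcars data is used only so the example runs on its own.

```r
# Hypothetical helper: collect overall-fit statistics from a fitted lm object.
overall_fit <- function(model) {
  s <- summary(model)
  f <- s$fstatistic  # named vector with elements value, numdf, dendf
  data.frame(
    r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared,
    f_statistic   = unname(f["value"]),
    # Upper-tail probability of the F distribution gives the overall p-value
    f_p_value     = unname(pf(f["value"], f["numdf"], f["dendf"],
                              lower.tail = FALSE))
  )
}

# Illustration on a built-in dataset (not the hospital data)
demo_model <- lm(mpg ~ wt, data = mtcars)
overall_fit(demo_model)
```

Applied as overall_fit(model1), this should return the same R-squared, adjusted R-squared, F-statistic, and p-value that appear at the bottom of the summary output.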

Model 2: Predicting Current Stock

model2 <- lm(Current_Stock ~ Avg_Usage_Per_Day + Min_Required + Item_Type, data = inventory_data)

summary(model2)
## 
## Call:
## lm(formula = Current_Stock ~ Avg_Usage_Per_Day + Min_Required + 
##     Item_Type, data = inventory_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2499.75 -1112.96   -24.12  1221.69  2702.41 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2357.9596   172.8142  13.644   <2e-16 ***
## Avg_Usage_Per_Day     0.1697     0.4316   0.393   0.6943    
## Min_Required          0.3209     0.2129   1.507   0.1324    
## Item_TypeEquipment -213.0577   124.4306  -1.712   0.0875 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1386 on 496 degrees of freedom
## Multiple R-squared:  0.0112, Adjusted R-squared:  0.005221 
## F-statistic: 1.873 on 3 and 496 DF,  p-value: 0.1332

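Interpretation

With an R-squared of about 0.011 for this model, average daily usage, minimum required stock, and item type account for only about 1.1% of the variation in current stock. No predictor is significant at the 0.05 level: Min_Required has p = 0.1324, and Item_TypeEquipment is only marginally significant (p = 0.0875), with equipment records holding roughly 213 fewer units than consumables when the other predictors are held constant. The overall F-test is also non-significant (p = 0.1332), so this model offers little basis for predicting stock levels.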
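
Significance statements about individual predictors can be checked against the coefficient table itself. The following is a minimal sketch under the assumption of a standard lm fit; the helper name coef_table and the 0.05 default threshold are illustrative choices, and mtcars stands in for the hospital data so the block is self-contained.

```r
# Hypothetical helper: tidy the coefficient matrix of an lm fit and flag
# terms whose p-value falls below a chosen threshold (alpha).
coef_table <- function(model, alpha = 0.05) {
  ct <- as.data.frame(coef(summary(model)))
  # Columns are Estimate, Std. Error, t value, Pr(>|t|), in that order
  names(ct) <- c("estimate", "std_error", "t_value", "p_value")
  ct$term <- rownames(ct)
  rownames(ct) <- NULL
  ct$significant <- ct$p_value < alpha
  ct[, c("term", "estimate", "std_error", "t_value", "p_value", "significant")]
}

# Illustration on a built-in dataset (not the hospital data)
demo_model <- lm(mpg ~ wt, data = mtcars)
coef_table(demo_model)
```

Running coef_table(model2) lists each term next to its p-value, which makes it straightforward to confirm which predictors, if any, clear the chosen threshold before describing them as significant.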

Model Diagnostics

par(mfrow = c(2,2))

plot(model1)

par(mfrow = c(2,2))

plot(model2)

Conclusion

We can conclude that this project provided us with a deeper understanding of how hospitals manage their inventory and how important it is to maintain adequate stock to avoid future disruptions.

Restock lead times are challenging to predict. Item type, usage rate, and vendor do not provide adequate evidence, which suggests that outside factors, such as global shipping delays, may play a big role.

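Current stock levels were not well predicted either. Usage rate, minimum required stock, and item type explained only about 1% of the variation, so shortage risk is better assessed by directly comparing current stock with the minimum required level than by relying on this model.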

Importance

Critical supplies cannot be allowed to run out in hospitals. Delayed inventory can have serious repercussions for patient care. As someone who works in supply chain, I can attest to how disruptive low supplies are to daily operations. This analysis shows how data can be used to warn of possible issues before they arise and to help identify risks. It is a small step toward more intelligent and proactive healthcare supply chain management.

Limitations

The dataset does not tell the full story: it is missing key outside factors such as international shipping delays, sudden demand surges, backorders, and recalls.

Since vendor data is anonymized, we are unable to monitor performance among real providers.

Additionally, we lacked vendor-specific product-level lead times, which would have improved our forecasts.