This study uses inventory data analysis to investigate possible hazards and shortages in hospital supply chains. Using a publicly available Kaggle dataset, we examined how restocking delays and current stock levels relate to daily usage, minimum required stock, item category, and vendor. Our aim was to find trends that could help hospitals anticipate shortages and improve their inventory decisions. To identify which characteristics predict current stock and restock lead time, we used summary statistics, visualizations, and linear regression models. Consumption, item type, and vendor did not significantly predict restock lead time, suggesting that outside variables such as supplier problems or delivery delays may be at work. Current stock levels were also poorly explained: daily usage, minimum required stock, and item type together accounted for only about 1% of the variation, and item type showed at most a marginal association (p = 0.0875). Hospital supply chain managers can still use this information, together with direct checks of which items fall below their minimum requirement, to prioritize items that are likely to run low and to set proactive stock thresholds. Because the data are observational, they do not support causal inferences. Furthermore, the dataset may not include all relevant factors.
# load data
library(readr)
library(ggplot2)
library(dplyr)
library(skimr)
inventory_data <- read_csv("hospital_supply_chain/inventory_data.csv")
inventory_data$Date <- as.Date(inventory_data$Date)
Each case is an inventory record for a specific hospital supply item. The dataset includes 500 observations (inventory records) and 11 variables. It is publicly available and was uploaded to Posit Cloud.
Source: https://www.kaggle.com/datasets/vanpatangan/hospital-supply-chain
Dependent Variables:
Restock_Lead_Time: Number of days until restock — used to model supply delays.
Current_Stock: Number of items currently in inventory — used to model stock shortages.
Independent Variables:
Avg_Usage_Per_Day (Quantitative)
Min_Required (Quantitative)
Item_Type (Categorical: Equipment or Consumable)
Vendor_ID (Categorical)
This is an observational study: it examines existing hospital inventory records without any experimental manipulation.
What strategies should hospitals implement to mitigate product shortages caused by global supply chain delays? Can variables like item usage rate, vendor, and item type help predict restock lead time or low stock?
skim(inventory_data)
Name | inventory_data |
---|---|
Number of rows | 500 |
Number of columns | 11 |
Column type frequency: | |
character | 3 |
Date | 1 |
numeric | 7 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Item_Type | 0 | 1 | 9 | 10 | 0 | 2 | 0 |
Item_Name | 0 | 1 | 6 | 13 | 0 | 5 | 0 |
Vendor_ID | 0 | 1 | 4 | 4 | 0 | 3 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
Date | 0 | 1 | 2024-10-01 | 2026-02-12 | 2025-06-07 | 500 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Item_ID | 0 | 1 | 104.51 | 2.87 | 100.00 | 102.00 | 104.00 | 107.00 | 109.00 | ▆▇▆▆▇ |
Current_Stock | 0 | 1 | 2458.64 | 1390.08 | 69.00 | 1307.75 | 2411.50 | 3719.00 | 4976.00 | ▇▇▇▇▆ |
Min_Required | 0 | 1 | 485.98 | 292.05 | 10.00 | 215.75 | 496.50 | 734.25 | 995.00 | ▇▆▇▇▆ |
Max_Capacity | 0 | 1 | 3288.83 | 1602.65 | 500.00 | 1847.75 | 3311.00 | 4696.00 | 5992.00 | ▇▆▇▇▇ |
Unit_Cost | 0 | 1 | 10277.33 | 5728.68 | 4.23 | 5422.46 | 10129.96 | 15206.32 | 19984.16 | ▇▇▇▇▇ |
Avg_Usage_Per_Day | 0 | 1 | 261.80 | 143.98 | 2.00 | 150.50 | 257.00 | 392.00 | 499.00 | ▆▇▇▆▇ |
Restock_Lead_Time | 0 | 1 | 15.12 | 8.61 | 1.00 | 7.00 | 16.00 | 23.00 | 29.00 | ▇▆▅▆▇ |
table(inventory_data$Item_Type)
##
## Consumable Equipment
## 266 234
table(inventory_data$Vendor_ID)
##
## V001 V002 V003
## 188 156 156
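Before using these categories as regression predictors, it can help to see how balanced the groups are. The short sketch below converts the counts copied from the table() output above into percentages; with the full data, prop.table(table(inventory_data$Item_Type)) gives the same proportions. Because R turns character columns into factors with alphabetically ordered levels, Consumable and V001 act as the reference categories in the models that follow, so coefficients such as Item_TypeEquipment are differences relative to them.

```r
# Category counts copied from the table() output above
item_type_counts <- c(Consumable = 266, Equipment = 234)
vendor_counts <- c(V001 = 188, V002 = 156, V003 = 156)

# prop.table() divides each count by the total (500 records here)
item_type_pct <- round(100 * prop.table(item_type_counts), 1)
vendor_pct <- round(100 * prop.table(vendor_counts), 1)

print(item_type_pct)  # Consumable 53.2, Equipment 46.8
print(vendor_pct)     # V001 37.6, V002 31.2, V003 31.2
```

Neither item type nor any single vendor dominates the sample, so each dummy coefficient is estimated from a reasonably large group.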
ggplot(inventory_data, aes(x = Item_Type, y = Current_Stock, fill = Item_Type)) +
geom_boxplot() +
labs(title = "Stock Levels of Equipment vs Consumables",
x = "Item Type", y = "Current Stock") +
theme_minimal()
## Restock Lead Time Distribution
ggplot(inventory_data, aes(x = Restock_Lead_Time)) +
geom_histogram(fill = "lightblue", bins = 20) +
labs(title = "Distribution of Restock Lead Times",
x = "Lead Time (Days)", y = "Frequency") +
theme_minimal()
# Identify records whose current stock is below the minimum required level
shortages <- inventory_data %>%
filter(Current_Stock < Min_Required)
ggplot(shortages, aes(x = Item_Name, y = Current_Stock, fill = Item_Type)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Items Below Minimum Stock Requirement",
x = "Medical Supply", y = "Current Stock") +
theme_minimal()
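The filter above only catches items that have already fallen below their minimum. A forward-looking complement, common in inventory practice but not part of the original analysis, compares how many days the current stock will last at the average usage rate with the restock lead time. The sketch below assumes a simple rule with no safety stock and runs on three hypothetical records (Item A to Item C, with invented values) that reuse the dataset's column names; flag_stock_risk is an illustrative helper, not a function from any package.

```r
# Three HYPOTHETICAL records that reuse the dataset's column names;
# the values are invented for illustration and are not from inventory_data.
example_inventory <- data.frame(
  Item_Name         = c("Item A", "Item B", "Item C"),
  Current_Stock     = c(1200, 300, 4000),
  Min_Required      = c(500, 400, 800),
  Avg_Usage_Per_Day = c(100, 50, 200),
  Restock_Lead_Time = c(10, 14, 25)
)

# Hypothetical helper: adds shortage-risk columns to a data frame that has
# the columns above. Assumes Avg_Usage_Per_Day > 0 and no safety stock.
flag_stock_risk <- function(df) {
  # Days the current stock lasts at the average daily usage rate
  df$Days_Of_Cover <- df$Current_Stock / df$Avg_Usage_Per_Day
  # Already below the minimum (the same rule as the filter above)
  df$Below_Min <- df$Current_Stock < df$Min_Required
  # Stock runs out before a restock ordered today would arrive; equivalent to
  # Current_Stock < Avg_Usage_Per_Day * Restock_Lead_Time (a reorder point
  # without safety stock)
  df$Runs_Out_Before_Restock <- df$Days_Of_Cover < df$Restock_Lead_Time
  df
}

flagged <- flag_stock_risk(example_inventory)
print(flagged[, c("Item_Name", "Days_Of_Cover", "Below_Min",
                  "Runs_Out_Before_Restock")])
```

In this toy example, Item C is well above its minimum yet would run out before a restock arrives, which the minimum-stock filter alone would miss. Applied to the real data, flag_stock_risk(inventory_data) would add the same columns to every record; the division is safe there because Avg_Usage_Per_Day is at least 2 in the skim output.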
## Statistical Output
model1 <- lm(Restock_Lead_Time ~ Avg_Usage_Per_Day + Item_Type + Vendor_ID, data = inventory_data)
summary(model1)
##
## Call:
## lm(formula = Restock_Lead_Time ~ Avg_Usage_Per_Day + Item_Type +
## Vendor_ID, data = inventory_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.7975 -7.8486 0.4507 8.1475 14.3864
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.6620386 0.9768519 16.033 <2e-16 ***
## Avg_Usage_Per_Day -0.0007501 0.0027028 -0.278 0.781
## Item_TypeEquipment 0.1541679 0.7746150 0.199 0.842
## Vendor_IDV002 -0.6770310 0.9411207 -0.719 0.472
## Vendor_IDV003 -0.6749326 0.9374055 -0.720 0.472
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.638 on 495 degrees of freedom
## Multiple R-squared: 0.001786, Adjusted R-squared: -0.00628
## F-statistic: 0.2214 on 4 and 495 DF, p-value: 0.9265
Interpretation
Model 1, which predicts restock lead time, has an R-squared of about 0.0018: usage, item type, and vendor together explain only 0.18% of the variation. None of the predictors is statistically significant (all p > 0.4), and the overall F-test is not significant either (p = 0.9265). This suggests that restock delays are driven by factors not captured in the dataset.
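The fit statistics printed by summary() are linked by standard formulas, which gives a quick way to check the interpretation by hand. The sketch below recomputes adjusted R-squared, the overall F-statistic, and its p-value from R-squared alone, using n = 500 records and k = 4 slope coefficients for Model 1 (the two vendor dummies count separately). regression_fit_stats is a hypothetical helper written for this illustration, and its inputs are the rounded values from summary(model1), so the results agree with the printed output only up to rounding.

```r
# Hypothetical helper: derive adjusted R-squared, the overall F-statistic,
# and its p-value from R-squared (r2), the number of observations (n), and
# the number of slope coefficients (k, counting each dummy variable).
regression_fit_stats <- function(r2, n, k) {
  df_resid <- n - k - 1                         # residual degrees of freedom
  adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid   # penalizes extra predictors
  f_stat <- (r2 / k) / ((1 - r2) / df_resid)    # explained vs residual mean square
  p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)
  c(adj_r2 = adj_r2, f_stat = f_stat, p_value = p_value)
}

# Model 1: Avg_Usage_Per_Day, Item_TypeEquipment, Vendor_IDV002, Vendor_IDV003
model1_check <- regression_fit_stats(r2 = 0.001786, n = 500, k = 4)
print(round(model1_check, 4))
```

Calling regression_fit_stats(r2 = 0.0112, n = 500, k = 3) gives Model 2's F-statistic of roughly 1.87 and a p-value of roughly 0.13, again matching its printed summary up to rounding.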
model2 <- lm(Current_Stock ~ Avg_Usage_Per_Day + Min_Required + Item_Type, data = inventory_data)
summary(model2)
##
## Call:
## lm(formula = Current_Stock ~ Avg_Usage_Per_Day + Min_Required +
## Item_Type, data = inventory_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2499.75 -1112.96 -24.12 1221.69 2702.41
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2357.9596 172.8142 13.644 <2e-16 ***
## Avg_Usage_Per_Day 0.1697 0.4316 0.393 0.6943
## Min_Required 0.3209 0.2129 1.507 0.1324
## Item_TypeEquipment -213.0577 124.4306 -1.712 0.0875 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1386 on 496 degrees of freedom
## Multiple R-squared: 0.0112, Adjusted R-squared: 0.005221
## F-statistic: 1.873 on 3 and 496 DF, p-value: 0.1332
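Interpretation
Model 2 has an R-squared of about 0.011, so item consumption rate, minimum required stock, and item type together account for only about 1.1% of the variation in current stock (adjusted R-squared 0.0052; overall F-test p = 0.1332). No predictor is significant at the 0.05 level: Min_Required has p = 0.1324, and Item_TypeEquipment (p = 0.0875) is only marginal, with equipment records holding about 213 fewer units than consumables on average when the other predictors are held constant. These variables alone are therefore of limited use for forecasting stock levels.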
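The significance statements for Model 2 can also be checked directly from its coefficient table. The sketch below rebuilds each two-sided t-test and 95% confidence interval (estimate plus or minus the t critical value times the standard error) from the estimates and standard errors printed by summary(model2), with 496 residual degrees of freedom. Because the inputs are copied, rounded values, the results match the printed output only up to rounding; on the fitted model itself, confint(model2) returns the exact intervals.

```r
# Coefficient estimates and standard errors copied from summary(model2)
coefs <- data.frame(
  term      = c("Avg_Usage_Per_Day", "Min_Required", "Item_TypeEquipment"),
  estimate  = c(0.1697, 0.3209, -213.0577),
  std_error = c(0.4316, 0.2129, 124.4306)
)
df_resid <- 496  # residual degrees of freedom reported for model2

# 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
coefs$lower <- coefs$estimate - t_crit * coefs$std_error
coefs$upper <- coefs$estimate + t_crit * coefs$std_error

# Two-sided p-value from the t statistic
coefs$t_value <- coefs$estimate / coefs$std_error
coefs$p_value <- 2 * pt(abs(coefs$t_value), df = df_resid, lower.tail = FALSE)

# An interval that contains 0 corresponds to p > 0.05
coefs$contains_zero <- coefs$lower < 0 & coefs$upper > 0
print(coefs, digits = 4)
```

All three intervals contain zero; the Item_TypeEquipment interval runs from roughly -458 to +31 units, which is why that effect is described as marginal rather than significant.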
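# Standard lm diagnostic plots (residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage) for each model
par(mfrow = c(2, 2))
plot(model1)
par(mfrow = c(2, 2))
plot(model2)
par(mfrow = c(1, 1))  # restore the default single-panel layout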
In conclusion, this project gave us a deeper understanding of how hospitals manage their inventory and of how important it is to maintain adequate stock to avoid future disruptions.
Restock lead times are challenging to predict. Item type, usage rate, and vendor explain almost none of their variation, which suggests that outside factors, such as global shipping delays, may play a large role.
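Current stock levels were not well predicted either. Item type, minimum required stock, and usage rate explain only about 1% of their variation, so the most practical way to gauge shortage risk from this dataset is to check directly which items have fallen below their minimum required level.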
Importance
Critical supplies cannot be allowed to run out in hospitals, and delayed inventory can have serious repercussions for patient care. As someone who works in supply chain, I can attest to how disruptive low supplies are to daily operations. This analysis shows how data can be used to flag possible issues before they arise and to help identify risks. It is a small step toward smarter, more proactive healthcare supply chain management.
Limitations
The dataset does not tell the full story: it is missing key outside factors such as international shipping delays, sudden demand surges, backorders, and recalls.
Because vendor IDs are anonymized, we cannot track the performance of actual suppliers.
Additionally, we lacked vendor-specific product-level lead times, which would have improved our forecasts.