The data used for this R markdown document is blood product wastage data from a hospital. The data lists every blood product that has gone to waste over a stretch of 4 years. Decision staff would like to know if their efforts to reduce wastage throughout 2023 have made a difference.
The first thing we need to do is load a few packages that we will need and then load up the datafile which is in xlsx format. An image of the excel spreadsheet is included for reference.
library(readxl)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
Wastage_Working <- read_excel("Wastage_Working.xlsx")
waste = Wastage_Working
knitr::include_graphics("Wastage_Working_Image.png")
Decision staff has asked for the data to be presented monthly. So we will create a new dataframe from the existing data that will contain a monthly sum of the blood product wastage. Additionally, we will add a year column to this dataframe that extracts the year value from each data entry as a separate column. This will help us with the time series analysis later on.
monthly_total = aggregate(waste['AdjustedCost'], list(month=substr(waste$DiscardDate, 1, 7)), FUN = sum, na.rm = TRUE)
monthly_total$month= ym(monthly_total$month)
monthly_total$year = as.POSIXct(monthly_total$month, format = "%Y/%m/%d")
monthly_total$year = format(monthly_total$year, format = "%Y")
When preparing data for time series analysis we need to create three new variables that are part of the time series equation. 1. A variable indicating the time 2. A variable indicating if the observation occured before or after the event 3. A variable indicating the time passed since the event (In our case, since the start of 2023)
monthly_total$time = rep(1:nrow(monthly_total))
monthly_total$treatment = ifelse(monthly_total$year < 2023, 0,1)
monthly_total$timesince = c(rep(0,38),rep(1:10))
Now we can feed the data into a linear regression formula. The results can be visualized in a table using the stargazer package.
reg_waste = lm(AdjustedCost ~ time + treatment + timesince, data = monthly_total)
stargazer( reg_waste,
type = "text",
dep.var.labels = ("Adjusted Monthly Wastage Costs"),
column.labels = "",
covariate.labels = c("Time", "Treatment", "Time Since Treatment"),
omit.stat = "all",
digits = 3,
out = "waste_lm.html")
##
## ===================================================
## Dependent variable:
## ------------------------------
## Adjusted Monthly Wastage Costs
##
## ---------------------------------------------------
## Time 44.095
## (363.425)
##
## Treatment -12,093.860
## (18,513.040)
##
## Time Since Treatment 753.891
## (2,729.025)
##
## Constant 15,014.920*
## (8,130.488)
##
## ===================================================
## ===================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
It appears as though the treatment effect (measures implemented in 2023 to reduce wastage) were immediately effective, observed by the -12,000 in the treatment line (albeit this was not statistically significant).
Is is easier to visualize the results using a plot. The next code chunk will make a prediction of the monthly waste based on our regression from the previous code chunk and the monthly total data. Then we create a plot displaying the data, with the red line displaying the start of 2023 (our “treatment” variable).
pred_waste <- predict(reg_waste, monthly_total)
plot(monthly_total$AdjustedCost,
col = "black",
xlim = c( 1, nrow (monthly_total) ),
xlab = "Time (months)",
ylab = "Adjusted Monthly Cost",
main = "Monthly Wastage Costs 2019-2023")
lines(pred_waste, col="blue", lwd = 3 )
# Line marking the interruption
abline( v=38, col="red", lty=2 )