Loading and Preparing the Data

The data used for this R markdown document is blood product wastage data from a hospital. The data lists every blood product that has gone to waste over a stretch of 4 years. Decision staff would like to know if their efforts to reduce wastage throughout 2023 have made a difference.

The first thing we need to do is load a few packages that we will need and then load up the datafile which is in xlsx format. An image of the excel spreadsheet is included for reference.

library(readxl)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
Wastage_Working <- read_excel("Wastage_Working.xlsx")
waste = Wastage_Working
knitr::include_graphics("Wastage_Working_Image.png")
An image of the excel spreadsheet used for this analysis.
An image of the excel spreadsheet used for this analysis.

Creating new variables

Decision staff has asked for the data to be presented monthly. So we will create a new dataframe from the existing data that will contain a monthly sum of the blood product wastage. Additionally, we will add a year column to this dataframe that extracts the year value from each data entry as a separate column. This will help us with the time series analysis later on.

monthly_total = aggregate(waste['AdjustedCost'], list(month=substr(waste$DiscardDate, 1, 7)), FUN = sum, na.rm = TRUE)
monthly_total$month= ym(monthly_total$month)

monthly_total$year = as.POSIXct(monthly_total$month, format = "%Y/%m/%d")
monthly_total$year = format(monthly_total$year, format = "%Y")

Time Series Analysis

When preparing data for time series analysis we need to create three new variables that are part of the time series equation. 1. A variable indicating the time 2. A variable indicating if the observation occured before or after the event 3. A variable indicating the time passed since the event (In our case, since the start of 2023)

monthly_total$time = rep(1:nrow(monthly_total)) 
monthly_total$treatment = ifelse(monthly_total$year < 2023, 0,1) 
monthly_total$timesince = c(rep(0,38),rep(1:10))

Calculating the coefficents

Now we can feed the data into a linear regression formula. The results can be visualized in a table using the stargazer package.

reg_waste = lm(AdjustedCost ~ time + treatment + timesince, data = monthly_total)

stargazer( reg_waste, 
           type = "text", 
           dep.var.labels = ("Adjusted Monthly Wastage Costs"),
           column.labels = "",
           covariate.labels = c("Time", "Treatment", "Time Since Treatment"),
           omit.stat = "all", 
           digits = 3,
           out = "waste_lm.html")
## 
## ===================================================
##                           Dependent variable:      
##                      ------------------------------
##                      Adjusted Monthly Wastage Costs
##                                                    
## ---------------------------------------------------
## Time                             44.095            
##                                (363.425)           
##                                                    
## Treatment                     -12,093.860          
##                               (18,513.040)         
##                                                    
## Time Since Treatment            753.891            
##                               (2,729.025)          
##                                                    
## Constant                      15,014.920*          
##                               (8,130.488)          
##                                                    
## ===================================================
## ===================================================
## Note:                   *p<0.1; **p<0.05; ***p<0.01

It appears as though the treatment effect (measures implemented in 2023 to reduce wastage) were immediately effective, observed by the -12,000 in the treatment line (albeit this was not statistically significant).

Plotting the results

Is is easier to visualize the results using a plot. The next code chunk will make a prediction of the monthly waste based on our regression from the previous code chunk and the monthly total data. Then we create a plot displaying the data, with the red line displaying the start of 2023 (our “treatment” variable).

pred_waste <- predict(reg_waste, monthly_total)

plot(monthly_total$AdjustedCost,
      col = "black",
      xlim = c( 1, nrow (monthly_total) ),
      xlab = "Time (months)", 
      ylab = "Adjusted Monthly Cost",
     main = "Monthly Wastage Costs 2019-2023")

lines(pred_waste, col="blue", lwd = 3 )

# Line marking the interruption
abline( v=38, col="red", lty=2 )