Homework 5

The dataset RadDat_IMSE.csv contains data from a major US hospital related to the time required to fulfill orders for X-Rays.  The datafile may be found here: https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/RadDat_IMSE.csv

Some columns of specific interest include:

PatientAge = Age of Patient

Radiology.Technician = A unique identifier assigned to each Radiology Technician

Priority = Priority of the order, STAT or Routine

Loc.At.Exam.Complete = The hospital floor where the exam was completed (e.g. 3W, 4W, 5E, etc.)

Ordered.to.Complete...Mins = The time required to complete the order in minutes.


There are some data quality issues, and for some patients the reported time to complete an order is negative.  These observations should be filtered out of the dataset before performing your analysis (see the video on filtering rows/selecting columns in the first module).

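# Read the radiology order data directly from the course repository
dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/RadDat_IMSE.csv")
# Keep only strictly positive completion times (drops negative values and any zeros)
dat1 <- dat[dat$Ordered.to.Complete...Mins > 0, ]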
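
The filter above relies on logical indexing. As a minimal sketch of how that comparison treats negative, zero, and missing times, the example below uses a small made-up data frame; only the column name matches the real file, and the values are hypothetical. Wrapping the condition in which() is one possible way to avoid carrying missing comparisons into the result, not necessarily what the assignment expects.

```r
# Hypothetical toy data; only the column name matches RadDat_IMSE.csv.
toy <- data.frame(
  Ordered.to.Complete...Mins = c(35, -12, 0, NA, 140)
)

# `> 0` is FALSE for negatives and zero, and NA for missing values.
# which() returns only the positions where the condition is TRUE,
# so rows with a missing time are dropped instead of becoming NA rows.
keep <- which(toy$Ordered.to.Complete...Mins > 0)
toy_clean <- toy[keep, , drop = FALSE]

toy_clean$Ordered.to.Complete...Mins
```

If the real column contains no missing values, this gives the same rows as the plain logical index used above.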

Part 1

Patients that are age 65 or older qualify for Medicare.  Generate a histogram on the time required to fulfill X-ray orders for Medicare patients, restricting the allowable range of times to be between the first and third quartile of observations, i.e. the IQR.  What do you notice about the shape of this histogram?

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Medicare <- dplyr::filter(dat1, PatientAge >= 65)
Q1 <- quantile(Medicare$Ordered.to.Complete...Mins, .25)
Q3 <- quantile(Medicare$Ordered.to.Complete...Mins, .75)
IQRRange <- dplyr::filter(Medicare, Ordered.to.Complete...Mins >= Q1 & Ordered.to.Complete...Mins <= Q3)
hist(IQRRange$Ordered.to.Complete...Mins, main = "Histogram of Order Completion Times (Medicare, IQR)", xlab = "Completion Time (minutes)")

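The histogram slopes downward from left to right: within the interquartile range, shorter completion times occur more often than longer ones, which suggests a right-skewed distribution. Because the plot is restricted to the middle 50% of completion times for Medicare patients, it leaves out both the fastest and the slowest quarter of orders, not only the outliers.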
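
As a rough sketch of what the quartile restriction does, the example below applies the same steps to a short made-up vector of times (hypothetical values, not drawn from the dataset) and inspects the bin counts without drawing the plot.

```r
# Hypothetical right-skewed completion times in minutes.
times <- c(10, 12, 15, 18, 20, 22, 25, 30, 40, 55, 90, 300)

# First and third quartiles, using quantile()'s default method.
q <- quantile(times, c(0.25, 0.75))

# Keep only the observations between Q1 and Q3 (the middle 50%).
middle <- times[times >= q[1] & times <= q[2]]

# plot = FALSE returns the histogram object instead of drawing it,
# so the counts per bin can be read directly.
h <- hist(middle, plot = FALSE)
h$counts
```

Counts that fall from the first bin to the last correspond to the downward slope described above.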

Part 2

We would like to compare the performance of Radiology Technician 62 to 65 based on their median time to complete an order.  How would you interpret your findings?

RT62 <- dat1[dat1$Radiology.Technician == 62, ]
RT65 <- dat1[dat1$Radiology.Technician == 65, ]
median62 <- median(RT62$Ordered.to.Complete...Mins)
median65 <- median(RT65$Ordered.to.Complete...Mins)
median62
## [1] 80
median65
## [1] 27

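Radiology Technician 62 has a median completion time of 80 minutes, whereas Radiology Technician 65 has a median of 27 minutes. In other words, half of Technician 65's orders are completed within 27 minutes, so by this measure Technician 65 fulfills orders roughly three times faster. The median alone does not account for differences in the number, type, or priority of the orders each technician handles.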
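
A median is easier to interpret alongside the number of orders behind it. The sketch below shows one way to compute both per technician with tapply(); the data frame is made up for illustration (the IDs match the text, the times do not come from the dataset).

```r
# Hypothetical toy data; the completion times are made up.
toy <- data.frame(
  Radiology.Technician = c(62, 62, 62, 65, 65, 65, 65),
  Ordered.to.Complete...Mins = c(30, 50, 200, 10, 20, 30, 90)
)

# Median completion time and number of orders for each technician.
med <- tapply(toy$Ordered.to.Complete...Mins, toy$Radiology.Technician, median)
n   <- tapply(toy$Ordered.to.Complete...Mins, toy$Radiology.Technician, length)

# Combine both into a small summary table.
data.frame(
  technician  = names(med),
  median_mins = as.vector(med),
  orders      = as.vector(n)
)
```

Applied to dat1, the same two calls would show whether one technician's median rests on far fewer orders than the other's.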

Part 3

Generate a side by side boxplot comparing the age of those patients receiving STAT versus Routine orders for an X-Ray.  What do these boxplots tell you?

Pronto <- dat1[dat1$Priority == "STAT", ]
Routine <- dat1[dat1$Priority == "Routine", ]
boxplot(Pronto$PatientAge, Routine$PatientAge, main = "Patient Age by X-Ray Order Priority", names = c("STAT", "Routine"), ylab = "Patient Age (years)")

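Patients receiving STAT orders span a wider range of ages than patients receiving Routine orders. Routine orders appear relatively more common than STAT orders among both the oldest and the youngest patients. A boxplot shows how ages are spread within each priority level rather than how many orders there are, so this reading should be confirmed with counts.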
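
Since a boxplot summarizes spread rather than frequency, it can help to look at the counts and five-number summaries behind it. The sketch below does this on a made-up data frame (hypothetical ages, with column names borrowed from the dataset) and also shows the formula interface to boxplot() as an alternative to subsetting by hand.

```r
# Hypothetical toy data; ages and priorities are made up.
toy <- data.frame(
  Priority   = c("STAT", "STAT", "STAT", "Routine", "Routine", "Routine", "Routine"),
  PatientAge = c(5, 40, 90, 30, 55, 60, 70)
)

# Number of orders per priority level (not visible in a boxplot).
counts <- table(toy$Priority)
counts

# Five-number summary (min, lower hinge, median, upper hinge, max) per level.
age_stats <- lapply(split(toy$PatientAge, toy$Priority), fivenum)
age_stats

# The formula interface draws one box per priority level.
boxplot(PatientAge ~ Priority, data = toy, ylab = "Patient Age (years)")
```

With the formula interface the boxes are ordered alphabetically by level, so Routine is drawn before STAT.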

Part 4

Calculate the mean and standard deviation of the time required to complete an X-Ray order on the floor 3W compared to 4W.  What do you conclude about differences between X-Ray completion times on these floors?

F3W <- dat1[dat1$Loc.At.Exam.Complete == "3W", ]
F4W <- dat1[dat1$Loc.At.Exam.Complete == "4W", ]

mean(F3W$Ordered.to.Complete...Mins) 
## [1] 1463.051
mean(F4W$Ordered.to.Complete...Mins)
## [1] 1675.451
sd(F3W$Ordered.to.Complete...Mins)
## [1] 3894.639
sd(F4W$Ordered.to.Complete...Mins)
## [1] 4387.644

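On average, orders are completed faster on floor 3W (mean of about 1,463 minutes) than on floor 4W (about 1,675 minutes). 3W also has the smaller standard deviation (about 3,895 versus 4,388 minutes), so its completion times are somewhat less variable. On both floors the standard deviation is more than twice the mean, which points to heavily right-skewed times with a few very long orders, so the difference in means should be interpreted with caution.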
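
When a few very long orders pull the mean upward, comparing the median alongside the mean and standard deviation gives a fuller picture. The sketch below computes all three per floor in one aggregate() call on a made-up data frame; the floor labels match the text, but the times are hypothetical.

```r
# Hypothetical toy data; the completion times are made up.
toy <- data.frame(
  Loc.At.Exam.Complete = c("3W", "3W", "3W", "4W", "4W", "4W"),
  Ordered.to.Complete...Mins = c(20, 30, 1000, 25, 35, 45)
)

# Mean, standard deviation, and median of completion time per floor.
stats <- aggregate(
  Ordered.to.Complete...Mins ~ Loc.At.Exam.Complete,
  data = toy,
  FUN = function(x) c(mean = mean(x), sd = sd(x), median = median(x))
)
stats
```

In this toy example a single 1000-minute order gives 3W the larger mean even though its median is lower, which is the kind of effect a large standard deviation can signal.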