Homework 5

The dataset RadDat_IMSE.csv contains data from a major US hospital related to the time required to fulfill orders for X-Rays. The datafile may be found here: https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/RadDat_IMSE.csv

First, we read the .csv file into a data frame and drop every row whose completion time is not greater than 0, since negative times are inaccurate (note that the > 0 condition also drops zero-minute times). Then we load dplyr (install it first if you have not already) to further refine the dataset.

dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/RadDat_IMSE.csv")
dat1 <- dat[dat$Ordered.to.Complete...Mins > 0, ]
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
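A side note on the filtering step: base-R logical indexing keeps a row of NAs for every missing value in the condition, whereas wrapping the condition in which() (or using dplyr::filter) drops those rows. The sketch below illustrates this on a made-up data frame; the toy values are assumptions for illustration, and whether RadDat_IMSE.csv actually contains missing completion times is not checked here.

```r
# Toy data (hypothetical values), reusing the column name from the dataset
toy <- data.frame(Ordered.to.Complete...Mins = c(12, -3, NA, 45, 0))

# Plain logical indexing: the NA comparison produces an all-NA row
kept_logical <- toy[toy$Ordered.to.Complete...Mins > 0, , drop = FALSE]

# which() returns only the positions where the condition is TRUE
kept_which <- toy[which(toy$Ordered.to.Complete...Mins > 0), , drop = FALSE]

nrow(kept_logical)  # counts the NA row as well
nrow(kept_which)    # counts only the strictly positive times
```

If the real column has no missing values, both forms return the same rows, so this only matters as a defensive habit.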

Question 1: Patients that are age 65 or older qualify for Medicare. Generate a histogram on the time required to fulfill X-ray orders for Medicare patients, restricting the allowable range of times to be between the first and third quartile of observations, i.e. the IQR. What do you notice about the shape of this histogram?

First, we’ll use dplyr to build a subset containing only patients aged 65 or older. Next, we’ll compute the first and third quartiles of that subset’s completion times with the quantile function. Finally, we’ll filter again to keep only the times between those two quartiles and plot a histogram of the result.

Medicare <- dplyr::filter(dat1, PatientAge >= 65)
Q1 <- quantile(Medicare$Ordered.to.Complete...Mins, .25)
Q3 <- quantile(Medicare$Ordered.to.Complete...Mins, .75)
IQRRange <- dplyr::filter(Medicare, Ordered.to.Complete...Mins >= Q1 & Ordered.to.Complete...Mins <= Q3)
hist(IQRRange$Ordered.to.Complete...Mins, main = "X-Ray Completion Times", xlab = "Completion Times")

The histogram has a positive (right) skew: even within the IQR, shorter completion times occur more often than longer ones.
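One rough numeric check on a skew claim like this is to compare the mean and median of the plotted values: a mean above the median is consistent with a right skew. The sketch below applies that check to simulated, exponentially distributed times rather than the hospital data, so the variable names and numbers are assumptions for illustration only.

```r
set.seed(1)

# Simulated completion times in minutes (hypothetical), right-skewed by construction
times <- rexp(10000, rate = 1 / 60)

# Keep only the observations between the first and third quartile
q <- quantile(times, c(0.25, 0.75))
middle <- times[times >= q[1] & times <= q[2]]

# A mean above the median is consistent with a right skew inside the IQR
c(mean = mean(middle), median = median(middle))
```

The same comparison could be run on IQRRange$Ordered.to.Complete...Mins to back up the visual impression from the histogram.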

Question 2: We would like to compare the performance of Radiology Technician 62 to 65 based on their median time to complete an order. How would you interpret your findings?

Create two subsets, one for Technician 62 and one for Technician 65, then compute the median completion time of each.

RT62 <- dat1[dat1$Radiology.Technician == 62, ]
RT65 <- dat1[dat1$Radiology.Technician == 65, ]
median62 <- median(RT62$Ordered.to.Complete...Mins)
median65 <- median(RT65$Ordered.to.Complete...Mins)
median62

## [1] 80

median65

## [1] 27

Radiology Technician 62 has a median completion time of 80 minutes, almost three times Technician 65's median of 27 minutes, so Technician 62 is considerably slower by this measure.
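Because a median says nothing about how many orders it is based on, it may help to report the order count next to it. This is a minimal sketch using base R's tapply() and table() on made-up values; the toy data frame is an assumption, and only the column names come from the dataset.

```r
# Toy data (hypothetical values) with the dataset's column names
toy <- data.frame(
  Radiology.Technician = c(62, 62, 62, 65, 65, 65, 65),
  Ordered.to.Complete...Mins = c(30, 50, 90, 10, 20, 30, 200)
)

# Median completion time and number of orders per technician
med <- tapply(toy$Ordered.to.Complete...Mins, toy$Radiology.Technician, median)
n <- table(toy$Radiology.Technician)

summary_tbl <- data.frame(
  technician = names(med),
  median_mins = as.vector(med),
  n_orders = as.vector(n)
)
summary_tbl
```

A technician with only a handful of orders would make the comparison less reliable, which the count column makes visible.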

Question 3: Generate a side by side boxplot comparing the age of those patients receiving STAT versus Routine orders for an X-Ray. What do these boxplots tell you?

Create subsets for STAT and Routine orders, then pass their patient ages to boxplot.

STAT <- dat1[dat1$Priority == "STAT", ]
Routine <- dat1[dat1$Priority == "Routine", ]
boxplot(STAT$PatientAge, Routine$PatientAge, main = "Age vs. Speed of treatment for X-Ray", names = c("STAT", "Routine"), ylab = "Ages")

STAT (expedited) X-Rays tend to be ordered for younger patients, but the STAT ages have a wider interquartile range, meaning patient age varies more in that group. Routine patients are, on the whole, noticeably older.
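The two features read off a boxplot here, the center line and the height of the box, correspond to the median and the IQR, so they can also be computed directly. The sketch below does this with tapply() on made-up ages; the toy values are assumptions, and only the column names are taken from the dataset.

```r
# Toy data (hypothetical ages) with the dataset's column names
toy <- data.frame(
  Priority = rep(c("STAT", "Routine"), each = 5),
  PatientAge = c(20, 35, 50, 65, 80, 60, 65, 70, 75, 80)
)

# Median and interquartile range of age for each priority level
age_median <- tapply(toy$PatientAge, toy$Priority, median)
age_iqr <- tapply(toy$PatientAge, toy$Priority, IQR)

rbind(median = age_median, IQR = age_iqr)
```

Running the same two tapply() calls on dat1 would put numbers behind the visual comparison of the boxes.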

Question 4: Calculate the mean and standard deviation of the time required to complete an X-Ray order on the floor 3W compared to 4W. What do you conclude about differences between X-Ray completion times on these floors?

Create subsets for both floors, then calculate the mean and standard deviation of their order completion times.

Floor3 <- dat1[dat1$Loc.At.Exam.Complete == "3W", ]
Floor4 <- dat1[dat1$Loc.At.Exam.Complete == "4W", ]

mean(Floor3$Ordered.to.Complete...Mins)

## [1] 1463.051

mean(Floor4$Ordered.to.Complete...Mins)

## [1] 1675.451

sd(Floor3$Ordered.to.Complete...Mins)

## [1] 3894.639

sd(Floor4$Ordered.to.Complete...Mins)

## [1] 4387.644

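Floor 4W has a higher mean (about 1675 vs. 1463 minutes) and a higher standard deviation (about 4388 vs. 3895 minutes), so completion times there tend to be longer and less consistent. On both floors the standard deviation is more than twice the mean, which suggests a heavily right-skewed distribution with some very long orders, so the means should be read with caution. By these measures, floor 3W looks preferable.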
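Means and standard deviations are sensitive to a handful of very long orders, so it may be worth reporting the median next to them. The sketch below computes all three per floor on made-up values; the toy data frame is an assumption for illustration, with only the column names taken from the dataset.

```r
# Toy data (hypothetical values); one very long order on 4W
toy <- data.frame(
  Loc.At.Exam.Complete = c("3W", "3W", "3W", "4W", "4W", "4W"),
  Ordered.to.Complete...Mins = c(10, 20, 30, 10, 20, 600)
)

# Split completion times by floor, then summarize each group
by_floor <- split(toy$Ordered.to.Complete...Mins, toy$Loc.At.Exam.Complete)
stats <- sapply(by_floor, function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x))
})

round(stats, 1)
```

In this toy case the single 600-minute order pulls the 4W mean far above its median, which is the kind of effect the median column is meant to expose.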


Ian Baker

2025-06-14