1 HW 5, X-ray Completion Times:

The data set RadDat_IMSE.csv contains data from a major US hospital related to the time required to fulfill orders for X-Rays. The data file may be found here: https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/RadDat_IMSE.csv

2 Questions to Answer:

Some columns of specific interest include:

PatientAge = Age of Patient

Radiology.Technician = A unique identifier assigned to each Radiology Technician

Priority = Priority of the order, STAT or Routine

Loc.At.Exam.Complete = The floors of the hospital (e.g. 3W, 4W, 5E, etc.)

Ordered.to.Complete...Mins = The time required to complete the order in minutes.

There are some data quality issues, and for some patients the reported time to complete an order is negative.

These observations should be filtered out of the dataset before performing your analysis (see the video on filtering rows/selecting columns in the first module).

Please answer the following questions:

Patients that are age 65 or older qualify for Medicare.

Generate a histogram on the time required to fulfill X-ray orders for Medicare patients, restricting the allowable range of times to be between the first and third quartile of observations, i.e. the IQR.

What do you notice about the shape of this histogram?

We would like to compare the performance of Radiology Technician 62 to 65 based on their median time to complete an order.

How would you interpret your findings?

Generate a side by side boxplot comparing the age of those patients receiving STAT versus Routine orders for an X-Ray.

What do these boxplots tell you?

Calculate the mean and standard deviation of the time required to complete an X-Ray order on the floor 3W compared to 4W.

What do you conclude about differences between X-Ray completion times on these floors?

For all plots/graphics, be sure to add a title and label all axes.

Submit an RMarkdown file (with html link) to Blackboard.

3 Data Cleaning Step:

# Load dplyr and read in the CSV file.
library(dplyr)
dataxray <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/RadDat_IMSE.csv")
head(dataxray) # preview the first rows rather than printing the full data set

any(dataxray$Ordered.to.Complete...Mins < 0) # check the minutes column for negative values
## [1] TRUE
any(dataxray$Ordered.to.Complete...Hours < 0) # same check for the hours column
## [1] TRUE
# Both checks return TRUE, so negative completion times are present.

filtereddata <- dataxray %>%
  filter(Ordered.to.Complete...Mins >= 0, Ordered.to.Complete...Hours >= 0)

any(filtereddata$Ordered.to.Complete...Mins < 0)
## [1] FALSE
any(filtereddata$Ordered.to.Complete...Hours < 0)
## [1] FALSE
# Both checks now return FALSE, so no negative values remain.
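
As a quick sanity check on the filtering step, it can help to report how many rows were dropped. The sketch below uses a small made-up data frame (`toy`, with a hypothetical `mins` column standing in for `Ordered.to.Complete...Mins`), not the hospital data.

```r
# Hypothetical toy data standing in for the real completion-time column.
toy <- data.frame(mins = c(12, -5, 40, 0, -1, 73))

# Keep only rows with a non-negative completion time (same rule as the filter step).
toy_clean <- toy[toy$mins >= 0, , drop = FALSE]

# Report how many rows the filter removed.
n_removed <- nrow(toy) - nrow(toy_clean)
cat("Rows removed:", n_removed, "of", nrow(toy), "\n")
```

On the real data, the same subtraction would be `nrow(dataxray) - nrow(filtereddata)`.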

4 Question 1:

medicare_data <- filtereddata[filtereddata$PatientAge >= 65, ]

Q1 <- quantile(medicare_data$Ordered.to.Complete...Mins, 0.25)
Q3 <- quantile(medicare_data$Ordered.to.Complete...Mins, 0.75)
IQR_min <- Q3 - Q1
IQR_min
## 75% 
##  69
medicare_data_IQR <- medicare_data[medicare_data$Ordered.to.Complete...Mins >= Q1 & medicare_data$Ordered.to.Complete...Mins <= Q3, ]

hist(medicare_data_IQR$Ordered.to.Complete...Mins,
     main = "X-ray Completion Times for Medicare Patients (Within IQR)",
     xlab = "Completion Time (Minutes)",
     ylab = "Frequency",
     col = "green")

# The histogram appears skewed to the left. To check normality, I also draw a normal Q-Q plot of the completion times (the raw values of the minutes column, not residuals).
# The Q-Q plot has a curved S shape, which indicates that the tails of the distribution depart from a normal distribution. In this case, the departure is most visible on the left.

qqnorm(medicare_data_IQR$Ordered.to.Complete...Mins)
qqline(medicare_data_IQR$Ordered.to.Complete...Mins, col = "blue")
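
The visual impression of skew can be backed up with a number. One simple option is Pearson's second skewness coefficient, 3 * (mean - median) / sd, where negative values suggest a left skew and positive values a right skew. The sketch below uses made-up values; `pearson_skew2` is a helper name introduced here for illustration, and on the real data it would be applied to `medicare_data_IQR$Ordered.to.Complete...Mins`.

```r
# Helper introduced for illustration: Pearson's second skewness coefficient.
pearson_skew2 <- function(x) {
  x <- x[!is.na(x)]                  # drop missing values first
  3 * (mean(x) - median(x)) / sd(x)
}

# Made-up values with a long right tail (not the hospital data).
toy_times <- c(1, 2, 2, 3, 3, 3, 4, 10)

skew <- pearson_skew2(toy_times)
skew > 0  # TRUE here: the mean (3.5) exceeds the median (3)
```

This is only a rough indicator; it does not replace looking at the histogram and the Q-Q plot.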

5 Question 2:

technicians_62_65 <- filtereddata %>%
  filter(Radiology.Technician >= 62 & Radiology.Technician <= 65)

boxplot(Ordered.to.Complete...Mins ~ Radiology.Technician,
        data = technicians_62_65,
        main = "Median Completion Time by Radiology Technician",
        xlab = "Radiology Technician",
        ylab = "Completion Time (Minutes)",
        col = "red",
        ylim = c(0, 300))

# Interpretation: among technicians 62 through 65, those toward the end of the range complete orders faster, i.e. they perform better on completion time.
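
Because the question asks specifically about technicians 62 and 65 and about the median, a numeric comparison of just those two can complement the boxplot. This is a sketch on made-up data; the column names `tech` and `mins` are stand-ins for `Radiology.Technician` and `Ordered.to.Complete...Mins`.

```r
# Made-up data for illustration only.
toy <- data.frame(
  tech = c(62, 62, 62, 63, 64, 65, 65, 65),
  mins = c(30, 50, 40, 99, 99, 20, 60, 25)
)

# Keep only the two technicians being compared.
two_techs <- toy[toy$tech %in% c(62, 65), ]

# Median completion time per technician (named by technician ID).
medians <- tapply(two_techs$mins, two_techs$tech, median)
print(medians)
```

Filtering with `%in% c(62, 65)` rather than a `>=`/`<=` range keeps technicians 63 and 64 out of the comparison.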

6 Question 3:

patientsSTATRoutine <- filtereddata[filtereddata$Priority %in% c("STAT", "Routine"), ]

boxplot(PatientAge ~ Priority, 
        col = c("red", "blue"),  
        data = patientsSTATRoutine, 
        main = "Patient Age by Order Type (STAT vs Routine)", 
        xlab = "Order Type", 
        ylab = "Age (years)",
        names = c("Rout", "STA")) # labels shortened so that both group names fit on the axis
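
To put numbers next to the boxplots, the median and spread of age can be computed per priority group. The sketch below runs on made-up ages, not the hospital data; on the real data the same calls would use `patientsSTATRoutine$PatientAge` and `patientsSTATRoutine$Priority`.

```r
# Made-up ages for illustration only.
toy <- data.frame(
  Priority   = c("STAT", "Routine", "STAT", "Routine", "STAT", "Routine"),
  PatientAge = c(70, 45, 82, 60, 66, 51)
)

# Median age within each priority group.
age_medians <- tapply(toy$PatientAge, toy$Priority, median)

# Interquartile range of age within each group, as a measure of spread.
age_iqr <- tapply(toy$PatientAge, toy$Priority, IQR)

print(age_medians)
print(age_iqr)
```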

7 Question 4:

floors_data <- filtereddata[filtereddata$Loc.At.Exam.Complete %in% c("3W", "4W"), ]

summary_stats <- aggregate(Ordered.to.Complete...Mins ~ Loc.At.Exam.Complete, data = floors_data, 
                           FUN = function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE)))

summary_df <- data.frame(
  Floor = summary_stats$Loc.At.Exam.Complete,
  Mean = summary_stats$Ordered.to.Complete...Mins[, "mean"],
  SD = summary_stats$Ordered.to.Complete...Mins[, "sd"]
)

print(summary_df)
##   Floor     Mean       SD
## 1    3W 1463.051 3894.639
## 2    4W 1675.451 4387.644
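
One way to read these numbers is the coefficient of variation (SD divided by mean), which expresses spread relative to the average. The sketch below simply re-enters the values printed in the summary output; `summary_tbl` is a name introduced here for illustration.

```r
# Values copied from the printed summary output.
summary_tbl <- data.frame(
  Floor = c("3W", "4W"),
  Mean  = c(1463.051, 1675.451),
  SD    = c(3894.639, 4387.644)
)

# Coefficient of variation: spread relative to the mean.
summary_tbl$CV <- summary_tbl$SD / summary_tbl$Mean
print(round(summary_tbl$CV, 2))  # about 2.66 for 3W and 2.62 for 4W
```

On both floors the standard deviation is more than twice the mean, which suggests that a small number of very long completion times weigh heavily on these statistics.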