Are patients discharged sooner if they choose to stay in medium over large hospitals?

Nischay Bikram Thapa

S3819491

Introduction

Problem Statement

Are patients discharged sooner if they choose to stay in medium over large hospitals?

Data

Descriptive Statistics

The summary statistics show that the mean of the average length of stay in large hospitals is 3.98 days with a standard deviation of 1.98, whereas in medium hospitals it is 3.72 days with a standard deviation of 1.86.

library(dplyr)

# Summarise length of stay for the two peer groups being compared
hosp_summary <- data %>%
  filter(`Peer group` %in% c("Large hospitals", "Medium hospitals")) %>%
  group_by(`Peer group`) %>%
  summarise(
    Mean = mean(`Average length of stay (days)`, na.rm = TRUE),
    S.D = sd(`Average length of stay (days)`, na.rm = TRUE),
    First_quartile = quantile(`Average length of stay (days)`, 0.25, na.rm = TRUE),
    Third_quartile = quantile(`Average length of stay (days)`, 0.75, na.rm = TRUE),
    Min = min(`Average length of stay (days)`, na.rm = TRUE),
    Max = max(`Average length of stay (days)`, na.rm = TRUE),
    Missing = sum(is.na(`Average length of stay (days)`))
  )

knitr::kable(hosp_summary,caption="Summary Statistics")
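Summary Statistics

| Peer group       | Mean     | S.D      | First_quartile | Third_quartile | Min | Max  | Missing |
|:-----------------|---------:|---------:|---------------:|---------------:|----:|-----:|--------:|
| Large hospitals  | 3.983052 | 1.978690 | 2.5            | 4.9            | 1.2 | 12.5 | 0       |
| Medium hospitals | 3.717752 | 1.856438 | 2.4            | 4.5            | 1.0 | 13.2 | 0       |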

Data Visualisation I

The histogram shows that the majority of the recorded average lengths of stay fall between 0 and 5 days. However, a tail of longer stays is also recorded, which indicates that the distribution is skewed to the right.

library(ggplot2)

# Histogram of average length of stay across all records in `data`
ggplot(data = data, aes(`Average length of stay (days)`)) +
  geom_histogram(bins = 22) +
  ylab('Frequency') +
  ggtitle('Histogram of Average Length of Stay (in days)')

Data Visualisation II

# Keep only the two peer groups being compared; refer to columns directly inside filter()
medium_large <- data %>%
  filter(`Peer group` %in% c('Large hospitals', 'Medium hospitals'))

ggplot(medium_large,aes(`Peer group`,`Average length of stay (days)`))+ 
  geom_boxplot(aes(fill= `Peer group`)) +
  ggtitle('Average Length of Stay between Large and Medium Hospitals')

Hypothesis Testing

Independent Sample t-test

Hypothesis Generation

\(H_0\): There is no difference in the mean of average length of stay between medium and large hospitals

\(H_a\): There is a difference in the mean of average length of stay between medium and large hospitals

Mathematically, with \(\mu_1\) and \(\mu_2\) denoting the population means of the two groups,

\(H_0: \mu_1 - \mu_2 = 0\)

\(H_a: \mu_1 - \mu_2 \neq 0\)

Testing the Assumption for Normality

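Viewing the normality plot, it is evident that the average length of stay for both large and medium hospitals is skewed to the right. However, because the sample size is large, the central limit theorem implies that the sampling distribution of the mean is approximately normal, so the t-test can proceed despite the skew.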
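The large-sample argument can be illustrated with a small simulation. The sketch below uses synthetic gamma-distributed values, not the hospital data, and the helper `skewness()`, the sample size of 500 and the 2,000 repetitions are all assumptions chosen for illustration. It compares the skewness of the raw values with the skewness of repeated sample means and draws a Q-Q plot of each.

```r
# Synthetic right-skewed "length of stay" values; NOT the hospital data.
set.seed(42)

# Simple moment-based skewness (hypothetical helper, base R only)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

population <- rgamma(100000, shape = 4, rate = 1)  # mean 4, right-skewed

# Means of 2,000 repeated samples of size n = 500
sample_means <- replicate(2000, mean(sample(population, size = 500)))

raw_skew  <- skewness(population)    # clearly positive
mean_skew <- skewness(sample_means)  # close to zero

# Q-Q plots: raw values bend away from the line, sample means follow it
par(mfrow = c(1, 2))
qqnorm(population[1:2000], main = "Raw values")
qqline(population[1:2000])
qqnorm(sample_means, main = "Sample means")
qqline(sample_means)

print(c(raw = raw_skew, means = mean_skew))
```

The same two Q-Q plots could be drawn per peer group on `medium_large` to reproduce the normality plot referred to above.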

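Homogeneity of Variance

Levene's test is used to examine the homogeneity of variance; its null hypothesis is that the group variances are equal. The result is statistically significant (\(p\)-value < 0.01), so the null hypothesis of equal variances is rejected. Note that the call is run on the full `data` rather than on `medium_large`, so the reported Df of 5 reflects six peer groups, not only large and medium hospitals. The t-test is therefore carried out without assuming equal variances.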

library(car)  # provides leveneTest()

leveneTest(`Average length of stay (days)`~ `Peer group`,data=data)
## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value    Pr(>F)    
## group     5  86.543 < 2.2e-16 ***
##       10032                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
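To make clear what Levene's test computes when only two groups are compared, here is a minimal sketch in base R. It relies on the fact that Levene's test with `center = median` is a one-way ANOVA on the absolute deviations from each group's median. The data frame `toy` and its columns are hypothetical and filled with synthetic values; only the group labels mirror the report. With two groups the first Df is 1, rather than the 5 shown in the output above.

```r
# Synthetic two-group data; group labels mirror the report, values do not.
set.seed(1)
toy <- data.frame(
  peer_group = factor(rep(c("Large hospitals", "Medium hospitals"), each = 200)),
  stay = c(rgamma(200, shape = 4, rate = 1.0),
           rgamma(200, shape = 4, rate = 1.1))
)

# Absolute deviation of each value from its own group's median
toy$abs_dev <- abs(toy$stay - ave(toy$stay, toy$peer_group, FUN = median))

# Levene's test (center = median) is a one-way ANOVA on those deviations
levene_tab <- anova(lm(abs_dev ~ peer_group, data = toy))
levene_df  <- levene_tab[["Df"]][1]       # number of groups minus 1
levene_p   <- levene_tab[["Pr(>F)"]][1]   # p-value for equal variances

print(levene_tab)
```

On the real data, the equivalent check would presumably be `leveneTest()` called with `data = medium_large`; its result is not shown here because it was not part of the original output.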

Independent Sample t-test (Assuming Unequal Variance)

t.test(
  `Average length of stay (days)`~ `Peer group`,
  data = medium_large,
  var.equal = FALSE,
  alternative = "two.sided"
  )
## 
##  Welch Two Sample t-test
## 
## data:  Average length of stay (days) by Peer group
## t = 5.2855, df = 4500.1, p-value = 1.313e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1668947 0.3637059
## sample estimates:
##  mean in group Large hospitals mean in group Medium hospitals 
##                       3.983052                       3.717752
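The Welch output above reports a t statistic together with a non-integer df (4500.1), which comes from the Welch-Satterthwaite approximation. The sketch below recomputes both quantities by hand and checks them against `t.test()`. The vectors `large` and `medium` are synthetic stand-ins, not the hospital data, and all variable names are assumptions made for illustration.

```r
# Synthetic stand-ins for the two groups; NOT the hospital data.
set.seed(7)
large  <- rgamma(300, shape = 4, rate = 1.0)
medium <- rgamma(250, shape = 4, rate = 1.1)

# Squared standard error of each group mean
v1 <- var(large)  / length(large)
v2 <- var(medium) / length(medium)

# Welch t statistic: difference in means over the unpooled standard error
t_manual <- (mean(large) - mean(medium)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(large) - 1) + v2^2 / (length(medium) - 1))

# Reference result from base R
welch <- t.test(large, medium, var.equal = FALSE)

print(c(t_manual = t_manual, df_manual = df_manual))
print(welch)
```

Because the variances are not pooled, the statistic stays valid when the two groups have different spreads, which is the situation the Levene result points to.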

Discussion

Major Findings

References