MATH1324 Assignment 2

Statistical Differences in Average Length of Stay (ALOS) between Large and Medium Hospitals

Roy Wong Kher Yung (S3835352)

10/5/2020

Introduction

Length of Stay (LOS), or Average Length of Stay (ALOS), is an important indicator of the use of medical services and is used to assess the efficiency of hospital management, the quality of patient care, and functional evaluation. Stratified sampling was utilised to collect the ALOS from different peer groups. We then applied hypothesis testing to gather statistical evidence from the samples, focusing on the two-sample t-test.

The purpose of this study is to investigate whether there is a statistically significant difference in the average length of stay (ALOS) between large and medium hospitals, which might lead patients to choose one over the other. The two-sample t-test reported later in this study finds that such a difference exists.

Patterns in the data are presented and analysed using summary statistics, box plots and Q-Q plots in order to provide a better understanding of the large dataset. We used RStudio to analyse a dataset of 30021 observations and 2 variables. For the purposes of this study, we focus on Large and Medium hospitals only.

Problem Statement

This measure is the average length of stay in hospital. The average is calculated as the number of bed days for overnight stays divided by the number of overnight stays and is reported for selected conditions and procedures.

\(ALOS = \frac{\text{Number of bed days for overnight stays}}{\text{Number of overnight stays}}\)
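As a minimal worked example of this definition, the sketch below uses made-up figures for a single hospital and condition. The numbers are illustrative assumptions and are not taken from the dataset.

```r
# Hypothetical figures for one hospital and one condition (not from the dataset)
bed_days        <- 1170  # total bed days accumulated by overnight stays
overnight_stays <- 300   # number of overnight stays

# ALOS = bed days for overnight stays / number of overnight stays
alos <- bed_days / overnight_stays
alos  # 3.9 days
```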

Because the dataset contains many variables, we clean the data by extracting only the variables needed for the investigation.

Data

Data (cont.)

Import and Variable Check

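library(readxl)  # read_excel()
library(dplyr)   # %>%, group_by(), filter(), summarise()
library(car)     # leveneTest(), qqPlot()

ALOS <- read_excel("average-length-of-stay-multilevel-data.xlsx", 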
    col_types = c("text", "text", "text", 
        "text", "text", "text", "text", "text", 
        "skip", "text", "skip", "text", "skip", 
        "text", "skip", "text", "skip", "text", 
        "skip"), skip = 12)
class(ALOS$`Peer group`)
[1] "character"
class(ALOS$`Average length of stay (days)`)
[1] "character"

Data (cont.)

ALOS$`Peer group` <- as.factor(ALOS$`Peer group`)
ALOS$ALOStay <- as.numeric(ALOS$`Average length of stay (days)`)
ALOS$`Peer group` <- ALOS$`Peer group` %>% factor(levels = c("Children's hospitals", "Large hospitals", "Major hospitals", "Medium hospitals", "Small hospitals", "Unpeered"), labels = c("Children", "Large", "Major", "Medium", "Small", "Unpeered"))
ALOS2 <- ALOS[,c("Peer group","ALOStay")]
head(ALOS2)
# A tibble: 6 x 2
  `Peer group` ALOStay
  <fct>          <dbl>
1 Large            3.9
2 Large            3.3
3 Large            3.1
4 Large            2.5
5 Large            2.6
6 Large            2.7
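Because the length-of-stay column is read as text and then converted, any entry that cannot be parsed as a number becomes NA. The sketch below illustrates that behaviour on a small made-up vector. The strings "NP" and "<5" are hypothetical placeholders, and the actual spreadsheet may use different markers.

```r
# Hypothetical raw values; "NP" and "<5" stand in for whatever non-numeric
# markers the spreadsheet may contain (an assumption, not taken from the file)
raw <- c("3.9", "3.3", "NP", "<5", "2.5")

# as.numeric() returns NA (with a warning) for strings it cannot parse
stay <- suppressWarnings(as.numeric(raw))

# Count how many values became missing during the conversion
n_missing <- sum(is.na(stay))
n_missing
```

Checking this count right after the conversion makes it easier to tell how many observations later analyses will silently drop.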

Descriptive Statistics and Visualisation

ALOS2 <- ALOS2 %>% group_by(`Peer group`) %>% filter(`Peer group` == "Large" | `Peer group` == "Medium")

Summary Statistics

ALOS2 %>% summarise(Min = min(ALOStay,na.rm = TRUE) %>% round(3),
            Q1 = quantile(ALOStay,probs = .25,na.rm = TRUE) %>% round(3),
            Median = median(ALOStay, na.rm = TRUE) %>% round(3),
            Q3 = quantile(ALOStay,probs = .75,na.rm = TRUE) %>% round(3),
            Max = max(ALOStay,na.rm = TRUE) %>% round(3),
            Mean = mean(ALOStay, na.rm = TRUE) %>% round(3),
            SD = sd(ALOStay, na.rm = TRUE) %>% round(3),
            n = n(),
            Missing = sum(is.na(ALOStay)))
# A tibble: 2 x 10
  `Peer group`   Min    Q1 Median    Q3   Max  Mean    SD     n Missing
  <fct>        <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <int>   <int>
1 Large          1.2   2.5    3.5   5    12.6  3.99  1.98  5692    1281
2 Medium         1     2.4    3.4   4.5  13.2  3.71  1.85  3877    1695

Descriptive Statistics and Visualisation cont.

Summary Statistics of Interval Estimates

ALOS2 %>% group_by(`Peer group`) %>% summarise(Mean = round(mean(ALOStay, na.rm = TRUE),2),
                                                  SD = round(sd(ALOStay, na.rm = TRUE),3),
                                                  n = n(),
                                                  tcrit = round(qt(p = 0.975, df = n - 1),3),
                                                  SE = round(SD/sqrt(n),3),
                                                  `95% CI Lower Bound` = round(Mean - tcrit * SE,2),
                                                  `95% CI Upper Bound` = round(Mean + tcrit * SE,2))
## # A tibble: 2 x 8
##   `Peer group`  Mean    SD     n tcrit    SE `95% CI Lower Bo… `95% CI Upper Bo…
##   <fct>        <dbl> <dbl> <int> <dbl> <dbl>             <dbl>             <dbl>
## 1 Large         3.99  1.98  5692  1.96 0.026              3.94              4.04
## 2 Medium        3.71  1.85  3877  1.96 0.03               3.65              3.77
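Note that in the summary above `n()` counts every row in a group, including rows where `ALOStay` is missing, while the mean and SD are computed from the non-missing values only. A minimal sketch of an interval that uses the non-missing count throughout is given below. The helper name `ci_mean` and the data are hypothetical and serve only as illustration.

```r
# Hypothetical helper: t-based confidence interval for a mean,
# using only the non-missing observations for n, SE and df
ci_mean <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(mean = mean(x), lower = mean(x) - tcrit * se,
    upper = mean(x) + tcrit * se, n = n)
}

# Synthetic example with missing values (not the hospital data)
x <- c(3.9, 3.3, NA, 3.1, 2.5, NA, 2.6, 2.7)
ci_mean(x)
```

With this approach the standard error is based on the observations that actually contribute to the mean, so the interval is somewhat wider than one computed from the full row count.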

Descriptive Statistics and Visualisation cont.

Histogram of the Overall Average

hist(ALOS2$ALOStay, main = 'Histogram of Average length of Stay', xlab = 'Average Length of Stay (days)', breaks = 30)
abline(v=mean(ALOS2$ALOStay, na.rm = TRUE), lwd=2, col=2)

Descriptive Statistics and Visualisation cont.

Lhospital <- subset(ALOS2, `Peer group` == 'Large')
Mhospital <- subset(ALOS2, `Peer group` == 'Medium')

Descriptive Statistics and Visualisation cont.

par(mfrow=c(1,2))
hist(Lhospital$ALOStay, main = 'ALOS of Large Hospital', xlab = 'Average Length of Stay (days)')
abline(v=mean(Lhospital$ALOStay, na.rm = TRUE), col=2, lwd=2)
hist(Mhospital$ALOStay, main = 'ALOS of Medium Hospital', xlab = 'Average Length of Stay (days)')
abline(v=mean(Mhospital$ALOStay, na.rm = TRUE), col=2, lwd=2)

par(mfrow=c(1,1))

Descriptive Statistics and Visualisation cont.

Boxplot

boxplot(Lhospital$ALOStay, Mhospital$ALOStay, names = c("Large", "Medium"), col = c(24, 20), main = "Average Length of Stay (ALOS) vs Peer Group", xlab = "Peer group", ylab = "Average Length of Stay (ALOS)")

Descriptive Statistics and Visualisation cont.

Normality Check

ALOS2$ALOStay %>% qqPlot(dist = 'norm')

## [1] 2736 7918

Descriptive Statistics and Visualisation cont.

par(mfrow=c(1,2))
Lhospital$ALOStay %>% qqPlot(dist = 'norm')
## [1] 4792  303
Mhospital$ALOStay %>% qqPlot(dist = 'norm')

## [1] 1190 1191
par(mfrow=c(1,1))

Hypothesis Testing

We use R to carry out the appropriate test to determine whether there is a statistically significant difference in the average length of stay (ALOS) between large and medium hospitals, which might lead patients to choose one over the other.

Assumptions

Hypothesis Testing cont

Variables

Levene Test

leveneTest(ALOStay ~ `Peer group`, data = ALOS2)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    1  16.585 4.707e-05 ***
##       6591                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is below 0.05, so we reject the null hypothesis of equal variances: the two groups do not have equal variance. We therefore apply the two-sample t-test without assuming equal variances (Welch's test).
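To make the idea behind the median-centred Levene test more concrete, the sketch below computes it by hand in base R as a one-way ANOVA on each value's absolute deviation from its group median. It runs on synthetic data, not the hospital data, and is an illustration of the technique rather than the `car` package's own code.

```r
# Synthetic data for two groups with different spread (not the hospital data)
set.seed(1)
y <- c(rnorm(50, mean = 4, sd = 2), rnorm(50, mean = 4, sd = 1))
g <- factor(rep(c("Large", "Medium"), each = 50))

# Absolute deviation of each value from its own group's median
z <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations; a small p-value suggests unequal variances
levene <- anova(lm(z ~ g))
levene
```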

Hypothesis Testing cont

Two-Sample T-Test

We select a two-sample t-test because we want to compare the Average Length of Stay of two independent groups. Its normality and homogeneity-of-variance assumptions were checked with the Q-Q plots and Levene's test; because the variances are unequal, the test is run with var.equal = FALSE.

result<- t.test(ALOStay ~ `Peer group`, data = ALOS2, var.equal = FALSE, alternative = "two.sided")
result
## 
##  Welch Two Sample t-test
## 
## data:  ALOStay by Peer group
## t = 5.6615, df = 4611, p-value = 1.592e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1835797 0.3780687
## sample estimates:
##  mean in group Large mean in group Medium 
##             3.986874             3.706049

Our decision is to reject \(H_0\): \(\mu_l = \mu_m\), as p < 0.05 and the 95% CI of the estimated difference in population means, [0.184, 0.378], does not capture \(H_0\): \(\mu_l - \mu_m = 0\). The result of the two-sample t-test is therefore statistically significant: the mean Average Length of Stay (ALOS) in Large hospitals differs significantly from that in Medium hospitals.
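As a rough cross-check, the Welch statistic can be recomputed from the summary figures reported earlier. The sketch below uses the group means from the t-test output, the rounded SDs from the summary table, and the non-missing counts (n minus Missing). Because the SDs are rounded, the result should only approximately match the reported output.

```r
# Figures taken from the outputs above
m1 <- 3.986874; s1 <- 1.98; n1 <- 5692 - 1281  # Large: non-missing n
m2 <- 3.706049; s2 <- 1.85; n2 <- 3877 - 1695  # Medium: non-missing n

# Per-group variance of the mean and standard error of the difference
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se_diff <- sqrt(v1 + v2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (m1 - m2) / se_diff
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 95% confidence interval for the difference in means
tcrit <- qt(0.975, df_welch)
ci <- (m1 - m2) + c(-1, 1) * tcrit * se_diff

round(c(t = t_stat, df = df_welch, lower = ci[1], upper = ci[2]), 3)
```

The values should land close to the t statistic, degrees of freedom and interval printed by `t.test()`; small discrepancies come from the rounded standard deviations.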

Discussions

Testing Limitations:

Testing Strength:

Dataset Limitations:

Dataset Strength:

Reference

Data was obtained from the following site: