Roy Wong Kher Yung (S3835352)
10/5/2020
Length of stay (LOS), or average length of stay (ALOS), is an important indicator of the use of medical services and is used to assess the efficiency of hospital management, the quality of patient care, and functional evaluation. Stratified sampling was used to collect the ALOS from the different peer groups. We then used hypothesis testing to gather statistical evidence from the samples, focusing on the two-sample t-test.
The purpose of this study is to investigate whether there is a statistically significant difference in the average length of stay (ALOS) between large and medium hospitals, which might lead patients to choose one over the other. The two-sample t-test reported below provides evidence of such a difference.
Patterns in the data are presented and analysed using summary statistics, box plots and Q-Q plots in order to give a better understanding of the large dataset. We used RStudio to analyse a dataset of 30021 observations and 2 variables. For the purposes of this study, we focus on Large and Medium hospitals only.
This measure is the average length of stay in hospital. The average is calculated as the number of bed days for overnight stays divided by the number of overnight stays and is reported for selected conditions and procedures.
\(ALOS = \frac{\text{Number of bed days for overnight stays}}{\text{Number of overnight stays}}\)
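As a quick illustration of the definition above, the following sketch computes ALOS for a single hospital and condition. The two input figures are hypothetical and are not taken from the dataset.

```r
# Hypothetical figures for one hospital and one condition (not from the dataset)
bed_days        <- 1250  # total bed days across all overnight stays
overnight_stays <- 400   # number of overnight stays

# ALOS = bed days for overnight stays / number of overnight stays
alos <- bed_days / overnight_stays
alos
```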
The dataset contains many variables, so we clean the data by extracting only the variables needed for the investigation.
Create a new variable (ALOStay) that takes the numeric values of Average Length of Stay (days), with the missing values eliminated.
Select the two variables (Peer Group and ALOStay). Since our study is only concerned with “Large Hospital” and “Medium Hospital”, we subset them and study them separately for the statistical analysis.
library(readxl)  # read_excel()
library(dplyr)   # %>%, group_by(), summarise()
ALOS <- read_excel("average-length-of-stay-multilevel-data.xlsx",
col_types = c("text", "text", "text",
"text", "text", "text", "text", "text",
"skip", "text", "skip", "text", "skip",
"text", "skip", "text", "skip", "text",
"skip"), skip = 12)
class(ALOS$`Peer group`)
[1] "character"
ALOS$`Peer group` <- as.factor(ALOS$`Peer group`)
ALOS$ALOStay <- as.numeric(ALOS$`Average length of stay (days)`)
ALOS$`Peer group` <- ALOS$`Peer group` %>% factor(levels = c("Children's hospitals", "Large hospitals", "Major hospitals", "Medium hospitals", "Small hospitals", "Unpeered"), labels = c("Children", "Large", "Major", "Medium", "Small", "Unpeered"))
ALOS2 <- ALOS[,c("Peer group","ALOStay")]
head(ALOS2)
# A tibble: 6 x 2
`Peer group` ALOStay
<fct> <dbl>
1 Large 3.9
2 Large 3.3
3 Large 3.1
4 Large 2.5
5 Large 2.6
6 Large 2.7
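The later summaries list only the Large and Medium groups, and the histograms use objects named Lhospital and Mhospital, but the subsetting step itself is not shown in this report. A minimal sketch of one way to do it, using base R on a toy stand-in for ALOS2; the toy data and the `_toy` / `ALOS_LM` names are hypothetical, and whether the author subset the data this way is an assumption.

```r
# Toy stand-in for ALOS2 (values are illustrative, not from the dataset)
ALOS2_toy <- data.frame(
  `Peer group` = factor(c("Large", "Medium", "Small", "Large", "Medium", "Major")),
  ALOStay      = c(3.9, 3.4, NA, 2.5, 4.1, 5.0),
  check.names  = FALSE
)

# Keep only the two peer groups of interest and drop the unused factor levels
keep    <- ALOS2_toy$`Peer group` %in% c("Large", "Medium")
ALOS_LM <- droplevels(ALOS2_toy[keep, ])

# Separate data frames per group, mirroring the Lhospital / Mhospital names
Lhospital_toy <- ALOS_LM[ALOS_LM$`Peer group` == "Large", ]
Mhospital_toy <- ALOS_LM[ALOS_LM$`Peer group` == "Medium", ]

levels(ALOS_LM$`Peer group`)
```

Dropping the unused levels keeps grouped summaries and plots from showing empty peer groups.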
ALOS2 %>% group_by(`Peer group`) %>% summarise(Min = min(ALOStay,na.rm = TRUE) %>% round(3),
Q1 = quantile(ALOStay,probs = .25,na.rm = TRUE) %>% round(3),
Median = median(ALOStay, na.rm = TRUE) %>% round(3),
Q3 = quantile(ALOStay,probs = .75,na.rm = TRUE) %>% round(3),
Max = max(ALOStay,na.rm = TRUE) %>% round(3),
Mean = mean(ALOStay, na.rm = TRUE) %>% round(3),
SD = sd(ALOStay, na.rm = TRUE) %>% round(3),
n = n(),
Missing = sum(is.na(ALOStay)))
# A tibble: 2 x 10
`Peer group` Min Q1 Median Q3 Max Mean SD n Missing
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Large 1.2 2.5 3.5 5 12.6 3.99 1.98 5692 1281
2 Medium 1 2.4 3.4 4.5 13.2 3.71 1.85 3877 1695
ALOS2 %>% group_by(`Peer group`) %>% summarise(Mean = round(mean(ALOStay, na.rm = TRUE),2),
SD = round(sd(ALOStay, na.rm = TRUE),3),
n = n(),
tcrit = round(qt(p = 0.975, df = n - 1),3),
SE = round(SD/sqrt(n),3),
`95% CI Lower Bound` = round(Mean - tcrit * SE,2),
`95% CI Upper Bound` = round(Mean + tcrit * SE,2))
## # A tibble: 2 x 8
## `Peer group` Mean SD n tcrit SE `95% CI Lower Bo… `95% CI Upper Bo…
## <fct> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Large 3.99 1.98 5692 1.96 0.026 3.94 4.04
## 2 Medium 3.71 1.85 3877 1.96 0.03 3.65 3.77
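Note that `n()` counts every row in a group, including rows where ALOStay is missing (1281 and 1695 rows in the summary table), so the standard errors above are based on a larger n than the number of observed values. A hedged sketch of the same t-based interval using only the observed values; the helper name `ci_mean` and the toy vector are hypothetical.

```r
# Hypothetical helper: t-based confidence interval for a mean, ignoring NAs
ci_mean <- function(x, level = 0.95) {
  x     <- x[!is.na(x)]
  n     <- length(x)                         # observed values only
  se    <- sd(x) / sqrt(n)                   # standard error of the mean
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(mean = mean(x), lower = mean(x) - tcrit * se,
    upper = mean(x) + tcrit * se, n = n)
}

# Toy data (illustrative only, with two missing values)
x <- c(3.9, 3.3, NA, 3.1, 2.5, NA, 2.6, 2.7)
ci_mean(x)
```

Inside `summarise()`, the equivalent change would be to count with `sum(!is.na(ALOStay))` instead of `n()`.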
hist(ALOS2$ALOStay, main = 'Histogram of Average length of Stay', xlab = 'Average Length of Stay (days)', breaks = 30)
abline(v=mean(ALOS2$ALOStay, na.rm = TRUE), lwd=2, col=2)
Histograms of ALOS for the two Peer Groups that we are interested in:
par(mfrow=c(1,2))
hist(Lhospital$ALOStay, main = 'ALOS of Large Hospital', xlab = 'Average Length of Stay (days)')
abline(v=mean(Lhospital$ALOStay, na.rm = TRUE), col=2, lwd=2)
hist(Mhospital$ALOStay, main = 'ALOS of Medium Hospital', xlab = 'Average Length of Stay (days)')
abline(v=mean(Mhospital$ALOStay, na.rm = TRUE), col=2, lwd=2)
## [1] 2736 7918
## [1] 4792 303
## [1] 1190 1191
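The index pairs printed above look like the return value of a Q-Q plotting function that reports the most extreme observations (`car::qqPlot` behaves this way), although the call itself is not shown in this report. A base R sketch of a normal Q-Q plot on simulated right-skewed data; the simulated values and the seed are assumptions made only for illustration.

```r
set.seed(1)
# Simulated right-skewed "length of stay" values (illustrative only)
los <- rgamma(500, shape = 4, rate = 1)

# Normal Q-Q plot with a reference line through the quartiles
qq <- qqnorm(los, main = "Normal Q-Q plot (simulated data)")
qqline(los, col = 2, lwd = 2)

# Indices of the two largest observations (the upper tail is where
# right-skewed data usually departs from the reference line)
extreme <- order(los, decreasing = TRUE)[1:2]
extreme
```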
We use R to test whether there is a statistically significant difference in the average length of stay (ALOS) between large and medium hospitals, which might lead patients to choose one over the other.
\(\mu_l\) = Mean of ALOS in Large Hospital. \(\mu_m\) = Mean of ALOS in Medium Hospital
\(\sigma_l^2\) = Variance of ALOS in Large Hospital. \(\sigma_m^2\) = Variance of ALOS in Medium Hospital
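Written out with the symbols above, the hypotheses implied by the two tests in this section are as follows. The two-sided alternative for the means matches the t-test output; the exact wording of the author's hypotheses is not shown, so this is a reconstruction.

```latex
% Levene's test for homogeneity of variance
H_0: \sigma_l^2 = \sigma_m^2 \qquad H_A: \sigma_l^2 \neq \sigma_m^2

% Two-sample t-test for the difference in means
H_0: \mu_l - \mu_m = 0 \qquad H_A: \mu_l - \mu_m \neq 0
```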
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 16.585 4.707e-05 ***
## 6591
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
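The output above has the format printed by `leveneTest()` from the car package, although the call is not shown. With `center = median`, the test amounts to a one-way ANOVA on the absolute deviations from each group's median, which can be sketched in base R as follows. The simulated data and seed are assumptions for illustration only.

```r
set.seed(42)
# Simulated data with unequal spread (illustrative only)
d <- data.frame(
  group = factor(rep(c("Large", "Medium"), times = c(200, 150))),
  y     = c(rnorm(200, mean = 4.0, sd = 2.0), rnorm(150, mean = 3.7, sd = 1.2))
)

# Absolute deviation of each value from its own group median
d$dev <- abs(d$y - ave(d$y, d$group, FUN = median))

# One-way ANOVA on the deviations gives the Levene F statistic and p-value
lev <- anova(lm(dev ~ group, data = d))
lev
```

A small p-value here indicates that the typical deviation from the median differs between groups, that is, the variances are not equal.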
The p-value is below 0.05, so we reject the null hypothesis of equal variances: the two groups do not have equal variance. We therefore apply the two-sample t-test without assuming equal variances (Welch's t-test).
We selected a two-sample t-test because we want to compare the Average Length of Stay of two independent groups. The test relies on the normality and variance homogeneity assumptions, which were checked beforehand with the Q-Q plots and Levene's test. With several thousand observations in each group, the t-test is robust to moderate departures from normality.
result <- t.test(ALOStay ~ `Peer group`, data = ALOS2, var.equal = FALSE, alternative = "two.sided")
result
##
## Welch Two Sample t-test
##
## data: ALOStay by Peer group
## t = 5.6615, df = 4611, p-value = 1.592e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1835797 0.3780687
## sample estimates:
## mean in group Large mean in group Medium
## 3.986874 3.706049
Our decision is to reject \(H_0\): \(\mu_l = \mu_m\), because p < 0.05 and the 95% CI of the estimated population difference, [0.184, 0.378], does not capture \(H_0\): \(\mu_l - \mu_m = 0\). The result of the two-sample t-test is therefore statistically significant. This means that the mean Average Length of Stay (ALOS) in Large Hospitals is significantly different from the mean ALOS in Medium Hospitals, with an estimated difference of about 0.28 days.
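As a cross-check, the Welch statistic can be recomputed from the summary statistics reported earlier. This is a sketch under two assumptions: the number of observed values per group is taken as n minus Missing from the summary table (which agrees with the Levene degrees of freedom), and the SDs are the rounded values shown there, so the result only approximately reproduces the reported t, df and confidence interval.

```r
# Group means from the t-test output; SDs (rounded) from the summary table
m1 <- 3.986874; s1 <- 1.98; n1 <- 5692 - 1281   # Large: rows minus missing
m2 <- 3.706049; s2 <- 1.85; n2 <- 3877 - 1695   # Medium: rows minus missing

v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)                      # standard error of the difference

t_stat   <- (m1 - m2) / se
# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
ci       <- (m1 - m2) + c(-1, 1) * qt(0.975, welch_df) * se

round(c(t = t_stat, df = welch_df, lower = ci[1], upper = ci[2]), 3)
```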
Peer Group of interest.
Data was obtained from the following site: