Introduction

Length of Stay (LOS), or Average Length of Stay (ALOS), is an important indicator of the use of medical services. It is used to assess the efficiency of hospital management, the quality of patient care, and functional evaluation. Stratified sampling was used to collect the ALOS from different Peer Groups. We then apply hypothesis testing to gather statistical evidence from the samples, focusing on the two-sample t-test.

The purpose of this study is to investigate whether there is a statistically significant difference in the average length of stay (ALOS) between large and medium hospitals, which might lead patients to choose one over the other. The two-sample t-test reported below supports such a difference.

Patterns in the data are presented and analysed using summary statistics, box plots and Q-Q plots in order to provide a better understanding of the large dataset. We used RStudio to analyse a dataset of 30021 observations and 2 variables. For this study, we focus on Large and Medium hospitals only.

Problem Statement

This measure is the average length of stay in hospital. The average is calculated as the number of bed days for overnight stays divided by the number of overnight stays and is reported for selected conditions and procedures.

\(ALOS = \frac{\text{Total number of bed days for overnight stays}}{\text{Number of overnight stays}}\)
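
As a small worked example of this definition, the sketch below computes the ALOS from a handful of overnight stays. The figures are made up for illustration and are not taken from the dataset.

```r
# Hypothetical figures for one hospital and one condition (not from the dataset)
bed_days <- c(3, 5, 2, 7, 4)     # bed days for each overnight stay
n_stays  <- length(bed_days)     # number of overnight stays

# ALOS = total overnight bed days / number of overnight stays
alos <- sum(bed_days) / n_stays
alos
```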

The dataset contains many variables, so we clean it further by extracting only the variables needed for the investigation.

Data

In this section, we clean the dataset so that it is ready for statistical analysis.
Create a new column (i.e. ALOStay) that holds the numeric values of Average Length of Stay (days), and eliminate the missing values.
Extract only the relevant variables from the dataset (i.e. Peer Group and ALOStay). Since our study is concerned only with “Large Hospital” and “Medium Hospital”, we subset these groups and study them separately.
Detect the missing values and outliers in the dataset and eliminate them, as shown in the next section. The overall statistics would be distorted if the outliers were not eliminated, and no accurate statistical analysis could be drawn if the missing values were left in.
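
The cleaning steps above can be sketched on toy data as follows. This is only an illustration: the column names, the "NP" marker for a non-numeric entry, and the 1.5 × IQR rule for flagging outliers are assumptions, since the report does not state how outliers were identified.

```r
# Toy data standing in for the imported sheet (values and names are made up)
raw <- data.frame(
  peer_group = c("Large", "Large", "Medium", "Medium", "Large", "Medium"),
  alos_text  = c("3.9", "NP", "2.4", "3.1", "25.0", "2.9"),
  stringsAsFactors = FALSE
)

# Non-numeric entries such as "NP" become NA when coerced to numeric
raw$ALOStay <- suppressWarnings(as.numeric(raw$alos_text))

# Drop rows with a missing ALOS
clean <- raw[!is.na(raw$ALOStay), ]

# Flag values outside the 1.5 * IQR fences (assumed outlier rule)
q <- unname(quantile(clean$ALOStay, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- clean$ALOStay < q[1] - fence | clean$ALOStay > q[2] + fence

# Keep only the rows that are not flagged
trimmed <- clean[!is_outlier, ]
trimmed
```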

Data (cont.)

Import and Variable Check

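library(readxl)  # read_excel()
library(dplyr)   # %>%, group_by(), filter(), summarise()
library(car)     # qqPlot(), leveneTest()

ALOS <- read_excel("average-length-of-stay-multilevel-data.xlsx", 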
    col_types = c("text", "text", "text", 
        "text", "text", "text", "text", "text", 
        "skip", "text", "skip", "text", "skip", 
        "text", "skip", "text", "skip", "text", 
        "skip"), skip = 12)
class(ALOS$`Peer group`)

[1] "character"

class(ALOS$`Average length of stay (days)`)

[1] "character"

Data (cont.)

Converting the variables and keeping only the relevant columns of the dataset.

ALOS$`Peer group` <- as.factor(ALOS$`Peer group`)
ALOS$ALOStay <- as.numeric(ALOS$`Average length of stay (days)`)
ALOS$`Peer group` <- ALOS$`Peer group` %>% factor(levels = c("Children's hospitals", "Large hospitals", "Major hospitals", "Medium hospitals", "Small hospitals", "Unpeered"), labels = c("Children", "Large", "Major", "Medium", "Small", "Unpeered"))
ALOS2 <- ALOS[,c("Peer group","ALOStay")]
head(ALOS2)

# A tibble: 6 x 2
  `Peer group` ALOStay
  <fct>          <dbl>
1 Large            3.9
2 Large            3.3
3 Large            3.1
4 Large            2.5
5 Large            2.6
6 Large            2.7

Descriptive Statistics and Visualisation

We filter for the “Large” and “Medium” hospitals and obtain summary statistics of the ALOS.

ALOS2 <- ALOS2 %>% group_by(`Peer group`) %>% filter(`Peer group` == "Large" | `Peer group` == "Medium")

Summary Statistics

ALOS2 %>% summarise(Min = min(ALOStay,na.rm = TRUE) %>% round(3),
            Q1 = quantile(ALOStay,probs = .25,na.rm = TRUE) %>% round(3),
            Median = median(ALOStay, na.rm = TRUE) %>% round(3),
            Q3 = quantile(ALOStay,probs = .75,na.rm = TRUE) %>% round(3),
            Max = max(ALOStay,na.rm = TRUE) %>% round(3),
            Mean = mean(ALOStay, na.rm = TRUE) %>% round(3),
            SD = sd(ALOStay, na.rm = TRUE) %>% round(3),
            n = n(),
            Missing = sum(is.na(ALOStay)))

# A tibble: 2 x 10
  `Peer group`   Min    Q1 Median    Q3   Max  Mean    SD     n Missing
  <fct>        <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <int>   <int>
1 Large          1.2   2.5    3.5   5    12.6  3.99  1.98  5692    1281
2 Medium         1     2.4    3.4   4.5  13.2  3.71  1.85  3877    1695

Descriptive Statistics and Visualisation cont.

Summary Statistics of Interval Estimates

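We use the t-distribution and t-critical values to calculate interval estimates for the sample means (assuming Normality or that the CLT applies).
The table generated below gives 95% CIs for the means. Note that n = n() counts every row, including those with a missing ALOStay, so the SE shown is based on the full row count rather than on the non-missing observations.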

ALOS2 %>% group_by(`Peer group`) %>% summarise(Mean = round(mean(ALOStay, na.rm = TRUE),2),
                                                  SD = round(sd(ALOStay, na.rm = TRUE),3),
                                                  n = n(),
                                                  tcrit = round(qt(p = 0.975, df = n - 1),3),
                                                  SE = round(SD/sqrt(n),3),
                                                  `95% CI Lower Bound` = round(Mean - tcrit * SE,2),
                                                  `95% CI Upper Bound` = round(Mean + tcrit * SE,2))

## # A tibble: 2 x 8
##   `Peer group`  Mean    SD     n tcrit    SE `95% CI Lower Bo… `95% CI Upper Bo…
##   <fct>        <dbl> <dbl> <int> <dbl> <dbl>             <dbl>             <dbl>
## 1 Large         3.99  1.98  5692  1.96 0.026              3.94              4.04
## 2 Medium        3.71  1.85  3877  1.96 0.03               3.65              3.77
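
When a column contains missing values, the row count and the number of observations actually used differ. The sketch below, on made-up values, builds the same kind of t-based interval from the non-missing count and cross-checks it against t.test(), which drops missing values itself.

```r
# Toy ALOS values with missing entries (made up for illustration)
x <- c(3.9, 3.3, NA, 2.5, 2.6, NA, 4.1, 3.0)

n_obs <- sum(!is.na(x))                 # observations actually used
m     <- mean(x, na.rm = TRUE)
s     <- sd(x, na.rm = TRUE)
se    <- s / sqrt(n_obs)                # standard error from the non-missing n
tcrit <- qt(0.975, df = n_obs - 1)

ci <- c(lower = m - tcrit * se, upper = m + tcrit * se)
ci

# Cross-check against the interval reported by t.test()
all.equal(unname(ci), as.numeric(t.test(x)$conf.int))
```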

Descriptive Statistics and Visualisation cont.

Histogram of the Overall Average

hist(ALOS2$ALOStay, main = 'Histogram of Average length of Stay', xlab = 'Average Length of Stay (days)', breaks = 30)
abline(v = mean(ALOS2$ALOStay, na.rm = TRUE), lwd = 2, col = 2)

The histogram above indicates right-skewed data.

Descriptive Statistics and Visualisation cont.

We can subset the data to obtain a better visualisation for each Peer Group of interest.

Lhospital <- subset(ALOS2, `Peer group` == 'Large')
Mhospital <- subset(ALOS2, `Peer group` == 'Medium')

Descriptive Statistics and Visualisation cont.

par(mfrow=c(1,2))
hist(Lhospital$ALOStay, main = 'ALOS of Large Hospital', xlab = 'Average Length of Stay (days)')
abline(v = mean(Lhospital$ALOStay, na.rm = TRUE), col = 2, lwd = 2)
hist(Mhospital$ALOStay, main = 'ALOS of Medium Hospital', xlab = 'Average Length of Stay (days)')
abline(v = mean(Mhospital$ALOStay, na.rm = TRUE), col = 2, lwd = 2)

par(mfrow=c(1,1))

The histograms for the relevant peer groups also indicate right-skewed data.

Descriptive Statistics and Visualisation cont.

Boxplot

boxplot(Lhospital$ALOStay, Mhospital$ALOStay, names = c("Large", "Medium"), col = c(24, 20), main = "Average Length of Stay (ALOS) vs Peer Group", xlab = "Peer Group", ylab = "Average Length of Stay (ALOS)")

Descriptive Statistics and Visualisation cont.

Normality Check

ALOS2$ALOStay %>% qqPlot(dist = 'norm')

## [1] 2736 7918

For both groups, the Average Length of Stay (ALOS) is right-skewed, which can be seen clearly in the Q-Q plot.
However, as the sample size in both groups is greater than 30, the sampling distribution of the mean will be approximately normal.
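
The Central Limit Theorem argument can be illustrated with a short simulation. This is a sketch under assumptions: an exponential distribution with mean 3 is used as an arbitrary right-skewed stand-in for ALOS, and `draw_sample_mean` and `skewness` are helper names introduced here.

```r
set.seed(1)

# Right-skewed "population": exponential with mean 3 (arbitrary stand-in for ALOS)
draw_sample_mean <- function(n) mean(rexp(n, rate = 1 / 3))

# Sampling distribution of the mean for n = 30, from 5000 repeated samples
means_30 <- replicate(5000, draw_sample_mean(30))

# Moment-based skewness helper
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw   <- skewness(rexp(5000, rate = 1 / 3))  # theoretical skewness of an exponential is 2
skew_means <- skewness(means_30)                   # sample means are far less skewed
c(raw = skew_raw, means = skew_means)
```

The sample means are much less skewed than the raw values, which is the property the t-based procedures rely on here.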

Descriptive Statistics and Visualisation cont.

par(mfrow=c(1,2))
Lhospital$ALOStay %>% qqPlot(dist = 'norm')

## [1] 4792  303

Mhospital$ALOStay %>% qqPlot(dist = 'norm')

## [1] 1190 1191

par(mfrow=c(1,1))

Hypothesis Testing

We use R to carry out the appropriate tests to determine whether there is a statistically significant difference in the average length of stay (ALOS) between large and medium hospitals, which might lead patients to choose one over the other.

Assumptions

Comparing two independent sample means with unknown population variances.
Large samples are used (n > 30 for both groups), so normality of the sampling distribution of the mean can be assumed.
Population homogeneity of variance (checked with Levene's test).

Hypothesis Testing cont

Variables

\(\mu_l\) = Mean of ALOS in Large Hospital. \(\mu_m\) = Mean of ALOS in Medium Hospital
\(\sigma_l^2\) = Variance of ALOS in Large Hospital. \(\sigma_m^2\) = Variance of ALOS in Medium Hospital

Levene Test

\(H_0\): \(\sigma_l^2 = \sigma_m^2\) (The data has equal variance)
\(H_A\): \(\sigma_l^2 \neq \sigma_m^2\) (The data does not have equal variance)

leveneTest(ALOStay ~ `Peer group`, data = ALOS2)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    1  16.585 4.707e-05 ***
##       6591                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

P-value < 0.05. The p-value is small enough to reject the null hypothesis, so we conclude that the two groups do not have equal variances. We therefore apply the two-sample t-test without assuming equal variances (Welch's t-test).
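
To make the mechanics of the median-centred Levene test concrete, the sketch below reproduces it in base R as a one-way ANOVA on absolute deviations from each group's median. The data are simulated stand-ins, not the hospital data, and the column names are chosen for illustration.

```r
set.seed(42)

# Simulated stand-in data: two groups with different spreads (not the hospital data)
d <- data.frame(
  group = factor(rep(c("Large", "Medium"), times = c(60, 40))),
  y     = c(rnorm(60, mean = 4, sd = 2.0), rnorm(40, mean = 3.7, sd = 1.2))
)

# Absolute deviation of each value from its own group median
d$abs_dev <- abs(d$y - ave(d$y, d$group, FUN = median))

# One-way ANOVA on the absolute deviations gives the Levene F statistic
levene_tab <- anova(lm(abs_dev ~ group, data = d))
levene_tab
```

A small p-value in this table indicates that the typical deviation from the median differs between the groups, i.e. that the spreads are unequal.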

Hypothesis Testing cont

Two-Sample T-Test

We select a two-sample t-test because we want to compare the Average Length of Stay of two independent groups. The normality and homogeneity-of-variance assumptions were checked above; because Levene's test rejected equal variances, var.equal = FALSE is used.

\(H_0\): \(\mu_l = \mu_m\)
\(H_A\): \(\mu_l \neq \mu_m\)

result<- t.test(ALOStay ~ `Peer group`, data = ALOS2, var.equal = FALSE, alternative = "two.sided")
result

## 
##  Welch Two Sample t-test
## 
## data:  ALOStay by Peer group
## t = 5.6615, df = 4611, p-value = 1.592e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1835797 0.3780687
## sample estimates:
##  mean in group Large mean in group Medium 
##             3.986874             3.706049

Our decision is to reject \(H_0\): \(\mu_l = \mu_m\), as p < 0.05 and the 95% CI of the estimated population difference, [0.184, 0.378], does not capture \(H_0\): \(\mu_l - \mu_m = 0\). The results of the two-sample t-test were therefore statistically significant: the mean Average Length of Stay (ALOS) in Large Hospitals was significantly different from that in Medium Hospitals.
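
For readers who want to see where the Welch statistic and its degrees of freedom come from, the sketch below computes them from group means, standard deviations and sample sizes, then checks the result against t.test() on simulated data. The helper name `welch_from_summary` is introduced here for illustration, and the simulated values are not the hospital data.

```r
# Welch t statistic and Welch-Satterthwaite degrees of freedom from summary values
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                       # squared standard error, group 1
  v2 <- s2^2 / n2                       # squared standard error, group 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)         # two-sided p-value
  c(t = t_stat, df = df, p = p)
}

# Check against t.test() on simulated data (not the hospital data)
set.seed(7)
a <- rnorm(50, mean = 4.0, sd = 2.0)
b <- rnorm(35, mean = 3.7, sd = 1.8)
by_hand <- welch_from_summary(mean(a), sd(a), length(a), mean(b), sd(b), length(b))
ref <- t.test(a, b, var.equal = FALSE)
all.equal(unname(by_hand["t"]), unname(ref$statistic))
```

Plugging in the rounded means and standard deviations from the summary tables, with the non-missing counts 4411 and 2182 (n minus Missing), gives a t statistic of roughly 5.7, in line with the reported t = 5.6615.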

Discussions

Testing Limitations:

The Average Length of Stay (ALOS) is mildly right skewed.

Testing Strength:

Removal of outliers and missing values

Dataset Limitations:

A large number of missing values, because some reported data did not meet the criteria for calculating the Average Length of Stay (ALOS).
There were also missing values where no patients were reported in the period over which the calculation took place.
A large number of outliers (i.e. 255 outliers) across the two Peer Groups of interest combined.

Dataset Strength:

Large sample sizes for both Peer Groups (i.e. \(n \geq 30\))
Central Limit Theorem (CLT) can be invoked.

MATH1324 Assignment 2

Statistical Differences in Average Length of Stay (ALOS) between Large and Medium Hospitals
