In a study at the Ohio State University Medical Center, doctors investigated the association between ICU acquired paresis (ICUAP) (or muscular weakness) and subject’s admission SOFA score dichotomized at 11 (sofa11) (0 for less than 11 and 1 for 11 and greater).
Specifically the investigators wanted to determine if handgrip strength (max_grip) could be used as a surrogate for ICUAP. The investigators believed that the number of days the patient was mechanically ventilated (MV_days) could affect the relationship between weakness and dichotomized SOFA.
Use the ICU acquired weakness dataset to answer the following exercises. You can also find this data within this CSV file.
url <- "https://d396qusza40orc.cloudfront.net/appliedregression/Homeworks/Acquired_weakness.csv"
download.file(url, "Acquired_weakness.csv")
acw <- read.csv("Acquired_weakness.csv")
dim(acw)
## [1] 137 4
names(acw)
## [1] "id" "max_grip" "MVdays" "sofa11"
str(acw)
## 'data.frame': 137 obs. of 4 variables:
## $ id : int 3466 5477 3366 5325 5145 3004 2984 4681 2925 4145 ...
## $ max_grip: num 12 40 26 22 30 8 14 12 12 7 ...
## $ MVdays : int 10 7 5 8 8 5 6 5 12 4 ...
## $ sofa11 : int 0 0 0 0 0 0 0 0 0 0 ...
Using a two-sample t-test, determine whether the two SOFA Score groups differ significantly with respect to ICUAP (as measured by max_grip).
below11 <- acw[acw$sofa11 == 0, ]
above11 <- acw[acw$sofa11 == 1, ]
(nb11 <- nrow(below11))
## [1] 125
(na11 <- nrow(above11))
## [1] 12
# pooled standard deviation
var.below11 <- var(below11$max_grip)
var.above11 <- var(above11$max_grip)
(var.pooled <- (var.below11 * (nb11 - 1) + var.above11 * (na11 - 1)) / ((nb11 - 1) + (na11 - 1)))
## [1] 181.3294
(SE <- sqrt(var.pooled / nb11 + var.pooled / na11))
## [1] 4.069572
(Tstatistic <- (mean(below11$max_grip)-mean(above11$max_grip)) / SE)
## [1] 2.204982
(pValue <- 2 * pt(Tstatistic, df = na11 + nb11, lower.tail = F))
## [1] 0.02912334
Instead of calculating by hand the above calculation to get to the p-value of the difference of two group means, there is a single function in R, t.test that performs that:
(tt <- t.test(max_grip ~ sofa11, data = acw, var.equal = TRUE)) # default two-sided test
##
## Two Sample t-test
##
## data: max_grip by sofa11
## t = 2.205, df = 135, p-value = 0.02915
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.924972 17.021695
## sample estimates:
## mean in group 0 mean in group 1
## 15.140000 6.166667
names(tt)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "alternative" "method" "data.name"
tt[1:9]
## $statistic
## t
## 2.204982
##
## $parameter
## df
## 135
##
## $p.value
## [1] 0.02914824
##
## $conf.int
## [1] 0.924972 17.021695
## attr(,"conf.level")
## [1] 0.95
##
## $estimate
## mean in group 0 mean in group 1
## 15.140000 6.166667
##
## $null.value
## difference in means
## 0
##
## $alternative
## [1] "two.sided"
##
## $method
## [1] " Two Sample t-test"
##
## $data.name
## [1] "max_grip by sofa11"
Perform the same analysis from Exercise One (i.e., 2 sample t-test) using a regression analysis.
(lm1 <- lm(max_grip ~ sofa11, data = acw))
##
## Call:
## lm(formula = max_grip ~ sofa11, data = acw)
##
## Coefficients:
## (Intercept) sofa11
## 15.140 -8.973
summary(lm1)
##
## Call:
## lm(formula = max_grip ~ sofa11, data = acw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.14 -7.14 -1.14 4.86 68.86
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.140 1.204 12.570 <2e-16 ***
## sofa11 -8.973 4.070 -2.205 0.0291 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.47 on 135 degrees of freedom
## Multiple R-squared: 0.03476, Adjusted R-squared: 0.02761
## F-statistic: 4.862 on 1 and 135 DF, p-value: 0.02915
(beta1 <- cov(acw$sofa11, acw$max_grip) / var(acw$sofa11))
## [1] -8.973333
(beta0 <- mean(acw$max_grip) - beta1 * mean(acw$sofa11))
## [1] 15.14
(var.y.given.x <- sum((acw$max_grip - lm1$fitted)^2) / (nrow(acw) - 2))
## [1] 181.3294
(var.beta1 <- var.y.given.x / (var(acw$sofa11)*(nrow(acw)-1)))
## [1] 16.56142
(SE <- sqrt(var.beta1))
## [1] 4.069572
(tStat <- beta1 / SE)
## [1] -2.204982
(pVal <- 2 * pt(tStat, df = nrow(acw) - 2))
## [1] 0.02914824
Using a single model determine the regression of max_grip on MVdays for each of the two SOFA score groups.
library(ggplot2)
ggplot(data = acw, aes(MVdays, max_grip, color = as.factor(acw$sofa11))) +
geom_point(position = "jitter")
plot(acw$MVdays, acw$max_grip, col = acw$sofa11+1) #added 1 to acw$sofa11 b/c it's an integer variable and one of the levels is 0. (could just make it a factor variable instead)
(lm3 <- lm(max_grip ~ MVdays*sofa11, data = acw))
##
## Call:
## lm(formula = max_grip ~ MVdays * sofa11, data = acw)
##
## Coefficients:
## (Intercept) MVdays sofa11 MVdays:sofa11
## 21.1842 -0.6938 -10.3081 0.4084
summary(lm3)
##
## Call:
## lm(formula = max_grip ~ MVdays * sofa11, data = acw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.715 -7.022 -2.246 4.691 65.591
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.1842 1.9915 10.638 < 2e-16 ***
## MVdays -0.6938 0.1865 -3.721 0.000292 ***
## sofa11 -10.3081 6.9788 -1.477 0.142023
## MVdays:sofa11 0.4084 0.3851 1.060 0.290905
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.88 on 133 degrees of freedom
## Multiple R-squared: 0.13, Adjusted R-squared: 0.1104
## F-statistic: 6.625 on 3 and 133 DF, p-value: 0.0003318
Here I have built a model assuming there is interaction between number of days a patient spent on mechanical ventilation and his or her SOFA score at the time of addmission. For people with SOFA < 11, average maximal grip can be calculated by:
\[y_{SOFA<11}=21.1842062 -0.6937794*MVdays\]
For people with SOFA > 11, average max grip can be found as:
\[y_{SOFA<11}=21.1842062 -0.6937794*MVdays -10.3080939+0.4083585*MVdays\]
However, is there a meaningful interaction between those two, or the interaction coefficient is equal to zero?
Test whether the two SOFA cohorts differ significantly, controlling for mechanically ventilated days.
To test whether the slope of the interaction term is equal to zero perform a partial F-test that compares model with an interaction term and model without an interaction term (max_grip ~ MVdays + sofa11).
summary(lm2 <- lm(max_grip ~ MVdays + sofa11, data = acw))
##
## Call:
## lm(formula = max_grip ~ MVdays + sofa11, data = acw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.360 -7.193 -1.958 3.611 66.042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.3503 1.8304 11.118 < 2e-16 ***
## MVdays -0.5981 0.1632 -3.664 0.000357 ***
## sofa11 -4.3157 4.0965 -1.053 0.294009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.89 on 134 degrees of freedom
## Multiple R-squared: 0.1227, Adjusted R-squared: 0.1096
## F-statistic: 9.367 on 2 and 134 DF, p-value: 0.0001557
(anova1 <- anova(lm1))
## Analysis of Variance Table
##
## Response: max_grip
## Df Sum Sq Mean Sq F value Pr(>F)
## sofa11 1 881.6 881.61 4.8619 0.02915 *
## Residuals 135 24479.5 181.33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(anova2 <- anova(lm2))
## Analysis of Variance Table
##
## Response: max_grip
## Df Sum Sq Mean Sq F value Pr(>F)
## MVdays 1 2926.6 2926.55 17.6249 4.872e-05 ***
## sofa11 1 184.3 184.29 1.1099 0.294
## Residuals 134 22250.2 166.05
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(anova3 <- anova(lm3))
## Analysis of Variance Table
##
## Response: max_grip
## Df Sum Sq Mean Sq F value Pr(>F)
## MVdays 1 2926.6 2926.55 17.6413 4.855e-05 ***
## sofa11 1 184.3 184.29 1.1109 0.2938
## MVdays:sofa11 1 186.5 186.52 1.1244 0.2909
## Residuals 133 22063.7 165.89
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Fstat <- ((sum(anova3[1:3, 2]) - sum(anova2[1:2,2]))) / anova3[4, 3])
## [1] 1.124351
(pVal <- pf(Fstat, df1 = 1, df2 = 133, lower.tail = F))
## [1] 0.2909054
p-value from the comparison of model3 and model2 that differ only in the interaction term says that the slope of the interaction term isn’t significantly different than zero. That means that there is no meaningful interaction between the variables. When we control for mechanically ventilated days, the two cohorts (SOFA < 11 and SOFA > 11) are the same.
We can visualize interaction using effects library
library(effects)
## Warning: package 'effects' was built under R version 3.2.4
lmm3 <- lm(max_grip ~ MVdays + sofa11 + MVdays:sofa11, data = acw)
plot(effect(term = "MVdays:sofa11", mod = lm3, xlevels = list(MVdays=c(2, 9, 16))), multiline = TRUE)
Determine and Compare
Test whether the two SOFA cohorts have the same slope. Do they have the same intercept? Are they coincident? Use the homework forum to share your findings.
1. Are the two slopes parallel?
This is the same question as whether \(\beta_3=0\). I have answered that question in the previous exercise where I performed partial F-test that concluded that the \(\beta_3\) isn’t significantly different than zero.
2. Do the two cohorts of SOFA have the same intercept?
We can determine that by looking at the summary output of the second model (2 predictors, no interaction term) or calculate by hand whether \(\beta_2=0\):
summary(lm2)
##
## Call:
## lm(formula = max_grip ~ MVdays + sofa11, data = acw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.360 -7.193 -1.958 3.611 66.042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.3503 1.8304 11.118 < 2e-16 ***
## MVdays -0.5981 0.1632 -3.664 0.000357 ***
## sofa11 -4.3157 4.0965 -1.053 0.294009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.89 on 134 degrees of freedom
## Multiple R-squared: 0.1227, Adjusted R-squared: 0.1096
## F-statistic: 9.367 on 2 and 134 DF, p-value: 0.0001557
(anova1mv <- anova(lm1mv <- lm(max_grip ~ MVdays, data = acw)))
## Analysis of Variance Table
##
## Response: max_grip
## Df Sum Sq Mean Sq F value Pr(>F)
## MVdays 1 2926.6 2926.55 17.611 4.885e-05 ***
## Residuals 135 22434.5 166.18
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# calculation
(Fstat <- (sum(anova2[1:2, 2])- anova1mv[1, 2])/anova2[3,3])
## [1] 1.109859
(pVal <- pf(Fstat, df1 = 1, df2 = 134, lower.tail = F))
## [1] 0.2940085
summary(lm3)
##
## Call:
## lm(formula = max_grip ~ MVdays * sofa11, data = acw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.715 -7.022 -2.246 4.691 65.591
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.1842 1.9915 10.638 < 2e-16 ***
## MVdays -0.6938 0.1865 -3.721 0.000292 ***
## sofa11 -10.3081 6.9788 -1.477 0.142023
## MVdays:sofa11 0.4084 0.3851 1.060 0.290905
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.88 on 133 degrees of freedom
## Multiple R-squared: 0.13, Adjusted R-squared: 0.1104
## F-statistic: 6.625 on 3 and 133 DF, p-value: 0.0003318
# calculation
(SSreg3 <- sum(anova3[1:3, 2]))
## [1] 3297.363
(SSreg2 <- sum(anova2[1:2, 2]))
## [1] 3110.842
(Fstat <- ((SSreg3 - SSreg2) / 1) / anova3[4, 3])
## [1] 1.124351
(pVal <- pf(Fstat, df1 = 1, df2 = 133, lower.tail = F))
## [1] 0.2909054
Since the null hypothesis of this test says that \(\beta_2=\beta_3=0\), and given that the p-value of the test was 0.2909054, we can’t reject the hypothesis that \(\beta_2=\beta_3=0\). In other words, the slopes for the two cohorts of SOFA coincide.
fitted3 <- lm3$fitted
ggplot(data = acw, aes(x = MVdays, y = max_grip, color = as.factor(sofa11))) +
geom_point(position = "jitter") +
geom_line(aes(x = acw$MVdays,y = fitted3, col= as.factor(acw$sofa11)))