Due October 31st Answer all questions specified on the problem but if you see something interesting and want to do more analysis please report it as well. Don’t forget to include discussion and to make your plots look nice and presentable. Submit your .rmd file with the knitted PDF (or knitted Word Document saved as a PDF). If you are still having trouble with .rmd, let us know and we will help you, but both the .rmd and the PDF are required.
Following up with the Beat the Blues data from class (package HSAUR3) do the following
Construct boxplots to compare the factor variable drug in ananalogous way to how we constructed boxplots in for treatment variable. Discuss the results.
Answer- It is clear from the box plot that there is a steady decrease in BDI with lower variability in patients who took anti-depressants. Although it seems there is a slight decrease in BDI in patients who did not take anti-depressants the variability is higher which suggests patients are more likely to reducte their Beck Depression Inventory II levels as time goes on with anti-depressant treatment.
###load all libraries
library(HSAUR3)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gee)
library(Matrix)
library(lme4)
library(multcomp)
library(Hmisc)
#load the data
data("BtheB", package = "HSAUR3")
#create the labels for the facet
labels <- c('No'='Did Not Take Anti-Depressants',
'Yes'='Took Anti-Depressants')
#pipe the data and gether the columns with bdi into two columns for labels and values. change the bdi.pre to 0.
#factor and replace the alphabetic and punctuation characters with empty space.
#using ggplot plot a boxplot with a facet on drug
BtheB %>%
gather(BDI_Type,BDI_Value, 4:8) %>%
mutate(BDI_Type = ifelse(BDI_Type == 'bdi.pre','0',BDI_Type),
BDI_Type = factor(gsub(BDI_Type,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
ggplot(aes(x=BDI_Type,y=BDI_Value)) +
geom_boxplot() +
facet_grid(.~drug,labeller = as_labeller(labels)) +
labs(title='Beat the Blues ~ Anti-Depressant Treatment',
x='Time (in months)',
y='BDI')
Repeat (a) for the length variable. Discuss the results.
Answer - It seems obbvious that Beck Depression Inventory II levels decrease much faster in patients that have shorter episodes of depression which is what the below box plot shows. There is much more variablility in BDI levels for patients who have longer episodes of depression and a much gentle decrease in BDI levels.
labels <- c('>6m'='Current Episode Greater than 6 Months',
'<6m'='Current Episode Less than 6 Months')
BtheB %>%
gather(BDI_Type,BDI_Value, 4:8) %>%
mutate(BDI_Type = ifelse(BDI_Type == 'bdi.pre','0',BDI_Type),
BDI_Type = factor(gsub(BDI_Type,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
ggplot(aes(x=BDI_Type,y=BDI_Value)) +
geom_boxplot() +
facet_grid(.~length,labeller = as_labeller(labels)) +
labs(title='Beat the Blues ~ Length of Current Episode of Depression',
x='Time (in months)',
y='BDI')
Use the lm function to fit a model to the Beat the Blues data that assumes that the repeated measurements are independent. Compare the results to those from fitting the random intercept model \(BtheB_lmer1\) from class.
Answer - Using the chi-squared test it appears that the linear model produces a lower AIC, higher BIC, and a better overall fit than the linear mixed-effect model. However, this does not take into effect the repeated measures on the different subjects and is misleading as shown by the residuals plot of the mixed-effects model vs that of the linear model.
BtheB$subject <- factor(rownames(BtheB))
btb <- BtheB %>%
gather(time,bdi, 5:8) %>%
mutate(time = as.numeric(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement='')))
BtheB_lm1 <- lm(bdi ~ bdi.pre + time + treatment + drug +
length + subject, data = btb)
BtheB_lmer1 <- lmer(bdi ~ bdi.pre + time + treatment + drug +
length + (1 | subject), data = btb,
REML = FALSE, na.action = na.omit)
anova(BtheB_lmer1,BtheB_lm1)
## Data: btb
## Models:
## BtheB_lmer1: bdi ~ bdi.pre + time + treatment + drug + length + (1 | subject)
## BtheB_lm1: bdi ~ bdi.pre + time + treatment + drug + length + subject
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## BtheB_lmer1 8 1887.5 1916.6 -935.75 1871.5
## BtheB_lm1 99 1778.2 2138.0 -790.08 1580.2 291.32 91 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cftest(BtheB_lmer1)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lmer(formula = bdi ~ bdi.pre + time + treatment + drug + length +
## (1 | subject), data = btb, REML = FALSE, na.action = na.omit)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) == 0 5.59239 2.24244 2.494 0.0126 *
## bdi.pre == 0 0.63968 0.07789 8.212 2.22e-16 ***
## time == 0 -0.70476 0.14639 -4.814 1.48e-06 ***
## treatmentBtheB == 0 -2.32908 1.67036 -1.394 0.1632
## drugYes == 0 -2.82495 1.72684 -1.636 0.1019
## length>6m == 0 0.19708 1.63832 0.120 0.9043
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Univariate p values reported)
residuals <- function(object, obs) obs-predict(object)
layout(matrix(1:2, ncol = 2))
qint <- ranef(BtheB_lmer1)$subject[["(Intercept)"]]
qres <- residuals(BtheB_lmer1, btb$bdi.pre)
qqnorm(qint, ylab = "Estimated random intercepts",
xlim = c(-3, 3), ylim = c(-20, 20),
main = "Random intercepts")
qqline(qint, col="red", lwd=3)
qqnorm(qres, xlim = c(-3, 3), ylim = c(-20, 20),
ylab = "Estimated residuals",
main = "Residuals")
qqline(qres, col="red", lwd=3)
plot(BtheB_lm1,which=2)
Investigate and discuss whether there is any evidence of an interaction between treatment and time for the Beat the Blues data.
Answer - There is certainly an interaction between time and treatment type. The Beat the Blues treatment type shows a clear downward tren with lower variance as time progresses than a treatment as usual which indicates the Beat the Blues treatment does reduce Back Depression Inventory II as time progresses.
labels <- c('TAU'='Treatment as Usual',
'BtheB'='Beat The Blues')
BtheB %>%
gather(time,bdi, 5:8) %>%
mutate(time = factor(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
ggplot(aes(x=time,y=bdi)) +
geom_boxplot() +
facet_grid(.~treatment,labeller = as_labeller(labels)) +
labs(title='Beat the Blues ~ Treatment Type',
x='Time (in months)',
y='BDI')
The first output is the coefficient of the interaction between time and treatment in a linear model. The p value is significant at th 0.05% level which suggests that there is an interaction between time and treatment. The second output is the coefficients of a mixed effects model which indicates that the interaction between time and treatmetn is not significant for the repeated measures of subject.
btb <- BtheB %>%
gather(time,bdi, 5:8) %>%
mutate(time = as.numeric(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement='')))
BtheB_lm2 <- lm(bdi ~ bdi.pre + time + treatment + drug +
length + subject + treatment*time, data = btb)
d <- as.data.frame(summary(BtheB_lm2)$coefficients)
d[99,]
## Estimate Std. Error t value Pr(>|t|)
## time:treatmentBtheB 0.6435779 0.2975651 2.162814 0.03186724
BtheB_lmer2 <- lmer(bdi ~ bdi.pre + time + treatment + drug +
length + time*treatment +(1 | subject), data = btb,
REML = FALSE, na.action = na.omit)
cftest(BtheB_lmer2)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lmer(formula = bdi ~ bdi.pre + time + treatment + drug + length +
## time * treatment + (1 | subject), data = btb, REML = FALSE,
## na.action = na.omit)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) == 0 6.47824 2.31905 2.793 0.00521 **
## bdi.pre == 0 0.64046 0.07843 8.166 2.22e-16 ***
## time == 0 -0.95555 0.20831 -4.587 4.49e-06 ***
## treatmentBtheB == 0 -4.09804 1.98490 -2.065 0.03896 *
## drugYes == 0 -2.79209 1.73986 -1.605 0.10854
## length>6m == 0 0.21905 1.65043 0.133 0.89441
## time:treatmentBtheB == 0 0.49000 0.28959 1.692 0.09064 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Univariate p values reported)
residuals <- function(object, obs) obs-predict(object)
layout(matrix(1:2, ncol = 2))
qint <- ranef(BtheB_lmer2)$subject[["(Intercept)"]]
qres <- residuals(BtheB_lmer2, btb$bdi.pre)
qqnorm(qint, ylab = "Estimated random intercepts",
xlim = c(-3, 3), ylim = c(-20, 20),
main = "Random intercepts")
qqline(qint, col="red", lwd=3)
qqnorm(qres, xlim = c(-3, 3), ylim = c(-20, 20),
ylab = "Estimated residuals",
main = "Residuals")
qqline(qres, col="red", lwd=3)
Construct a plot of the mean profiles of both treatment groups in the Beat the Blues data, showing also standard deviation bars at each time point. (Attempt to use ggplot2 library to do this).
#function for plotting the standard deviation, mean, and upper and lower values
min.mean.sd.max <- function(x) {
r <- c(min(x), mean(x) - sd(x), mean(x), mean(x) + sd(x), max(x))
names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
r
}
BtheB %>%
gather(time,bdi, 4:8) %>%
mutate(time = factor(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
ggplot(aes(x=time,y=bdi)) +
stat_summary(fun.data = min.mean.sd.max, geom = "boxplot")+
facet_grid(. ~ treatment,labeller = as_labeller(labels)) +
labs(title='Beat the Blues ~ Treatment Type',
x='Time (in months)',
y='BDI')
Consider the phosphate data from the package HSAUR3. This data shows the plasma inorganic phosphate levels for 33 subjects, 20 of whom are controls and 13 of whom have been classified as obese (Davis, 2002). Perform the following on this dataset
Construct boxplots by group and discuss.
Answer - The phosphate levels in the control group seem to drop and then rise again after 1 and a half hours with less variability than obese subjects. Additionally, the obese subjects phosphate levels did not rise until the third hour compared to the second hour for the control group.
data(phosphate)
phosphate %>%
gather(time,phosphate_level, 2:9) %>%
mutate(time = factor(gsub(time,pattern="[[:alpha:]]",replacement=''))) %>%
ggplot(aes(x=time,y=phosphate_level)) +
geom_boxplot() +
facet_grid(.~group) +
labs(title='Phosphate Data ~ Group Phosphate Levels By Time',
x='Time (in hours)',
y='Phosphate Level')
Produce separate plots of the profiles of the individuals in each group.
phosphate$subject <- factor(rownames(phosphate))
phosphate %>%
gather(time,phosphate_level, 2:9) %>%
mutate(time = factor(gsub(time,pattern="[[:alpha:]]",replacement=''))) %>%
ggplot(aes(x=time,y=phosphate_level)) +
geom_line(aes(group=subject,color=subject),size=1) +
facet_grid(.~group) +
labs(title='Phosphate Data ~ Group Phosphate Levels By Time & Subject',
x='Time (in hours)',
y='Phosphate Level')
Guided by how these plots fit, which linear mixed effects models do you think might be sensible? ( Hint: Discuss intercept/slope and intercept/ interaction ).
Answer - It would be sensible to fit a linear mixed effects model because the slopes and intercepts in each group vary widely. Additionally, there is an interaction within the group variable (control & obese) which should take effect on the time variable. Thus a model of the form \(lme(phosphate_level ~ t0 + time * group + (1|subject),data=phos)\) should be adequate.
Convert the data to long version and fit the model of your choice and discuss the results.
Answer - Judging by the coefficients of the model there significance in the interaction between group and time as suggested by the box plot. The random intercepts and residuals of the model seem to be centered around zero. Additionally, the mean squared error of the fixed-effects model is 0.151 which suggests the model has good prediction power overall for phosphate levels over time againsts both groups of subjects.
phosphate$subject <- factor(rownames(phosphate))
phos <- phosphate %>%
gather(time,phosphate_level, 3:9) %>%
mutate(time = factor(gsub(time,pattern="[[:alpha:]]",replacement='')))
phos_lme <- lmer(phosphate_level ~ t0 + time * group + (1|subject),data=phos)
cftest(phos_lme)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lmer(formula = phosphate_level ~ t0 + time * group + (1 | subject),
## data = phos)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) == 0 0.88700 0.38907 2.280 0.022621 *
## t0 == 0 0.63084 0.08954 7.045 1.85e-12 ***
## time1 == 0 -0.55500 0.13424 -4.134 3.56e-05 ***
## time1.5 == 0 -0.72000 0.13424 -5.364 8.16e-08 ***
## time2 == 0 -0.50500 0.13424 -3.762 0.000169 ***
## time3 == 0 -0.20500 0.13424 -1.527 0.126731
## time4 == 0 0.06000 0.13424 0.447 0.654903
## time5 == 0 0.45500 0.13424 3.389 0.000700 ***
## groupobese == 0 0.40106 0.18965 2.115 0.034458 *
## time1:groupobese == 0 0.31654 0.21388 1.480 0.138874
## time1.5:groupobese == 0 0.18154 0.21388 0.849 0.395996
## time2:groupobese == 0 -0.38731 0.21388 -1.811 0.070159 .
## time3:groupobese == 0 -0.57192 0.21388 -2.674 0.007494 **
## time4:groupobese == 0 -0.58308 0.21388 -2.726 0.006407 **
## time5:groupobese == 0 -0.67038 0.21388 -3.134 0.001722 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Univariate p values reported)
summary(phos_lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: phosphate_level ~ t0 + time * group + (1 | subject)
## Data: phos
##
## REML criterion at convergence: 330.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3420 -0.6072 0.0432 0.6218 3.2895
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 0.08565 0.2927
## Residual 0.18020 0.4245
## Number of obs: 231, groups: subject, 33
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.88700 0.38907 2.280
## t0 0.63084 0.08954 7.045
## time1 -0.55500 0.13424 -4.134
## time1.5 -0.72000 0.13424 -5.364
## time2 -0.50500 0.13424 -3.762
## time3 -0.20500 0.13424 -1.527
## time4 0.06000 0.13424 0.447
## time5 0.45500 0.13424 3.389
## groupobese 0.40106 0.18965 2.115
## time1:groupobese 0.31654 0.21388 1.480
## time1.5:groupobese 0.18154 0.21388 0.849
## time2:groupobese -0.38731 0.21388 -1.811
## time3:groupobese -0.57192 0.21388 -2.674
## time4:groupobese -0.58308 0.21388 -2.726
## time5:groupobese -0.67038 0.21388 -3.134
##
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
residuals <- function(object, obs) obs-predict(object)
layout(matrix(1:2, ncol = 2))
qint <- ranef(phos_lme)$subject[["(Intercept)"]]
qres <- residuals(phos_lme, phos$t0)
qqnorm(qint, ylab = "Estimated random intercepts",
xlim = c(-3, 3), ylim = c(-20, 20),
main = "Random intercepts")
qqline(qint, col="red", lwd=3)
qqnorm(qres, xlim = c(-3, 3), ylim = c(-20, 20),
ylab = "Estimated residuals",
main = "Residuals")
qqline(qres, col="red", lwd=3)
Use the re.form argument to specifiy the mixed effect of \((1|subject)\).
phos.pred <- predict(phos_lme,newdata=phos,re.form=~(1|subject))
round(mean((phos$phosphate_level - phos.pred)^2), 3)
## [1] 0.151