Week_9_HW_kamryn

Prompt: This week we are interested in modeling social connection over time as a function of biologitcal sex for a new class of students. Let’s begin!

Per usual, let’s start by loading the necessary packages and data, and we are also going to instantiate a function that creates error bars for ggplot.

1. Let’s look at our data structure. What are the dimensions? How many subjects to we have? How many time points? Are there missing subjects at any time point?

#let's look at what columns we are working with as well as the dimensions of our df
dim(df)

## [1] 1422    8

colnames(df)

## [1] "X"                 "condition"         "age"              
## [4] "sex"               "time"              "positive.affect"  
## [7] "social.connection" "id"

unique(df$id)

##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
## [199] 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
## [217] 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
## [235] 235 236 237

table(df$time)

## 
##   1   2   3   4   5   7 
## 237 237 237 237 237 237

Answer: The dimensions of our data set are 8 variables with 1422 rows. There are 237 subjects. There are 6 time points, but there should be 7, the way the data is labeled. We are missing data for all subjects in the 6th time point for both positive affect and social connection.

2. A critical first step in data analysis is plotting the data. This often reveals a tremendous amount about our data structure and can help inform what inferential statistics we decide to use. However, there are many ways to plot a given data set. Let’s look at a few different ways we can do this with our current longitudinal design. What plot do you find most intuitive and why?

# First, let's subtract 1 from time so 0 is meaningful. 
df$time <- df$time-1

# Plotting scores by group and timepoint
ggplot(df, aes(time, social.connection, color = sex)) +
  stat_summary(fun.y = mean, geom = "point", position = position_dodge(.5),
               size = 4) +
  stat_summary(fun.y = mean, geom = "line", position = position_dodge(.5)) +
  stat_summary(fun.data = FunctionForggplot, geom = "errorbar", 
               position = position_dodge(.5), width = .5, size = 1) +
  scale_x_continuous(name = "Time", breaks = c(1:5, 7)-1) +
  scale_y_continuous(name = "Positive Affect") +
  scale_color_discrete(name = "Sex") +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank())

## Warning: `fun.y` is deprecated. Use `fun` instead.

## Warning: `fun.y` is deprecated. Use `fun` instead.

# Loess curves for each group
ggplot(df, aes(time, social.connection, color = sex)) +
  geom_smooth() + 
  scale_x_continuous(name = "Time", breaks = c(1:5, 7)-1) +
  scale_y_continuous(name = "Positive Affect") +
  scale_color_discrete(name = "Sex") +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank())

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# Lines for each group
ggplot(df, aes(time, social.connection, color = sex)) +
  geom_smooth(method = "lm") + 
  scale_x_continuous(name = "Time", breaks = c(1:5, 7)-1) +
  scale_y_continuous(name = "Positive Affect") +
  scale_color_discrete(name = "Sex") +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank())

## `geom_smooth()` using formula 'y ~ x'

# Spaghetti plot with color for each person
ggplot(df, aes(time, social.connection, color = as.factor(id))) +
  stat_summary(fun.y = mean, geom = "line", position = position_dodge(.5)) +
  scale_x_continuous(name = "Time", breaks = c(1:5, 7)-1) +
  scale_y_continuous(name = "Positive Affect") +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(), 
        legend.position = "none")

## Warning: `fun.y` is deprecated. Use `fun` instead.

# Spaghetti plot with color for each condition
ggplot(df, aes(time, social.connection, group = id, color = sex)) +
  stat_summary(fun.y = mean, geom = "line", position = position_dodge(.5)) +
  scale_x_continuous(name = "Time", breaks = c(1:5, 7)-1) +
  scale_y_continuous(name = "Positive Affect") +
  scale_color_discrete(name = "Sex") +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank())

## Warning: `fun.y` is deprecated. Use `fun` instead.

# Linear spaghetti plot with color for each person
ggplot(df, aes(time, social.connection, color = as.factor(id))) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(name = "Time", breaks = c(1:5, 7)-1) +
  scale_y_continuous(name = "Positive Affect") +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(), 
        legend.position = "none")

## `geom_smooth()` using formula 'y ~ x'

###These are the intercepts and slopes that we model in MLM
# Linear spaghetti plot with color for each condition
ggplot(df, aes(time, social.connection, group = id, color = sex)) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(name = "Time", breaks = c(1:5, 7)-1) +
  scale_y_continuous(name = "Positive Affect") +
  scale_color_discrete(name = "Sex") +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(), 
        legend.position = "none")

## `geom_smooth()` using formula 'y ~ x'

Answer: I found the 1st and 3rd plots to be the most intuitive. The second plot did not work. These plots are intuitive because it shows a very clear change over time in the dependent variable across male and female participants. I really like the spaghetti plots, but they are very chaotic and hard to interpret.

3. Now that we have a sense of what our data looks like, let’s run a disaggragated analysis by regressing social connection on time interacting with sex. Disaggragated meaning that we are going to ignore any clustering in the data. What does this model tell us? Are any predictors significant? How does our model fit?

summary(lm(social.connection ~ time*sex, df))

## 
## Call:
## lm(formula = social.connection ~ time * sex, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0022 -0.3378 -0.0045  0.3160  2.0337 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0556421  0.0278153 109.855   <2e-16 ***
## time         -0.0127850  0.0083866  -1.524    0.128    
## sexMale      -0.0534265  0.0504652  -1.059    0.290    
## time:sexMale  0.0008141  0.0152158   0.054    0.957    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5204 on 1418 degrees of freedom
## Multiple R-squared:  0.004305,   Adjusted R-squared:  0.002198 
## F-statistic: 2.044 on 3 and 1418 DF,  p-value: 0.1059

Answer: This model is not a good fit. The F statistic is not significant, our multiple r and adjusted r values explain very little of the variance in social connection, and each of our IVs have little to do with explaining the variance of the DV, and show no significance.

4. One way to model nested data is with a 2 stage least square approach. This basically calculates an intercept and slope for each subject and then uses this to predict each level-2 (i.e. nested within) variable. We are going to use the wide df for this analysis, but it contains all the same info. Can all coefficients from the aggregated analyses be reproduced from the disaggregated analyses? Are the standard errors the same? Why?

#before calculating an intercept and slope for each subject, we need to instantiate a few columns so that we can append the model outputs to them. 

# Creating an id column in the wide data so that we know which row to put the intercepts and slopes in
df_Wide$id = 1:nrow(df_Wide)
# Also creating intercept and slope columns in the wide data
df_Wide$intercept = NA
# We have to use the wide data because we get the values (intercept and slope) are at the person level. 
df_Wide$slope = NA

# For every person:
for(id in unique(df$id)) {
  # Create data for that person
  person.data <- df[df$id == id, 
                               c("social.connection", "time")]
  # Predict positive affect from time for that person
  person.regression <- lm(social.connection ~ time, person.data)
  # 
  df_Wide[df_Wide$id == id, c("intercept", "slope")] <- 
    person.regression$coefficients
}
rm(id, person.data, person.regression) #this removes those objects that we created earlier to keep our global environment tidy. 

# Predicitng intercepts and slopes from level-2 variables
summary(lm(intercept ~ sex, df_Wide))

## 
## Call:
## lm(formula = intercept ~ sex, data = df_Wide)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97603 -0.28421 -0.01755  0.29302  1.41207 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.05564    0.03556  85.929   <2e-16 ***
## sexMale     -0.05343    0.06452  -0.828    0.408    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4568 on 235 degrees of freedom
## Multiple R-squared:  0.00291,    Adjusted R-squared:  -0.001333 
## F-statistic: 0.6858 on 1 and 235 DF,  p-value: 0.4085

summary(lm(slope ~ sex, df_Wide))

## 
## Call:
## lm(formula = slope ~ sex, data = df_Wide)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20864 -0.05946  0.00088  0.04850  0.23578 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.0127850  0.0066466  -1.924   0.0556 .
## sexMale      0.0008141  0.0120588   0.068   0.9462  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08538 on 235 degrees of freedom
## Multiple R-squared:  1.939e-05,  Adjusted R-squared:  -0.004236 
## F-statistic: 0.004558 on 1 and 235 DF,  p-value: 0.9462

# Compare to disaggregated analysis
summary(lm(social.connection ~ time*sex, df))

## 
## Call:
## lm(formula = social.connection ~ time * sex, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0022 -0.3378 -0.0045  0.3160  2.0337 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0556421  0.0278153 109.855   <2e-16 ***
## time         -0.0127850  0.0083866  -1.524    0.128    
## sexMale      -0.0534265  0.0504652  -1.059    0.290    
## time:sexMale  0.0008141  0.0152158   0.054    0.957    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5204 on 1418 degrees of freedom
## Multiple R-squared:  0.004305,   Adjusted R-squared:  0.002198 
## F-statistic: 2.044 on 3 and 1418 DF,  p-value: 0.1059

Answer: Yes, we can reproduce the results from the disaggregated analysis in the prior question. Our coefficients in the third model are equivalent to the coefficient in the disaggregated analysis. We also see the same degrees of freedom and residual standard error for the models.

5. Okay, now the moment you have all been waiting for….. analyzing this data as a mixed model. We are going to tackle this in a model-build up fashion, beginning with an unconditional cell-means model. What gammas (i.e. fixed effects) do we get from this model? How would you interpret it?

# Can be read as predicting positive affect from an intercept with intercepts as random / nested within id
#the (1|ID) is essentially the UsubIJ, whereas the 1+ is the null
unconditional.cell.means.model <- lmer(social.connection ~ 1 + (1|id), df, REML = FALSE)
summary(unconditional.cell.means.model)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ 1 + (1 | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1829.9   1845.7   -911.9   1823.9     1419 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3020 -0.6367 -0.0029  0.5289  3.7628 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 0.1104   0.3322  
##  Residual             0.1608   0.4011  
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   3.00598    0.02406 236.99997     125   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: The fixed effect we get is the intercept coefficient of 3.01. I would interpret this to mean that 3.001 is the value of Y when all X=0.

a Can we interpret the random effect of id, if so how would you?

Answer: I do not think the random effect of id is meaningful, and I would not interpret this because it is not significant.

b A useful piece of information from an unconditional cell-mean model is the ICC. How would you interpret the ICC?

# Intraclass correlation (ICC)
random.effect.info <- VarCorr(unconditional.cell.means.model)
level.2.residual.variance <- as.numeric(random.effect.info$id)
level.1.residual.variance <- attr(random.effect.info, "sc")^2

#Below is the ICC
level.2.residual.variance / (level.1.residual.variance + level.2.residual.variance)

## [1] 0.4069223

Answer: The intraclass correlation shows a positive correlation, of somewhat moderate strength. This suggests some potential collinearity between our level 1 and level 2 residual variance.

c This is not really a question so you don’t need to provide an answer, but this is a really cool function will provide you with the formula. It can be hard to translate your syntax to a model like Dan teaches it and this may be useful for you. You can read more about the package for mixed models in particular here: https://datalorax.github.io/equatiomatic/articles/lmer4-lmer.html

extract_eq(unconditional.cell.means.model, wrap = TRUE, intercept = "beta")

## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.3.2
## Current Matrix version is 1.2.18
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package

## Registered S3 method overwritten by 'broom.mixed':
##   method      from 
##   tidy.gamlss broom

\[ \begin{aligned} \operatorname{social.connection}_{i} &\sim N \left(\alpha_{j[i]}, \sigma^2 \right) \\ \alpha_{j} &\sim N \left(\mu_{\alpha_{j}}, \sigma^2_{\alpha_{j}} \right) \text{, for id j = 1,} \dots \text{,J} \end{aligned} \]

6. let’s run a Means-as-outcomes model with a level 2 predictor (sex) and no level 1 predictor, and compare the output to a disaggregated and a 2 stage least square model. What is the same between these three and what is different? Why?

# Means-as-outcomes model #1 level 2 predictor, and no level 1 predictor. 
means.as.outcomes.model <- lmer(social.connection ~ sex + (1|id), df, REML = FALSE)

summary(means.as.outcomes.model)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ sex + (1 | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1830.9   1852.0   -911.5   1822.9     1418 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3093 -0.6449 -0.0070  0.5338  3.7555 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 0.1098   0.3314  
##  Residual             0.1608   0.4011  
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   3.02155    0.02877 236.99998 105.011   <2e-16 ***
## sexMale      -0.05126    0.05220 236.99997  -0.982    0.327    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr)
## sexMale -0.551

#The sex male is gamma01

# Compare to disaggregated analysis. i.e. ignoring nesting 
summary(lm(social.connection ~ sex, df))

## 
## Call:
## lm(formula = social.connection ~ sex, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97029 -0.35488 -0.02155  0.31178  2.02971 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.02155    0.01655 182.618   <2e-16 ***
## sexMale     -0.05126    0.03002  -1.707    0.088 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5206 on 1420 degrees of freedom
## Multiple R-squared:  0.002049,   Adjusted R-squared:  0.001346 
## F-statistic: 2.915 on 1 and 1420 DF,  p-value: 0.08796

# Compare to aggregated analysis (2 stage least squares)
summary(lm(rowMeans(df_Wide[, grep("social\\.connection", colnames(df_Wide))]) ~ df_Wide$sex))

## 
## Call:
## lm(formula = rowMeans(df_Wide[, grep("social\\.connection", colnames(df_Wide))]) ~ 
##     df_Wide$sex)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.57710 -0.21599  0.02971  0.22415  1.19637 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3.02155    0.02890 104.567   <2e-16 ***
## df_Wide$sexMale -0.05126    0.05243  -0.978    0.329    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3712 on 235 degrees of freedom
## Multiple R-squared:  0.004051,   Adjusted R-squared:  -0.0001871 
## F-statistic: 0.9559 on 1 and 235 DF,  p-value: 0.3292

Answer: The intercept of the fixed effects are the same as the intercepts in the other two models. The p values of each model is very different– in the first model, our fixed effect intercept is significant, but this value is not significant in the other models.

a Let’s compare the unconditional model to the means as outcome model, does the means as outcomes model account for more variance?

anova(unconditional.cell.means.model, means.as.outcomes.model)

## Data: df
## Models:
## unconditional.cell.means.model: social.connection ~ 1 + (1 | id)
## means.as.outcomes.model: social.connection ~ sex + (1 | id)
##                                npar    AIC    BIC  logLik deviance Chisq Df
## unconditional.cell.means.model    3 1829.9 1845.7 -911.95   1823.9         
## means.as.outcomes.model           4 1830.9 1852.0 -911.47   1822.9 0.962  1
##                                Pr(>Chisq)
## unconditional.cell.means.model           
## means.as.outcomes.model            0.3267

Answer: No, the means as outcomes model noes not account for more variance. One way to confirm that is because the BIC values (and deviance values) are nearly identical. A lower BIC value would indicate which model is the better fit.

8. Next, let’s estimate a random coefficient model, meaning a model with a level 1 predictor which in this case is time. What does this model indicate? How is it different from a disaggragated model with time?

# lmer is nice and will know if a variable is level 1 or level 2, so we don't have to specify that in the mdoel. 
random.coefficient.model <- lmer(social.connection ~ time + (time|id), df, REML = FALSE)
#we want time to vary by participants, or said differently trial within subjects. Hence the (time|ID)

summary(random.coefficient.model)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ time + (time | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1827.0   1858.6   -907.5   1815.0     1416 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3376 -0.6165  0.0007  0.5526  3.5981 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  id       (Intercept) 0.1330269 0.36473       
##           time        0.0004574 0.02139  -0.61
##  Residual             0.1579721 0.39746       
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.039411   0.029589 236.980066  102.72   <2e-16 ***
## time         -0.012538   0.005522 236.931373   -2.27   0.0241 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## time -0.589

summary(lm(social.connection ~ time, df))

## 
## Call:
## lm(formula = social.connection ~ time, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0394 -0.3351 -0.0018  0.3190  1.9982 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.03941    0.02322 130.917   <2e-16 ***
## time        -0.01254    0.00700  -1.791   0.0735 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5205 on 1420 degrees of freedom
## Multiple R-squared:  0.002254,   Adjusted R-squared:  0.001551 
## F-statistic: 3.208 on 1 and 1420 DF,  p-value: 0.07349

Answer: The standard error of time is different in the two models, where it is slightly bigger in the disaggregated model. The intercept coefficients are the same, however. In the disaggregated model, we do not have a significant model, but in the random coefficient model, we do have a significant predictor of time.

9. Next let’s run the full model, which is an intercepts and slopes as outcomes model. This has a level 2 predictor of a level 1 slope. Interpret the coefficients, how do they differ from a disaggregated and aggregated model?

# Intercepts and slopes as outcomes model. Level 2 predictor of a level 1 slope
intercepts.and.slopes.as.outcomes.model <-lmer(social.connection ~ time*sex + (time|id), df, REML = FALSE)
summary(intercepts.and.slopes.as.outcomes.model)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ time * sex + (time | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1830.1   1872.2   -907.0   1814.1     1414 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3449 -0.6169  0.0005  0.5547  3.5918 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 0.132419 0.36389       
##           time        0.000457 0.02138  -0.61
##  Residual             0.157973 0.39746       
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.056e+00  3.541e-02  2.370e+02  86.292   <2e-16 ***
## time         -1.278e-02  6.618e-03  2.369e+02  -1.932   0.0546 .  
## sexMale      -5.343e-02  6.424e-02  2.370e+02  -0.832   0.4065    
## time:sexMale  8.141e-04  1.201e-02  2.369e+02   0.068   0.9460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) time   sexMal
## time        -0.590              
## sexMale     -0.551  0.325       
## time:sexMal  0.325 -0.551 -0.590

# Compare to disaggregated analysis
summary(lm(social.connection ~ time*sex, df))

## 
## Call:
## lm(formula = social.connection ~ time * sex, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0022 -0.3378 -0.0045  0.3160  2.0337 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0556421  0.0278153 109.855   <2e-16 ***
## time         -0.0127850  0.0083866  -1.524    0.128    
## sexMale      -0.0534265  0.0504652  -1.059    0.290    
## time:sexMale  0.0008141  0.0152158   0.054    0.957    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5204 on 1418 degrees of freedom
## Multiple R-squared:  0.004305,   Adjusted R-squared:  0.002198 
## F-statistic: 2.044 on 3 and 1418 DF,  p-value: 0.1059

# Compare to aggregated analysis (2 stage least squares)
summary(lm(intercept ~ sex, df_Wide))

## 
## Call:
## lm(formula = intercept ~ sex, data = df_Wide)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97603 -0.28421 -0.01755  0.29302  1.41207 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.05564    0.03556  85.929   <2e-16 ***
## sexMale     -0.05343    0.06452  -0.828    0.408    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4568 on 235 degrees of freedom
## Multiple R-squared:  0.00291,    Adjusted R-squared:  -0.001333 
## F-statistic: 0.6858 on 1 and 235 DF,  p-value: 0.4085

summary(lm(slope ~ sex, df_Wide))

## 
## Call:
## lm(formula = slope ~ sex, data = df_Wide)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20864 -0.05946  0.00088  0.04850  0.23578 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.0127850  0.0066466  -1.924   0.0556 .
## sexMale      0.0008141  0.0120588   0.068   0.9462  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08538 on 235 degrees of freedom
## Multiple R-squared:  1.939e-05,  Adjusted R-squared:  -0.004236 
## F-statistic: 0.004558 on 1 and 235 DF,  p-value: 0.9462

Answer: The coefficients in the mixed model tell us our intercept (3.056) and time are significant fixed effects, meaning this predictor is contant across participants. Sex is not a significant predictor, nor is there an effect between sex and time. In the disaggregated analysis, we do not see any significance regarding time, sex, or time*sex as meaningful predictors of social connection. Our intercept, however, is approximately the same in both of these models. In the outcomes as intercepts model, we again have the same intercept coefficient as the previous two models. In this case, our only predictor is of sex, and the regression coefficient for sex is very small, and insignificant. In the slopes as outcomes model, our intercept is still significant, however the value has now changed dramatically (-0.013). The predictor of sex is not significant and neither is the model overall.

a let’s use that package again to translate our syntax to the formula

extract_eq(intercepts.and.slopes.as.outcomes.model, wrap = TRUE, intercept = "beta")

\[ \begin{aligned} \operatorname{social.connection}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{time}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{sex}_{\operatorname{Male}}) \\ &\gamma^{\beta_{1}}_{0} + \gamma^{\beta_{1}}_{1}(\operatorname{sex}_{\operatorname{Male}}) \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for id j = 1,} \dots \text{,J} \end{aligned} \]

10. You may have wondered why we are specifying our models as Restricted Maximum Likelihood (i.e. REML = T) Vs. Full Maximum Likelihood (i.e. REML = F). The main difference is what happens to our fixed effects. What happens when we change the specification of the random coefficient model and intercept and slopes as outcomes models to Full maximum likelihood? Additionally, what happens when we try and run a model comparison with REML = T

random.coefficient.model <- 
  lmer(social.connection ~ time + (time|id), df, REML = FALSE)

summary(random.coefficient.model)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ time + (time | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1827.0   1858.6   -907.5   1815.0     1416 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3376 -0.6165  0.0007  0.5526  3.5981 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  id       (Intercept) 0.1330269 0.36473       
##           time        0.0004574 0.02139  -0.61
##  Residual             0.1579721 0.39746       
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.039411   0.029589 236.980066  102.72   <2e-16 ***
## time         -0.012538   0.005522 236.931373   -2.27   0.0241 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## time -0.589

random.coefficient.model.rml <- 
  lmer(social.connection ~ time + (time|id), df, REML = TRUE)

summary(random.coefficient.model.rml)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: social.connection ~ time + (time | id)
##    Data: df
## 
## REML criterion at convergence: 1829.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3375 -0.6183  0.0012  0.5527  3.5941 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  id       (Intercept) 0.1339015 0.3659        
##           time        0.0004882 0.0221   -0.60
##  Residual             0.1579720 0.3975        
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.039411   0.029652 235.993997 102.504   <2e-16 ***
## time         -0.012538   0.005534 236.002435  -2.266   0.0244 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## time -0.589

intercepts.and.slopes.as.outcomes.model <- 
  lmer(social.connection ~ time*sex + (time|id), df, REML = FALSE)

summary(intercepts.and.slopes.as.outcomes.model)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ time * sex + (time | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1830.1   1872.2   -907.0   1814.1     1414 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3449 -0.6169  0.0005  0.5547  3.5918 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 0.132419 0.36389       
##           time        0.000457 0.02138  -0.61
##  Residual             0.157973 0.39746       
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.056e+00  3.541e-02  2.370e+02  86.292   <2e-16 ***
## time         -1.278e-02  6.618e-03  2.369e+02  -1.932   0.0546 .  
## sexMale      -5.343e-02  6.424e-02  2.370e+02  -0.832   0.4065    
## time:sexMale  8.141e-04  1.201e-02  2.369e+02   0.068   0.9460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) time   sexMal
## time        -0.590              
## sexMale     -0.551  0.325       
## time:sexMal  0.325 -0.551 -0.590

intercepts.and.slopes.as.outcomes.model.rml <- 
  lmer(social.connection ~ time*sex + (time|id), df, REML = TRUE)

summary(intercepts.and.slopes.as.outcomes.model.rml)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: social.connection ~ time * sex + (time | id)
##    Data: df
## 
## REML criterion at convergence: 1839.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3448 -0.6174  0.0019  0.5567  3.5839 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  id       (Intercept) 0.1341748 0.36630       
##           time        0.0005189 0.02278  -0.59
##  Residual             0.1579728 0.39746       
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.056e+00  3.556e-02  2.350e+02  85.929   <2e-16 ***
## time         -1.278e-02  6.647e-03  2.350e+02  -1.924   0.0556 .  
## sexMale      -5.343e-02  6.452e-02  2.350e+02  -0.828   0.4085    
## time:sexMale  8.141e-04  1.206e-02  2.350e+02   0.068   0.9462    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) time   sexMal
## time        -0.590              
## sexMale     -0.551  0.325       
## time:sexMal  0.325 -0.551 -0.590

#comparing nested between REML = T and REML = F
anova(random.coefficient.model, intercepts.and.slopes.as.outcomes.model)

## Data: df
## Models:
## random.coefficient.model: social.connection ~ time + (time | id)
## intercepts.and.slopes.as.outcomes.model: social.connection ~ time * sex + (time | id)
##                                         npar    AIC    BIC  logLik deviance
## random.coefficient.model                   6 1827.0 1858.6 -907.52   1815.0
## intercepts.and.slopes.as.outcomes.model    8 1830.1 1872.2 -907.04   1814.1
##                                          Chisq Df Pr(>Chisq)
## random.coefficient.model                                    
## intercepts.and.slopes.as.outcomes.model 0.9639  2     0.6176

anova(random.coefficient.model.rml, intercepts.and.slopes.as.outcomes.model.rml)

## refitting model(s) with ML (instead of REML)

## Data: df
## Models:
## random.coefficient.model.rml: social.connection ~ time + (time | id)
## intercepts.and.slopes.as.outcomes.model.rml: social.connection ~ time * sex + (time | id)
##                                             npar    AIC    BIC  logLik deviance
## random.coefficient.model.rml                   6 1827.0 1858.6 -907.52   1815.0
## intercepts.and.slopes.as.outcomes.model.rml    8 1830.1 1872.2 -907.04   1814.1
##                                              Chisq Df Pr(>Chisq)
## random.coefficient.model.rml                                    
## intercepts.and.slopes.as.outcomes.model.rml 0.9639  2     0.6176

Answer: The fixed effect vary by.. . When we try to run a model comparison, there is the output of “refitting model(s) with ML (instead of REML)” which seems to un-do the REML = FALSE from those models. This would make our comparisons meaningless. The fixed effects across these models do not seem to differ in terms of their coefficients, but the p-values change depending on the model.

11. Finally, we often want to center our predictors at different values depending on the question at hand. How does our model change when we grand mean center our time variable?

# Grand Mean Centering
intercepts.and.slopes.as.outcomes.model <- lmer(social.connection ~ time*sex + (time|id), df, REML = FALSE)
summary(intercepts.and.slopes.as.outcomes.model)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ time * sex + (time | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1830.1   1872.2   -907.0   1814.1     1414 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3449 -0.6169  0.0005  0.5547  3.5918 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 0.132419 0.36389       
##           time        0.000457 0.02138  -0.61
##  Residual             0.157973 0.39746       
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.056e+00  3.541e-02  2.370e+02  86.292   <2e-16 ***
## time         -1.278e-02  6.618e-03  2.369e+02  -1.932   0.0546 .  
## sexMale      -5.343e-02  6.424e-02  2.370e+02  -0.832   0.4065    
## time:sexMale  8.141e-04  1.201e-02  2.369e+02   0.068   0.9460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) time   sexMal
## time        -0.590              
## sexMale     -0.551  0.325       
## time:sexMal  0.325 -0.551 -0.590

#here we is where we center. 
df.time.grand.mean.centered <- df
df.time.grand.mean.centered$time <- df.time.grand.mean.centered$time - mean(df.time.grand.mean.centered$time)

intercepts.and.slopes.as.outcomes.model.time.grand.mean.centered <- lmer(social.connection ~ time*sex + (time|id), df.time.grand.mean.centered, REML = FALSE)
summary(intercepts.and.slopes.as.outcomes.model.time.grand.mean.centered)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: social.connection ~ time * sex + (time | id)
##    Data: df.time.grand.mean.centered
## 
##      AIC      BIC   logLik deviance df.resid 
##   1830.1   1872.2   -907.0   1814.1     1414 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3449 -0.6169  0.0005  0.5547  3.5918 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  id       (Intercept) 0.1102688 0.33207       
##           time        0.0004571 0.02138  -0.50
##  Residual             0.1579763 0.39746       
## Number of obs: 1422, groups:  id, 237
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   3.022e+00  2.877e-02  2.370e+02 105.014   <2e-16 ***
## time         -1.278e-02  6.618e-03  2.370e+02  -1.932   0.0546 .  
## sexMale      -5.126e-02  5.220e-02  2.370e+02  -0.982   0.3272    
## time:sexMale  8.141e-04  1.201e-02  2.370e+02   0.068   0.9460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) time   sexMal
## time        -0.113              
## sexMale     -0.551  0.062       
## time:sexMal  0.062 -0.551 -0.113

Answer: Our model does not change in terms of intercept and regression coefficient values or p-values.

Week_9_HW_kamryn_mattingly

Prompt: This week we are interested in modeling social connection over time as a function of biologitcal sex for a new class of students. Let’s begin!