Load Packages

library(tidyverse) # data management
library(haven)     # read in data files
library(sjPlot)    # regression tables
library(vtable)    # summary tables

## Warning: package 'vtable' was built under R version 4.1.3

library(lme4)      # estimate MLM
library(lmerTest)  # hypothesis testing for MLM

Load Data

This is an example data set presented in Willett (1988) and Singer and Willett (2003).

It is a fictitious example in which a random sample of 35 participants completed an “opposites-naming” task on 4 occasions. At each occasion the participant is asked to name the opposite of a presented word, and they complete as many such words as they can in a fixed time. They repeat the task on 4 separate occasions, each 1 week apart. We expect that their scores (number of opposites named) will increase across the 4 weeks as they gain practice with the task. There is also 1 person-level covariate measured at the initial time point, which is the score on a general cognitive ability test.

oppos <- read_dta(file = "opposites.dta")

Part 1: Descriptives and Unconditional Model

Question 1

Produce a table of descriptive statistics for the level-1 variables time, wave, and opp. How many total observations are there in the data? What is the difference between the “time” and “wave” variables?

There are a total of 35 participants with 4 observations across time each. This totals to 140 observations in this data set. Time and wave are on the same scale, but time is centered wave to make the zero mark more meaningful.

sumtable(oppos, vars = c("time", "wave", "opp"), digits = 2)

Summary Statistics
Variable	N	Mean	Std. Dev.	Min	Pctl. 25	Pctl. 75	Max
time	140	1.5	1.12	0	0.75	2.25	3
wave	140	2.5	1.12	1	1.75	3.25	4
opp	140	204.81	46.62	76	171.5	231.25	310

Question 2

Produce a table of descriptive statistics for the level-2 variables “cog” and “ccog”. How many level-2 observations are there? What are these level-2 units? What is the difference between “cog” and “ccog”?

There are 140 level-2 observations for ccog and cog, 4 for each participant. Because ccog is grand mean centered, the mean is zero (SD = 12.4). Cog is, however, not mean centered and has a mean of 113.5 (SD = 12.4, Range: [81, 137]).

level2data <- oppos %>%
  group_by(id) %>%
  summarise(
    cog = mean(cog),
    ccog = mean(ccog)
  )

sumtable(oppos, vars = c("cog", "ccog"), digits = 1)

Summary Statistics
Variable	N	Mean	Std. Dev.	Min	Pctl. 25	Pctl. 75	Max
cog	140	113.5	12.4	81	104	123	137
ccog	140	0	12.4	-32.5	-9.5	9.5	23.5

Question 3

Create one or more scatter plots to visualize the trajectories of opposites scores over time. Do you think summarizing these trajectories with a linear model is appropriate? Why or why not?

I think summarizing these trajectories with a linear model is appropriate because there doesn’t seem to be any obvious curvilinear change at you increase in time.

oppos %>%
  ggplot(aes(x = time, y = opp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~id) +
  xlab("Time (in weeks)") +
  ylab("Opposite Naming Score") +
  theme_bw()

## `geom_smooth()` using formula 'y ~ x'

oppos %>%
  ggplot(aes(x = time, y = opp)) +
  geom_line(aes(group = id), color = "grey", alpha = 0.75) +
  geom_smooth(method = "lm") +
  xlab("Time (in weeks)") +
  ylab("Opposite Naming Score") +
  theme_bw()

## `geom_smooth()` using formula 'y ~ x'

Question 4

Estimate the unconditional means model (i.e. random intercept model) with “opp” as the outcome variable. What is the ICC? What does this tell you in this context?

ICC = 0.27

The ICC describes how much of the variance in the outcome, opp, is between vs. within subjects. In this case about 27% of variance in opp is due to between-time differences, and the remaining 73% is due to between-subject differences (within-time).

m4 <- lmer(opp ~ 1 + (1 | id), data = oppos, REML = F)
tab_model(m4)

	opposites naming score
Predictors	Estimates	CI	p
(Intercept)	204.81	194.49 – 215.13	<0.001
Random Effects
σ²	1583.72
τ₀₀ _id	574.31
ICC	0.27
N _id	35
Observations	140
Marginal R² / Conditional R²	0.000 / 0.266

Question 5

Write out the “level” equations for an unconditional linear growth model that allows initial scores and trends to vary across people. (No R code needed. A chunk is provided in case you want to import an image of your equations. See below.)

Level	Model 2
Level 1	\(opp_{ij} = \beta_{0j} + \beta_{1j}(time_{ij}) + e_{ij}\)
Level 2	\(\beta_{0j} = \gamma_{00} + u_{0j}\)
Level 2	\(\beta_{1j} = \gamma_{10} + \gamma_{11}(id_j) + u_{1j}\)

Question 6

Estimate the model in Question 5 using full maximum likelihood (FML; in lmer this is REML = FALSE). Create a table of the model parameter estimates. Interpret the two fixed effects parameters, each of the three variance component estimates, and the covariance estimate.

m5 <- lmer(opp ~ time + (time | id), data = oppos, REML = F)
tab_model(m5)

	opposites naming score
Predictors	Estimates	CI	p
(Intercept)	164.37	152.55 – 176.19	<0.001
time of observation (in weeks)	26.96	22.77 – 31.15	<0.001
Random Effects
σ²	159.48
τ₀₀ _id	1161.31
τ₁₁ _id.time	127.71
ρ₀₁ _id	-0.45
ICC	0.87
N _id	35
Observations	140
Marginal R² / Conditional R²	0.423 / 0.926

fixed effects components

\(\beta_{0j}\): Averaging across time points, the predicted opposing name score for subjects is 164.37, CI: [152.55, 176.19], p < .001.

\(\beta_{1j}\): As you move up one time point there is a predicted increase in opposing name score for subjects by 26.97 points, CI: [22.77, 31.15], p < .001.

variance component estimates

\(\sigma^2\) = 159.48 = the variance of residuals in observed opposing name score around individual regression lines (i.e., the scores predicted by the model)

\(\tau_{00.id}\) = 1161.31 = the variance of opposite scores at the first time point across students

\(\tau_{11 id.time}\) = 127.71 = the variance of the time slopes across individuals

correlation

\(\rho_{01 id}\) = -0.45 = correlation between the individual’s opposite task score at time = 0 and their time slope

Question 7

Create plots to assess the normality of the level-1 and level-2 residuals. Does assuming normality seem reasonable for these data?

Generally, yes we can assume normality for the level 1 and 2 residuals. However, I do worry a little about the level 2 participant intercept residuals as they peak seems tobe slightly above zero and there is a long left leaning tail.

hist(resid(m5)) #level 1 residuals

hist(ranef(m5)$id[,1]) #Level2 intercept residuals

hist(ranef(m5)$id[,2]) #Level2 slope residuals

head(ranef(m5)$id)

##   (Intercept)       time
## 1   32.506086   5.957349
## 2   51.466517   1.448539
## 3   -7.447833  16.817182
## 4   38.862161  -3.912873
## 5   29.850187 -13.604288
## 6    1.787951   2.131319

Part 2: Three Hypotheses

Three Hypotheses

Now, consider the following three hypotheses:

Researcher A claims that people will have different initial performance on the opposites task, but they will all learn at about the same rate (i.e., have the same practice effects), and neither initial performance nor learning rate are associated with general cognitive ability.
Researcher B claims that people with greater general cognitive ability will start with higher initial performance on the opposites task but people’s rate of learning (practice effects) will be varied and unpredictable.
Researcher C claims that people with greater general cognitive ability will start with higher initial performance on the opposites task and these people will also learn at a faster rate.

Question 8

Write out the level-1 and level-2 equations for a conditional growth model that you can use to decide which researcher you agree with. (No R code needed. A chunk is provided in case you want to import an image of your equations. See below.)

Level	Model 2
Level 1	\(opp_{ti} = \pi_{0i} + \pi_{1i}(time_{ti}) + e_{ti}\)
Level 2	\(\pi_{0i} = \beta_{00} + \beta_{01}(ccog_i) + u_{0i}\)
Level	\(\pi_{1i} = \beta_{10} + \beta_{11}(ccog_i) + u_{1i}\)

Question 9

Estimate the model from Question 8 and produce a table of parameter estimates.

m9 <- lmer(opp ~ time * ccog + (time | id), data = oppos, 
           control = lmerControl(optimizer = "Nelder_Mead"), REML = F)
tab_model(m9)

	opposites naming score
Predictors	Estimates	CI	p
(Intercept)	164.37	152.56 – 176.19	<0.001
time of observation (in weeks)	26.96	23.17 – 30.75	<0.001
grand mean centered cognitive skill score	-0.11	-1.07 – 0.85	0.817
time:ccog	0.43	0.12 – 0.74	0.006
Random Effects
σ²	159.48
τ₀₀ _id	1159.38
τ₁₁ _id.time	99.30
ρ₀₁ _id	-0.49
ICC	0.86
N _id	35
Observations	140
Marginal R² / Conditional R²	0.459 / 0.926

sqrt(159.48) #12.63

## [1] 12.62854

sqrt(1159.38) #34.05

## [1] 34.04967

sqrt(99.30) #9.96

## [1] 9.964939

Question 10

Which researcher do you agree with the most? Explain why based on the results of Question 9. Write a short paragraph that clearly explains your answer.

\(\beta_{0j}\): For average cognitive score and at time = 0, the average opposites task score across subjects is predicted to be 164.37, 95% CI: [152.56, 176.19], p < .001.

\(\beta_{1j}\): For average cognitive score, the predicted change in relationship between time and opposites task score is 26.96 increase in score for each unit increase in time, 95% CI: [23.17, 30.75], p < .001.

\(\beta_{2j}\): When time = 0, there is a predicted change in the relationship between cognitive score and opposites task score such that for each unit increase in cognitive skill score there is a predicted .11 unit decrease in opposites task score, 95% CI: [-1.07, 0.85], p = 0.817.

\(\beta_{3j}\): For each unit increase in time there is a .43 unit increase in the strength of the relationship between cognitive score and opposite task score, B = 0.43, 95% CI: [0.12, 0.74], p = .006

\(\sigma^2\) = sqrt(159.48) = 12.63 = the SD of residuals in observed opposing name score around individual regression lines (i.e., the scores predicted by the model)

\(\tau_{00.id}\) = sqrt(1159.38) = 34.05 = the SD of opposite scores at the first time point across students

\(\tau_{11 id.time}\) = sqrt(99.30) = 9.96 = the SD of the time slopes across individuals

\(\rho_{01 id}\) = -0.49 = correlation between the individual’s opposite task score at time = 0 and their time slope

PSET C

Dani Grant

2022-10-27