ch7_models

Chapter 7 Models

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)

make_tea_data <- function(n_total = 48, sigma = 1.25, delta = 1) {
  n_half <- n_total / 2
  tibble(condition = c(rep("milk first", n_half), rep("tea first", n_half)),
         rating = c(round(rnorm(n_half, mean = 3.5 + delta, sd = sigma)), 
                    round(rnorm(n_half, mean = 3.5, sd = sigma)))) |>
    mutate(rating = if_else(rating > 10, 10, rating), # truncate
           rating = if_else(rating < 1, 1, rating))
}

7.1 Regression models

tea_data <- make_tea_data(n_total = 60)

mod <- lm(rating ~ condition, data = tea_data)

summary(mod)


Call:
lm(formula = rating ~ condition, data = tea_data)

Residuals:
   Min     1Q Median     3Q    Max 
  -2.6   -0.6    0.4    0.5    2.5 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)          4.5000     0.2206  20.396   <2e-16 ***
conditiontea first  -0.9000     0.3120  -2.884   0.0055 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.208 on 58 degrees of freedom
Multiple R-squared:  0.1255,    Adjusted R-squared:  0.1104 
F-statistic:  8.32 on 1 and 58 DF,  p-value: 0.005495

7.2 Generalized linear models

tea_data$liked_tea <- ifelse(tea_data$rating >= 5, 1, 0)

modglm <- glm(liked_tea ~ condition, data = tea_data, family = "binomial")
summary(modglm)


Call:
glm(formula = liked_tea ~ condition, family = "binomial", data = tea_data)

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)  
(Intercept)         8.287e-16  3.651e-01   0.000   1.0000  
conditiontea first -1.190e+00  5.654e-01  -2.104   0.0354 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 78.859  on 59  degrees of freedom
Residual deviance: 74.185  on 58  degrees of freedom
AIC: 78.185

Number of Fisher Scoring iterations: 4

7.3 Linear mixed effects models

library(lme4)

Loading required package: Matrix


Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

tea_data <- tea_data |>
  mutate(id = rep(1:30, times = 2))
mod_mix <- lmer(rating ~ condition + (1 | id), data = tea_data)

boundary (singular) fit: see help('isSingular')

summary(mod_mix)

Linear mixed model fit by REML ['lmerMod']
Formula: rating ~ condition + (1 | id)
   Data: tea_data

REML criterion at convergence: 193.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.1515 -0.4965  0.3310  0.4138  2.0688 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.00     0.000   
 Residual             1.46     1.208   
Number of obs: 60, groups:  id, 30

Fixed effects:
                   Estimate Std. Error t value
(Intercept)          4.5000     0.2206  20.396
conditiontea first  -0.9000     0.3120  -2.884

Correlation of Fixed Effects:
            (Intr)
condtntfrst -0.707
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

7.4 A worked example

repo <- "https://raw.githubusercontent.com/langcog/experimentology/main"
sgf <- read_csv(file.path(repo, "data/tidyverse/stiller_scales_data.csv"), show_col_types = FALSE) |>
  mutate(age_group = cut(age, 2:5, include.lowest = TRUE), 
         condition = condition |>
           fct_recode("Control" = "No Label", "Experimental" = "Label"))

sgf$age_centered <- scale(sgf$age, center = TRUE, scale = FALSE)
sgf$condition <- fct_relevel(sgf$condition, "Control")

modsgf <- glmer(correct ~ age_centered * condition + (1|subid) + (1|item),
             family = "binomial", data = sgf)
summary(modsgf)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: correct ~ age_centered * condition + (1 | subid) + (1 | item)
   Data: sgf

     AIC      BIC   logLik deviance df.resid 
   657.3    683.6   -322.7    645.3      582 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4195 -0.5688 -0.3756  0.6515  2.5418 

Random effects:
 Groups Name        Variance Std.Dev.
 subid  (Intercept) 0.05200  0.2280  
 item   (Intercept) 0.07483  0.2735  
Number of obs: 588, groups:  subid, 147; item, 4

Fixed effects:
                                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)                         -1.4582     0.2156  -6.762 1.36e-11 ***
age_centered                        -0.3767     0.1891  -1.992 0.046417 *  
conditionExperimental                2.2613     0.2246  10.069  < 2e-16 ***
age_centered:conditionExperimental   0.9237     0.2566   3.600 0.000318 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) ag_cnt cndtnE
age_centerd  0.180              
cndtnExprmn -0.617 -0.185       
ag_cntrd:cE -0.155 -0.743  0.208