This is problem set #3, in which we want you to integrate your knowledge of data wrangling with some basic simulation skills and some linear modeling.

rm(list=ls())
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
theme_set(theme_classic())

Part 1: Basic simulation and NHST

Let’s start by convincing ourselves that t-tests have the appropriate false positive rate. Run 10,000 t-tests with standard, normally-distributed data from a made up 30-person, single-measurement experiment (the command for sampling from a normal distribution is rnorm). What’s the mean number of “significant” results?

First do this using a for loop.

p.vals <- array()
n.sims <- 10000

for(i in 1:n.sims) {
    ## get data
    s1 <- rnorm(30)
    s2 <- rnorm(30)
    ## do t-test
    t <- t.test(s1, s2)
    ## compile t-vals
    p.vals[i] <- as.numeric(t[3])
    ## get t.vals < .05 
    sig.p <- p.vals[p.vals <= .05]  
}

fp.rate <- length(sig.p) / n.sims 
fp.rate
## [1] 0.0475

Next, do this using the replicate function:

rep.p.vals <- replicate(n.sims, as.numeric(t.test(rnorm(30, rnorm(30)))[3]))
rep.sig.p <- rep.p.vals[rep.p.vals <= .05]
fp.rate <- length(rep.sig.p) / n.sims 
fp.rate
## [1] 0.051

Ok, that was a bit boring. Let’s try something moderately more interesting - let’s implement a p-value sniffing simulation, in the style of Simons, Nelson, & Simonsohn (2011).

Consider this scenario: you have done an experiment, again with 30 participants (one observation each, just for simplicity). The question is whether their performance is above chance. You aren’t going to check the p-value every trial, but let’s say you run 30 - then if the p-value is within the range p < .25 and p > .05, you optionally run 30 more and add those data, then test again. But if the original p value is < .05, you call it a day, and if the original is > .25, you also stop.

First, write a function that implements this sampling regime.

## returns vector of significant p.vals
double.sample <- function () {
    ## collect some data
    s.t1 <- rnorm(30)
    ## get p.val
    p.val <- t.test(s.t1)$p.val
    ## check p.val
    if(p.val < .25 & p.val > .05) {
        ## double the sample
        s.t2 <- c(s.t1, rnorm(30))
        ## check p.val again
        p.val <- t.test(s.t2)$p.value
    } 
  return(p.val)
}

Now call this function 10k times and find out what happens.

dbl.p.vals <- replicate(10000, double.sample())
sig.dbl.p.vals <- dbl.p.vals[dbl.p.vals <= .05]
fp.rate.2 <- length(sig.dbl.p.vals) / length(dbl.p.vals) 
fp.rate.2
## [1] 0.071

Is there an inflation of false positives? How bad is it?

Now modify this code so that you can investigate this “double the sample” rule in a bit more depth. Let’s see what happens when you double the sample ANY time p > .05 (not just when p < .25), or when you do it only if p < .5 or < .75. How do these choices affect the false positive rate?

HINT: Try to do this by making the function double.sample take the upper p value as an argument, so that you can pass this through dplyr.

HINT 2: You may need more samples. Find out by looking at how the results change from run to run.

double.sample.2 <- function (upper.p.val, n.subs) {
    ## collect some data
    s.t1 <- rnorm(n.subs)
    ## get p.val
    p.val <- t.test(s.t1)$p.val
    ## check p.val
    if(p.val < upper.p.val & p.val > .05) {
        ## double the sample
        s.t2 <- c(s.t1, rnorm(n.subs))
        ## check p.val again
        p.val <- t.test(s.t2)$p.value
    } 
  return(p.val)
}

## sim values
n.subs <- 30
upper.p.1 <- 1.0
upper.p.75 <- .75
upper.p.5 <- .5

## run sims
dbl.1.p.vals <- replicate(10000, double.sample.2(upper.p.1, n.subs))
dbl.75.p.vals <- replicate(10000, double.sample.2(upper.p.75, n.subs))
dbl.5.p.vals <- replicate(10000, double.sample.2(upper.p.5, n.subs))

# get only sig p.vals
dbl.1.sig.p <- dbl.1.p.vals[dbl.1.p.vals <= .05]
dbl.75.sig.p <- dbl.75.p.vals[dbl.75.p.vals <= .05]
dbl.5.sig.p <- dbl.5.p.vals[dbl.5.p.vals <= .05]

## get fp rates
dbl.1.fp.rate <- length(dbl.1.sig.p) / length(dbl.1.p.vals)
dbl.75.fp.rate <- length(dbl.75.sig.p) / length(dbl.75.p.vals)
dbl.5.fp.rate <- length(dbl.5.sig.p) / length(dbl.5.p.vals)

# join together and plot
fp <- c(.05, .75, 1.0)
fp_rate <- c(dbl.5.fp.rate, dbl.75.fp.rate, dbl.1.fp.rate)
fp_df <- data.frame(fp, fp_rate)

qplot(x=as.factor(fp), y=fp_rate, data=fp_df,
      geom="bar", stat="identity", ) + 
    coord_cartesian(ylim = c(0.045, 0.10)) + 
    geom_hline(yintercept=0.05, color="red")

What do you conclude on the basis of this simulation? How bad is this kind of data-dependent policy?

This sampling policy does increase your false positive rate, but not by all that much. It is interesting that increasing the upper p-value does not have a large affect on the false positive rate.

Part 2: The Linear Model

2A: Basic Linear Modeling

Let’s use the ToothGrowth dataset, on guineapig teeth based on orange juice and vitamin C. This is super simple. (Try ?ToothGrowth).

First plot the data, we’ll use qplot to understand how len (tooth length) depends on dose (amount of Vitamin C) and supp (delivery method).

qplot(dose, len, facets = ~ supp, data=ToothGrowth) 

So now you see what’s going on.

Next, always make a histogram of the DV before making a linear model! This reveals the distribution and can be helpful in choosing your model type.

qplot(len, data=ToothGrowth)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Now make a linear model of tooth lengths using lm. Try making one with main effects and interactions and another with just main effects. Make sure to assign them to variables so that you can get them later.

m1 <- lm(len ~ dose + supp, data=ToothGrowth)
summary(m1)
## 
## Call:
## lm(formula = len ~ dose + supp, data = ToothGrowth)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.600 -3.700  0.373  2.116  8.800 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.2725     1.2824   7.231 1.31e-09 ***
## dose          9.7636     0.8768  11.135 6.31e-16 ***
## suppVC       -3.7000     1.0936  -3.383   0.0013 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared:  0.7038, Adjusted R-squared:  0.6934 
## F-statistic: 67.72 on 2 and 57 DF,  p-value: 8.716e-16
m2 <- lm(len ~ dose * supp, data=ToothGrowth)
summary(m2)
## 
## Call:
## lm(formula = len ~ dose * supp, data = ToothGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2264 -2.8462  0.0504  2.2893  7.9386 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.550      1.581   7.304 1.09e-09 ***
## dose           7.811      1.195   6.534 2.03e-08 ***
## suppVC        -8.255      2.236  -3.691 0.000507 ***
## dose:suppVC    3.904      1.691   2.309 0.024631 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared:  0.7296, Adjusted R-squared:  0.7151 
## F-statistic: 50.36 on 3 and 56 DF,  p-value: 6.521e-16

Now try taking out the intercept, using a -1 term in the formula. what does this do?

m3 <- lm(len ~ dose * supp - 1, data = ToothGrowth)
summary(m3)
## 
## Call:
## lm(formula = len ~ dose * supp - 1, data = ToothGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2264 -2.8463  0.0504  2.2893  7.9386 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## dose           7.811      1.195   6.534 2.03e-08 ***
## suppOJ        11.550      1.581   7.304 1.09e-09 ***
## suppVC         3.295      1.581   2.084   0.0418 *  
## dose:suppVC    3.904      1.691   2.309   0.0246 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared:  0.9622, Adjusted R-squared:  0.9595 
## F-statistic: 356.2 on 4 and 56 DF,  p-value: < 2.2e-16
anova(m2,m3)
## Analysis of Variance Table
## 
## Model 1: len ~ dose * supp
## Model 2: len ~ dose * supp - 1
##   Res.Df    RSS Df  Sum of Sq F Pr(>F)
## 1     56 933.63                       
## 2     56 933.63  0 5.6843e-13

Removing the intercept from the model allows you to see the intercept of the OJ supplement condition in the model output. It doesn’t change any of the coefficients.

Thought question: Take a moment to interpret the coefficients of the model. Q1 - What are the units? * Dose is measured in milligrams * Length is measured in millimeters (?) * Supplement is a categorial variable - presence of OJ or absorbic acid

Q2 - How does the interaction relate to the plot? * From the plot you can see that the size of the dose has a different effect depending on the delivery method. That is, lower doses of Vitamin C have a larger effect on tooth length if administered with OJ. Larger doses of Vitaminc C appear to have the same effect on tooth length.

Q3 - Should there be an interaction in the model? What does it mean? How important is it? * Yes, an interaction term is important in this model since it has a reasonable interpretation: that the effect of dose depends on the different delivery methods such that lower doses are not as effective with the VC delivery.

Now make predictions from the model you like the best. What should happen with doses of 0, 1.5, 2.5, and 10 under both supplements?

HINT: use the predict function …

HINT 2: you will have to make a dataframe to do the prediction with, so use something like data.frame(dose=...).

dose.df <- data.frame(dose=c(0, 1.5, 2.5, 10), supp="VC")
dose.df <- rbind(dose.df, data.frame(dose=c(0, 1.5, 2.5, 10), supp="OJ"))
preds.df <- cbind(dose.df, m.pred = predict(m3, dose.df))

Plot the model predictions

qplot(dose, m.pred, facets=(~supp), data=preds.df,
      geom=c("point", "line")) 

Now plot the residuals from the original model. How do they look? HINT: ?resid

# get residuals
plot(resid(m3))

BONUS: test them for normality of distribution using a quantile-quantile plot.

HINT: ?qqplot and ?qqnorm

m3.res <- resid(m3)
qqnorm(m3.res); qqline(m3.res)

The points in the qqplot fall on the diagonal line, providing evidence that our residuals are normally distributed.

2B: Exploratory Linear Modeling

What the heck is going on? Load data from Frank, Vul, Saxe (2011, Infancy), a study in which we measured infants’ looking to hands in moving scenes. There were infants from 3 months all the way to about two years, and there were two movie conditions (Faces_Medium, in which kids played on a white background, and Faces_Plus, in which the backgrounds were more complex and the people in the videos were both kids and adults). Forgive our bad naming conventions.

Try to figure out what the most reasonable linear model of the data is.

d <- read.csv("../data/FVS2011-hands.csv")

Histogram of the DV

qplot(hand.look, data=d)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Looking time data is not normally distributed. Lower limit = 0. Looks more like a Poisson distrubtion?

Now plot the data by condition

qplot(condition, hand.look, data=d, 
      geom="boxplot", facet="condition")

Now plot the relationship between looking to hands and age by condition.

qplot(age, hand.look, facets=(~condition), data=d) + geom_smooth(method="lm")

Model looking to hand as a function of condition.

m1.hands <- lm(hand.look ~ condition, data=d)
summary(m1.hands)
## 
## Call:
## lm(formula = hand.look ~ condition, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08169 -0.04172 -0.01124  0.03625  0.26391 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         0.061309   0.005513  11.121  < 2e-16 ***
## conditionFaces_Plus 0.020377   0.007796   2.614  0.00955 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05938 on 230 degrees of freedom
## Multiple R-squared:  0.02884,    Adjusted R-squared:  0.02462 
## F-statistic: 6.831 on 1 and 230 DF,  p-value: 0.009552
## add interaction term (age by condition) to the model
m2.hands <- lm(hand.look ~ condition * age - 1, data=d)
summary(m2.hands)
## 
## Call:
## lm(formula = hand.look ~ condition * age - 1, data = d)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.190121 -0.036508 -0.001239  0.032091  0.239474 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)   
## conditionFaces_Medium    0.022394   0.015417   1.453  0.14772   
## conditionFaces_Plus     -0.006046   0.015242  -0.397  0.69198   
## age                      0.003143   0.001176   2.673  0.00807 **
## conditionFaces_Plus:age  0.003950   0.001653   2.389  0.01770 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05456 on 228 degrees of freedom
## Multiple R-squared:  0.6641, Adjusted R-squared:  0.6583 
## F-statistic: 112.7 on 4 and 228 DF,  p-value: < 2.2e-16

Plot that model on the same plot as the data.

HINT: you can do this either using predict or (if you are feeling confident of your understanding of the models) using the built-in linear models in ggplot’s geom_smooth.

qplot(x=age, y=hand.look, color = condition, data=d) +
    geom_smooth(method="lm")

What do you conclude from this pattern of data?

Older children looked more to hands, and especially in the Faces_Medium condition when they saw kids playing on a white background.

3: Linear Mixed Effect Models

The goal here is to learn to use LMEMs using lme4 and to compare them to standard by subject, by item LMs, as well as the standard (no repeated measures) fixed effects GLM.

The dataset here is from Stiller, Goodman, & Frank (2014), a paper on children’s pragmatic inferences. We saw the paradigm in the counterbalancing lecture: it’s three faces: a smiley, a smiley with glasses, and a smiley with a hat and glasses. When told “my friend has glasses” do kids pick the one with the glasses and no hat? age.group is the kids’ age group, condition is either “label,” described above, or “no label,” which was a control condition in which kids picked without hearing the term “glasses” at all.

d <- read.csv("../data/scales.csv")
d$age.group <- factor(d$age.group)

Always begin with a histogram!

qplot(factor(correct), facets=(~age.group), data=d)

Brief Confidence Interval Digression

Start out by setting up a function for a 95% CI using the normal approximation.

# create formula to get standard error of the mean
sem <- function(x) {
    sd(x,na.rm=TRUE) / sqrt(length(x))
}


ci95.norm <- function(x) {
    z <- qnorm(0.975)
    ci_high = mean(x, na.rm=T) + z * sem(x)
    ci_low = mean(x, na.rm=T) - z * sem(x)
    c(ci_low = ci_low, ci_high = ci_high)
}

But the number of participants in a group is likely to be < 30, so let’s also compute this with a t distribution.

ci95.t <- function(x) {
    t <- qt(0.975, df=length(x)-1)
     ci_high = mean(x, na.rm=T) + t * sem(x)
    ci_low = mean(x, na.rm=T) - t * sem(x)
    c(ci_low = ci_low, ci_high = ci_high)
}

On the other hand, maybe we should use bootstrap CIs because these are actually proportions, and the normal/t approximations don’t know that they are 0/1 bounded.

library(boot)
## Warning: package 'boot' was built under R version 3.1.3
library(bootstrap)

Take a look at ?boot and ?bootci. Note that the syntax for the boot library is terrible, so we’re going to use it to check some code that I use:

theta <- function(x,xdata,na.rm=T) {mean(xdata[x],na.rm=na.rm)}

ci.low <- function(x,na.rm=T) {
  mean(x,na.rm=na.rm) - 
    quantile(bootstrap(1:length(x),
                       10000,theta,x,na.rm=na.rm)$thetastar,.025,na.rm=na.rm)
  }

ci.high <- function(x,na.rm=T) {
  quantile(bootstrap(1:length(x),
                     10000,theta,x,na.rm=na.rm)$thetastar,.975,na.rm=na.rm) - 
    mean(x,na.rm=na.rm)
  }

Now make 95% confidence intervals across participants using all the methods above:

  • Normal
  • t
  • Bootstrap percentile using boot.ci
  • Bootstrap percentile using my code
ci_95_n <- ci95.norm(d$correct)
ci_95_t <- ci95.t(d$correct)

# couldn't figure out how to use boot.ci

ci_bootstrap <- c(ci_low = mean(d$correct, na.rm=T) - ci.low(d$correct), 
                  ci_high = mean(d$correct, na.rm=T) + ci.high(d$correct))

Now plot the data by age and condition using dplyr and ggplot2. Plot some CIs on here - extra credit if you plot all of them and compare visually (you’ll need position = position_dodge() or some other way to offset them).

ms <- d %>%
    group_by(age.group, condition) %>%
    summarise(mean_accuracy = mean(correct, na.rm=T),
              ci_low_95_n = ci95.norm(correct)[1],
              ci_high_95_n = ci95.norm(correct)[2],
              ci_low_95_t = ci95.t(correct)[1],
              ci_high_95_t = ci95.t(correct)[2],
              ci_low_boot = ci.low(correct),
              ci_high_boot = ci.high(correct))

Now plot

qplot(x=age.group, y=mean_accuracy, data=ms, geom="bar", stat="identity",
      fill = condition, position = position_dodge()) +
    geom_linerange(aes(ymin=mean_accuracy - ci_low_boot, 
                       ymax=mean_accuracy + ci_high_boot), 
                   position = position_dodge(1.1)) +
     geom_linerange(aes(ymin=mean_accuracy - ci_low_95_n, 
                       ymax=mean_accuracy + ci_high_95_n), 
                   position = position_dodge(0.9),
                   color = "blue") +
     geom_linerange(aes(ymin=mean_accuracy - ci_low_95_t, 
                       ymax=mean_accuracy + ci_high_95_t), 
                   position = position_dodge(0.7),
                   color = "red") + 
        scale_fill_grey(start=0.3, end=0.6) 

What do you conclude about confidence interval computation?

Bootstrapping our CIs helps. But I think I might have done something wrong in calculating the CI without bootstrap. They seem awfully wide.

Back to LMEMs

library(lme4)
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 3.1.3
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:base':
## 
##     crossprod, tcrossprod
## 
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 3.1.3

OK, now do a basic GLM over the entire data frame, using age.group, condition, and their interaction to predict correctness. (If we were focusing on developmental issues, I would ask you to think about how to model age here, but let’s treat it as three discrete groups for now).

NOTE: this model is not appropriate, because it assumes that each subject’s observations are independent from one another. It’s still fine to do the analysis, though: it can tell you a lot about the data and is easy and fast to fit, as long as you know that you can’t trust the p-values!

m1.glm <- glm(correct ~ age.group * condition, family = "binomial", data=d)
summary(m1.glm)
## 
## Call:
## glm(formula = correct ~ age.group * condition, family = "binomial", 
##     data = d)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7166  -0.7401  -0.5168   0.8250   2.0393  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2819     0.2020   1.395 0.162902    
## age.group3                     0.6210     0.2961   2.098 0.035935 *  
## age.group4                     0.9312     0.3159   2.948 0.003198 ** 
## conditionNo Label             -1.4368     0.3130  -4.590 4.43e-06 ***
## age.group3:conditionNo Label  -0.6210     0.4494  -1.382 0.167050    
## age.group4:conditionNo Label  -1.7221     0.5022  -3.429 0.000605 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 808.59  on 587  degrees of freedom
## Residual deviance: 648.74  on 582  degrees of freedom
## AIC: 660.74
## 
## Number of Fisher Scoring iterations: 4

Let’s now use dplyr to get data frames for by-items (msi) and by-subjects (mss) analyses. msi should contain the mean ratings for every item and mss should contain the mean ratings for every subject.

msi <- d %>%
    group_by(trial, condition, age.group) %>%
    summarise(mean_accuracy = mean(correct),
              ci_low = ci.low(correct),
              ci_high = ci.high(correct))

mss <- d %>% 
    group_by(subid, condition, age.group) %>%
    summarise(mean_accuracy = mean(correct),
              ci_low = ci.low(correct),
              ci_high = ci.high(correct))

Now do standard linear models on each of these.

NOTE: These are not strictly correct either because of the normal approximation on percent correct (model doesn’t know it’s 0 - 1 bounded and could give you standard error that goes above 1). Again, useful to do and see what happens.

m1.lm <- lm(mean_accuracy ~ trial * age.group, data=msi)
summary(m1.lm)
## 
## Call:
## lm(formula = mean_accuracy ~ trial * age.group, data = msi)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3990 -0.2172  0.0000  0.2172  0.3990 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)             0.4458333  0.2585321   1.724    0.110
## trialfaces             -0.0008333  0.3656196  -0.002    0.998
## trialhouses            -0.1225000  0.3656196  -0.335    0.743
## trialpasta             -0.0408333  0.3656196  -0.112    0.913
## age.group3              0.0782051  0.3656196   0.214    0.834
## age.group4              0.0958333  0.3656196   0.262    0.798
## trialfaces:age.group3  -0.1113462  0.5170641  -0.215    0.833
## trialhouses:age.group3  0.0151282  0.5170641   0.029    0.977
## trialpasta:age.group3   0.0664744  0.5170641   0.129    0.900
## trialfaces:age.group4  -0.1450000  0.5170641  -0.280    0.784
## trialhouses:age.group4 -0.0858333  0.5170641  -0.166    0.871
## trialpasta:age.group4   0.0200000  0.5170641   0.039    0.970
## 
## Residual standard error: 0.3656 on 12 degrees of freedom
## Multiple R-squared:  0.07543,    Adjusted R-squared:  -0.7721 
## F-statistic: 0.08899 on 11 and 12 DF,  p-value: 0.9998
m2.lm <- lm(mean_accuracy ~ condition * age.group, data=mss)
summary(m2.lm)
## 
## Call:
## lm(formula = mean_accuracy ~ condition * age.group, data = mss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57000 -0.21154  0.01042  0.18000  0.51042 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.57000    0.04420  12.897  < 2e-16 ***
## conditionNo Label            -0.33042    0.06315  -5.232 5.95e-07 ***
## age.group3                    0.14154    0.06190   2.287 0.023714 *  
## age.group4                    0.20083    0.06315   3.180 0.001810 ** 
## conditionNo Label:age.group3 -0.14154    0.08889  -1.592 0.113560    
## conditionNo Label:age.group4 -0.31542    0.08977  -3.514 0.000594 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.221 on 141 degrees of freedom
## Multiple R-squared:  0.5737, Adjusted R-squared:  0.5586 
## F-statistic: 37.96 on 5 and 141 DF,  p-value: < 2.2e-16

Do ANOVA on these. Note that ANOVA doesn’t let you figure out what is going on with individual levels of age.

On to linear mixed effect models. Create the maximal random effects model a la Barr et al. (2013). Does it converge? If not, what will you do to make it converge? (The internet can be your friend here).

HINT: try simplifying your model to a “semi-maximal” model. Bonus: try using a different fitting procedure on the maximal model.

HINT: make sure that you consider which random effects are appropriate. Consider which observations are within/between subjects. E.g. having a random coefficient for age by subject doesn’t make sense, because each subject has only one age.

m1.lmer <- glmer(correct ~ age.group * trial * condition + (trial|subid), 
                 family="binomial", nAGQ = 0, data=d)
summary(m1.lmer)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 0) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ age.group * trial * condition + (trial | subid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    679.1    827.9   -305.5    611.1      554 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2415 -0.5456 -0.2931  0.6571  3.0940 
## 
## Random effects:
##  Groups Name        Variance Std.Dev. Corr             
##  subid  (Intercept) 0.6502   0.8064                    
##         trialfaces  0.8824   0.9394   -0.88            
##         trialhouses 1.3681   1.1697   -0.90  1.00      
##         trialpasta  2.7986   1.6729   -0.97  0.97  0.98
## Number of obs: 588, groups:  subid, 147
## 
## Fixed effects:
##                                            Estimate Std. Error z value
## (Intercept)                                 0.41744    0.44469   0.939
## age.group3                                  2.09405    0.87599   2.390
## age.group4                                  1.56656    0.78205   2.003
## trialfaces                                  0.15993    0.61745   0.259
## trialhouses                                -0.49810    0.62293  -0.800
## trialpasta                                 -0.16668    0.67217  -0.248
## conditionNo Label                          -1.33316    0.65773  -2.027
## age.group3:trialfaces                      -2.19843    1.05988  -2.074
## age.group4:trialfaces                      -1.25161    1.00377  -1.247
## age.group3:trialhouses                     -2.01415    1.06830  -1.885
## age.group4:trialhouses                     -0.78421    1.01077  -0.776
## age.group3:trialpasta                      -0.86681    1.16533  -0.744
## age.group4:trialpasta                      -0.15114    1.12376  -0.134
## age.group3:conditionNo Label               -3.16570    1.19001  -2.660
## age.group4:conditionNo Label               -2.01968    1.06393  -1.898
## trialfaces:conditionNo Label               -0.34725    0.92131  -0.377
## trialhouses:conditionNo Label              -0.21202    0.97726  -0.217
## trialpasta:conditionNo Label               -0.06353    1.00345  -0.063
## age.group3:trialfaces:conditionNo Label     3.02924    1.50619   2.011
## age.group4:trialfaces:conditionNo Label     0.40700    1.52187   0.267
## age.group3:trialhouses:conditionNo Label    4.00772    1.53056   2.618
## age.group4:trialhouses:conditionNo Label  -15.63768 1278.68315  -0.012
## age.group3:trialpasta:conditionNo Label     2.15713    1.62072   1.331
## age.group4:trialpasta:conditionNo Label     0.37043    1.56585   0.237
##                                          Pr(>|z|)   
## (Intercept)                               0.34788   
## age.group3                                0.01682 * 
## age.group4                                0.04516 * 
## trialfaces                                0.79563   
## trialhouses                               0.42394   
## trialpasta                                0.80416   
## conditionNo Label                         0.04267 * 
## age.group3:trialfaces                     0.03806 * 
## age.group4:trialfaces                     0.21243   
## age.group3:trialhouses                    0.05938 . 
## age.group4:trialhouses                    0.43783   
## age.group3:trialpasta                     0.45698   
## age.group4:trialpasta                     0.89301   
## age.group3:conditionNo Label              0.00781 **
## age.group4:conditionNo Label              0.05765 . 
## trialfaces:conditionNo Label              0.70624   
## trialhouses:conditionNo Label             0.82825   
## trialpasta:conditionNo Label              0.94952   
## age.group3:trialfaces:conditionNo Label   0.04430 * 
## age.group4:trialfaces:conditionNo Label   0.78914   
## age.group3:trialhouses:conditionNo Label  0.00883 **
## age.group4:trialhouses:conditionNo Label  0.99024   
## age.group3:trialpasta:conditionNo Label   0.18320   
## age.group4:trialpasta:conditionNo Label   0.81299   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 24 > 20.
## Use print(x, correlation=TRUE)  or
##   vcov(x)     if you need it

How do these coefficients compare with the independent coefficients linear model? What do you conclude?

Which random effects make the most difference? Find out using ranef. Plot the random effects for subject and item.

plot(ranef(m1.lmer))
## $subid

Make the minimal random effects model with just a subject intecept. How does this compare?

m2.lmer <- glmer(correct ~ age.group * trial * condition + (1|subid), 
                 family="binomial", nAGQ = 0, data=d)
summary(m2.lmer)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 0) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ age.group * trial * condition + (1 | subid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    664.0    773.4   -307.0    614.0      563 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3325 -0.6065 -0.2940  0.7095  3.2372 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  subid  (Intercept) 0.1031   0.3211  
## Number of obs: 588, groups:  subid, 147
## 
## Fixed effects:
##                                            Estimate Std. Error z value
## (Intercept)                                 0.40611    0.41360   0.982
## age.group3                                  2.08263    0.84680   2.459
## age.group4                                  1.54269    0.74609   2.068
## trialfaces                                  0.17017    0.58380   0.292
## trialhouses                                -0.48631    0.57224  -0.850
## trialpasta                                 -0.16457    0.57406  -0.287
## conditionNo Label                          -1.29461    0.61423  -2.108
## age.group3:trialfaces                      -2.18790    1.02260  -2.140
## age.group4:trialfaces                      -1.23009    0.96131  -1.280
## age.group3:trialhouses                     -2.00246    1.01178  -1.979
## age.group4:trialhouses                     -0.76808    0.94687  -0.811
## age.group3:trialpasta                      -0.88632    1.05808  -0.838
## age.group4:trialpasta                      -0.17223    1.00556  -0.171
## age.group3:conditionNo Label               -3.14367    1.14410  -2.748
## age.group4:conditionNo Label               -1.98991    1.00991  -1.970
## trialfaces:conditionNo Label               -0.38172    0.87476  -0.436
## trialhouses:conditionNo Label              -0.23654    0.91084  -0.260
## trialpasta:conditionNo Label               -0.04698    0.86829  -0.054
## age.group3:trialfaces:conditionNo Label     3.01118    1.45069   2.076
## age.group4:trialfaces:conditionNo Label     0.37843    1.46519   0.258
## age.group3:trialhouses:conditionNo Label    3.98013    1.44746   2.750
## age.group4:trialhouses:conditionNo Label  -15.66974 1285.05957  -0.012
## age.group3:trialpasta:conditionNo Label     2.15812    1.45855   1.480
## age.group4:trialpasta:conditionNo Label     0.38378    1.39320   0.276
##                                          Pr(>|z|)   
## (Intercept)                               0.32615   
## age.group3                                0.01392 * 
## age.group4                                0.03867 * 
## trialfaces                                0.77068   
## trialhouses                               0.39542   
## trialpasta                                0.77436   
## conditionNo Label                         0.03506 * 
## age.group3:trialfaces                     0.03239 * 
## age.group4:trialfaces                     0.20069   
## age.group3:trialhouses                    0.04780 * 
## age.group4:trialhouses                    0.41726   
## age.group3:trialpasta                     0.40222   
## age.group4:trialpasta                     0.86401   
## age.group3:conditionNo Label              0.00600 **
## age.group4:conditionNo Label              0.04879 * 
## trialfaces:conditionNo Label              0.66257   
## trialhouses:conditionNo Label             0.79510   
## trialpasta:conditionNo Label              0.95685   
## age.group3:trialfaces:conditionNo Label   0.03792 * 
## age.group4:trialfaces:conditionNo Label   0.79619   
## age.group3:trialhouses:conditionNo Label  0.00596 **
## age.group4:trialhouses:conditionNo Label  0.99027   
## age.group3:trialpasta:conditionNo Label   0.13897   
## age.group4:trialpasta:conditionNo Label   0.78296   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 24 > 20.
## Use print(x, correlation=TRUE)  or
##   vcov(x)     if you need it
plot(ranef(m2.lmer))
## $subid

Get an estimate of the significance value for the coefficient on the age*condition interaction by using anova to compare between your semi-maximal model and the model without an intercept.

m3.lmer <- glmer(correct ~ age.group * trial * condition -1 + (1|subid), 
                 family="binomial", nAGQ = 0, data=d)

anova(m2.lmer, m1.lmer)
## Data: d
## Models:
## m2.lmer: correct ~ age.group * trial * condition + (1 | subid)
## m1.lmer: correct ~ age.group * trial * condition + (trial | subid)
##         Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## m2.lmer 25 664.03 773.45 -307.01   614.03                         
## m1.lmer 34 679.06 827.86 -305.53   611.06 2.9714      9     0.9654