This is problem set #3, in which we want you to integrate your knowledge of data wrangling with some basic simulation skills and some linear modeling.

For ease of reading, please separate your answers from our text by marking our text with the > character (indicating quotes).

Part 1: Basic simulation and NHST

Let’s start by convincing ourselves that t-tests have the appropriate false positive rate. Run 10,000 t-tests with standard, normally-distributed data from a made up 30-person, single-measurement experiment (the command for sampling from a normal distribution is rnorm). What’s the mean number of “significant” results?

First do this using a for loop.

sig = 0
for (i in 1:10000) {
  n = rnorm(30)
  if(t.test(n)$p.value < .05){
    sig = sig +1
  }
}
sig
## [1] 445
proportion = sig/10000; proportion
## [1] 0.0445

There were 445 significant results, which makes sense since there should be a 5% false positive rate, and that is 4.45% false positive.

Next, do this using the replicate function:

table(replicate(1000, t.test(rnorm(30))$p.value < .05))
## 
## FALSE  TRUE 
##   958    42

Again, we see that around 5% of the tests are false positives.

Ok, that was a bit boring. Let’s try something moderately more interesting - let’s implement a p-value sniffing simulation, in the style of Simons, Nelson, & Simonsohn (2011).

Consider this scenario: you have done an experiment, again with 30 participants (one observation each, just for simplicity). The question is whether their performance is above chance. You aren’t going to check the p-value every trial, but let’s say you run 30 - then if the p-value is within the range p < .25 and p > .05, you optionally run 30 more and add those data, then test again. But if the original p value is < .05, you call it a day, and if the original is > .25, you also stop.

First, write a function that implements this sampling regime.

double.sample <- function () {
  n = rnorm(30)
  if(t.test(n)$p.value < .05){
    t.test(n)$p.value
  }
  else if (t.test(n)$p.value < .25) {
    n= n +rnorm(30)
    t.test(n)$p.value
  }
  else {
    t.test(n)$p.value
  }
}

Now call this function 10k times and find out what happens.

array1 = NULL
for (i in 1:10000) {
  array1[i] = double.sample()
}

length(array1[array1<.05])
## [1] 705

Is there an inflation of false positives? How bad is it?

It looks like the inflation makes the false positive rate to rise to just around .07.

Now modify this code so that you can investigate this “double the sample” rule in a bit more depth. Let’s see what happens when you double the sample ANY time p > .05 (not just when p < .25), or when you do it only if p < .5 or < .75. How do these choices affect the false positive rate?

HINT: Try to do this by making the function double.sample take the upper p value as an argument, so that you can pass this through dplyr.

HINT 2: You may need more samples. Find out by looking at how the results change from run to run.

Below is the function

double.sample <- function (x) {
  n = rnorm(30)
  if(t.test(n)$p.value < .05){
    t.test(n)$p.value
  }
  else if (t.test(n)$p.value < x) {
    n= n +rnorm(30)
    t.test(n)$p.value
  }
  else {
    t.test(n)$p.value
  }
}

Here is the false positive rate when repeating when p is less than .50

array1 = NULL
for (i in 1:10000) {
  array1[i] = double.sample(.50)
}

length(array1[array1<.05])
## [1] 763

The false positive rate is around 7.8 to 8%.

Here is the false positive rate when repeating when p is less than .75

array1 = NULL
for (i in 1:10000) {
  array1[i] = double.sample(.75)
}
length(array1[array1<.05])
## [1] 808

Around 8% to 8.3%.

Here is the false positive rate when doubling when p is not < .05.

array1 = NULL
for (i in 1:10000) {
  array1[i] = double.sample(1)
}
length(array1[array1<.05])
## [1] 872

Around 8 to 8.6 false positive rates.

What do you conclude on the basis of this simulation? How bad is this kind of data-dependent policy?

It is surprising and sad how a simple and common research heursitic like, “let’s douple the sample size” can actually increase the false positive rates to up to ~9%! This has opened my eyes to a problem.

Part 2: The Linear Model

2A: Basic Linear Modeling

Let’s use the ToothGrowth dataset, on guineapig teeth based on orange juice and vitamin C. This is super simple. (Try ?ToothGrowth).

First plot the data, we’ll use qplot to understand how len (tooth length) depends on dose (amount of Vitamin C) and supp (delivery method).

ToothGrowth$doseF = as.factor(ToothGrowth$dose)
qplot(doseF, len, 
      facets = ~supp,
      data=ToothGrowth)

tg.mean = ToothGrowth %>% 
          group_by(doseF, supp) %>%
          summarise(len = mean(len))

ggplot(tg.mean, aes(x = doseF, y=len, group=supp, fill=supp))+
      geom_bar(position = position_dodge(), stat = "identity")

So now you see what’s going on.

Next, always make a histogram of the DV before making a linear model! This reveals the distribution and can be helpful in choosing your model type.

qplot(ToothGrowth$len)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

qplot(len, data=ToothGrowth)+
      facet_grid(~supp)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Now make a linear model of tooth lengths using lm. Try making one with main effects and interactions and another with just main effects. Make sure to assign them to variables so that you can get them later.

mainef = lm(len ~ dose + supp, data=ToothGrowth); summary(mainef)
## 
## Call:
## lm(formula = len ~ dose + supp, data = ToothGrowth)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.600 -3.700  0.373  2.116  8.800 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.2725     1.2824   7.231 1.31e-09 ***
## dose          9.7636     0.8768  11.135 6.31e-16 ***
## suppVC       -3.7000     1.0936  -3.383   0.0013 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared:  0.7038, Adjusted R-squared:  0.6934 
## F-statistic: 67.72 on 2 and 57 DF,  p-value: 8.716e-16
inter = lm(len ~ dose * supp, data=ToothGrowth); summary(inter)
## 
## Call:
## lm(formula = len ~ dose * supp, data = ToothGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2264 -2.8462  0.0504  2.2893  7.9386 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.550      1.581   7.304 1.09e-09 ***
## dose           7.811      1.195   6.534 2.03e-08 ***
## suppVC        -8.255      2.236  -3.691 0.000507 ***
## dose:suppVC    3.904      1.691   2.309 0.024631 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared:  0.7296, Adjusted R-squared:  0.7151 
## F-statistic: 50.36 on 3 and 56 DF,  p-value: 6.521e-16
contrasts(ToothGrowth$supp) = c(1,0)
contrasts(ToothGrowth$supp) = c(0,1)

Now try taking out the intercept, using a -1 term in the formula. what does this do?

Removing the intercept forces it to be at the origin and gives us an extra degree of freedom to measure the effect of each supplement (not relative to each other). It tells us the incrememental effect of each supplement and dose.

summary(lm(len ~ dose + supp -1, data=ToothGrowth))
## 
## Call:
## lm(formula = len ~ dose + supp - 1, data = ToothGrowth)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.600 -3.700  0.373  2.116  8.800 
## 
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)    
## dose     9.7636     0.8768  11.135 6.31e-16 ***
## suppOJ   9.2725     1.2824   7.231 1.31e-09 ***
## suppVC   5.5725     1.2824   4.345 5.79e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared:  0.9586, Adjusted R-squared:  0.9564 
## F-statistic: 439.7 on 3 and 57 DF,  p-value: < 2.2e-16
summary(lm(len ~ dose * supp -1, data=ToothGrowth))
## 
## Call:
## lm(formula = len ~ dose * supp - 1, data = ToothGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2264 -2.8463  0.0504  2.2893  7.9386 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## dose          7.811      1.195   6.534 2.03e-08 ***
## suppOJ       11.550      1.581   7.304 1.09e-09 ***
## suppVC        3.295      1.581   2.084   0.0418 *  
## dose:supp1    3.904      1.691   2.309   0.0246 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared:  0.9622, Adjusted R-squared:  0.9595 
## F-statistic: 356.2 on 4 and 56 DF,  p-value: < 2.2e-16

Thought question: Take a moment to interpret the coefficients of the model. Q1 - What are the units?

For dose: The units are increase in length of tooth per 1 milligram increase of dose. When there is an intercept and interaction, this is interpreted at the supplement that is dummy coded 0.

For supplement: the units are increase in length of tooth compared to the dummy coded 0 supplement at dose = 0. However, when there is no intercept, we look at the effects assuming an intercept of 0. for example, when we are not controlling for dose, each beta represents the mean tooth length for each supplement.

Q2 - How does the interaction relate to the plot?

the interaction suggests that the effect of dose on tooth length is depedent on supplement.

Q3 - Should there be an interaction in the model? What does it mean? How important is it?

Looking at the graph, yes there should be an interaction. It looks like the effect of dose matters more for the VC supplement. And the model confirms that the interaction is useful, since it is signifcant. Looking at simple effects, the effect of dose is indeed stronger (b = 11.72) for VC compared to OJ (b = 7.81).

Now make predictions from the model you like the best. What should happen with doses of 0, 1.5, 2.5, and 10 under both supplements?

HINT: use the predict function …

HINT 2: you will have to make a dataframe to do the prediction with, so use something like data.frame(dose=...).

mod1 = lm(len ~ dose*supp, data=ToothGrowth)

m1 = data.frame(dose = cbind(c(0,1.5,2.5,10,0,1.5,2.5,10)), supp=c(rep("OJ",4), rep("VC",4)))

m1$pred = predict(mod1, m1)

Now plot the residuals from the original model. How do they look? HINT: ?resid

qplot(dose, resid(mod1), 
      facets = ~supp,
      data=ToothGrowth)

It looks like the residuals are not even based on dosage. Specifically, for both supplements, the model underestimates dosage 1. this makes me think that mayne the main effect of dose is curvilinear. Let’s check.

summary(lm(len ~ poly(dose,2) * supp, data=ToothGrowth))
## 
## Call:
## lm(formula = len ~ poly(dose, 2) * supp, data = ToothGrowth)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8.20  -2.72  -0.27   2.65   8.27 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           20.6633     0.6630  31.166  < 2e-16 ***
## poly(dose, 2)1        37.7328     5.1356   7.347 1.12e-09 ***
## poly(dose, 2)2       -18.6217     5.1356  -3.626 0.000638 ***
## supp1                 -3.7000     0.9376  -3.946 0.000231 ***
## poly(dose, 2)1:supp1  18.8595     7.2628   2.597 0.012101 *  
## poly(dose, 2)2:supp1   8.8088     7.2628   1.213 0.230460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.631 on 54 degrees of freedom
## Multiple R-squared:  0.7937, Adjusted R-squared:  0.7746 
## F-statistic: 41.56 on 5 and 54 DF,  p-value: < 2.2e-16

Yep!

2B: Exploratory Linear Modeling

What the heck is going on? Load data from Frank, Vul, Saxe (2011, Infancy), a study in which we measured infants’ looking to hands in moving scenes. There were infants from 3 months all the way to about two years, and there were two movie conditions (Faces_Medium, in which kids played on a white background, and Faces_Plus, in which the backgrounds were more complex and the people in the videos were both kids and adults). Forgive our bad naming conventions.

Try to figure out what the most reasonable linear model of the data is.

d <- read.csv("http://langcog.stanford.edu/FVS2011-hands.csv")

Plot that model on the same plot as the data.

HINT: you can do this either using predict or (if you are feeling confident of your understanding of the models) using the built-in linear models in ggplot’s geom_smooth.

contrasts(d$condition)
##              Faces_Plus
## Faces_Medium          0
## Faces_Plus            1
summary(lm(hand.look ~ age*condition, data=d))
## 
## Call:
## lm(formula = hand.look ~ age * condition, data = d)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.190121 -0.036508 -0.001239  0.032091  0.239474 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)   
## (Intercept)              0.022394   0.015417   1.453  0.14772   
## age                      0.003143   0.001176   2.673  0.00807 **
## conditionFaces_Plus     -0.028440   0.021680  -1.312  0.19089   
## age:conditionFaces_Plus  0.003950   0.001653   2.389  0.01770 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05456 on 228 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1764 
## F-statistic: 17.49 on 3 and 228 DF,  p-value: 2.955e-10
summary(lm(hand.look ~ poly(age,3)*condition, data=d))
## 
## Call:
## lm(formula = hand.look ~ poly(age, 3) * condition, data = d)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.169897 -0.041130 -0.001441  0.030575  0.222312 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                        0.061331   0.004793  12.795  < 2e-16
## poly(age, 3)1                      0.208384   0.073444   2.837 0.004967
## poly(age, 3)2                      0.080984   0.073214   1.106 0.269858
## poly(age, 3)3                     -0.040238   0.072934  -0.552 0.581700
## conditionFaces_Plus                0.020477   0.006779   3.021 0.002814
## poly(age, 3)1:conditionFaces_Plus  0.259394   0.103260   2.512 0.012708
## poly(age, 3)2:conditionFaces_Plus -0.141970   0.103258  -1.375 0.170535
## poly(age, 3)3:conditionFaces_Plus -0.349895   0.103248  -3.389 0.000829
##                                      
## (Intercept)                       ***
## poly(age, 3)1                     ** 
## poly(age, 3)2                        
## poly(age, 3)3                        
## conditionFaces_Plus               ** 
## poly(age, 3)1:conditionFaces_Plus *  
## poly(age, 3)2:conditionFaces_Plus    
## poly(age, 3)3:conditionFaces_Plus ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05162 on 224 degrees of freedom
## Multiple R-squared:  0.2851, Adjusted R-squared:  0.2627 
## F-statistic: 12.76 on 7 and 224 DF,  p-value: 9.09e-14
ggplot(d, aes(x= age, y = hand.look, group=condition, color=condition)) +
      geom_point() +
      geom_smooth(method="lm", formula = y ~ x)

What do you conclude from this pattern of data?

It looks like there is an interaction of condition and age on looking time for hands. Specifically, it looks like children are always more likely to look at hands as they get older, but this effect is more pronounced for the condition labeled “faces_plus.”

3: Linear Mixed Effect Models

The goal here is to learn to use LMEMs using lme4 and to compare them to standard by subject, by item LMs, as well as the standard (no repeated measures) fixed effects GLM.

The dataset here is from Stiller, Goodman, & Frank (2014), a paper on children’s pragmatic inferences. We saw the paradigm in the counterbalancing lecture: it’s three faces: a smiley, a smiley with glasses, and a smiley with a hat and glasses. When told “my friend has glasses” do kids pick the one with the glasses and no hat? age.group is the kids’ age group, condition is either “label,” described above, or “no label,” which was a control condition in which kids picked without hearing the term “glasses” at all.

d <- read.csv("http://langcog.stanford.edu/scales_data.csv")
d$age.group <- factor(d$age.group)

Always begin with a histogram!

qplot(correct, data=d)+
      facet_grid(~age.group)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

qplot(correct, data=d)+
      facet_grid(~condition)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Brief Confidence Interval Digression

Start out by setting up a function for a 95% CI using the normal approximation.

ci95.norm <- function(x) {
  sd(x, na.rm=TRUE) / sqrt(length(x)) * qnorm(.975)
} 

But the number of participants in a group is likely to be < 30, so let’s also compute this with a t distribution.

ci95.t <- function(x) {
  sd(x, na.rm=TRUE) / sqrt(length(x)) * qt(.975,length(x)-1)
} 

On the other hand, maybe we should use bootstrap CIs because these are actually proportions, and the normal/t approximations don’t know that they are 0/1 bounded.

library(boot)
library(bootstrap)

We’re going to use it to check some code that I use:

theta <- function(x,xdata,na.rm=T) {mean(xdata[x],na.rm=na.rm)}
ci.low <- function(x,na.rm=T) {
  mean(x,na.rm=na.rm) - 
    quantile(bootstrap(1:length(x),
                       10000,theta,x,na.rm=na.rm)$thetastar,.025,na.rm=na.rm)}
ci.high <- function(x,na.rm=T) {
  quantile(bootstrap(1:length(x),
                     10000,theta,x,na.rm=na.rm)$thetastar,.975,na.rm=na.rm) - 
    mean(x,na.rm=na.rm)}

Now make 95% confidence intervals across participants using all the methods above:

  • Normal
  • t
  • Bootstrap percentile using my code
d.prop = d %>% 
  group_by(subid, condition, age.group) %>%
  summarise(prop = mean(correct))

ci95.norm(d.prop$prop)
## [1] 0.05377146
ci95.t(d.prop$prop)
## [1] 0.05422089
ci.low(d.prop$prop)
##       2.5% 
## 0.05272109
ci.high(d.prop$prop)
##      97.5% 
## 0.05442177

Now plot the data by age and condition using dplyr and ggplot2. Plot some CIs on here - extra credit if you plot all of them and compare visually (you’ll need position = position_dodge() or some other way to offset them).

d.mean= d.prop %>%
  group_by(age.group, condition) %>%
  summarise(mean = mean(prop), ci.t = ci95.t(prop), ci.n = ci95.norm(prop), ci.l= ci.low(prop), ci.h =ci.high(prop))

CI with t distribution

ggplot(d.mean, aes(x = age.group, y = mean, fill =condition )) +
      geom_bar(position = position_dodge(), stat = "identity") +  
      geom_errorbar(aes(ymin=mean-ci.t, ymax=mean+ci.t),
                  width=.2,   
                  position=position_dodge(.9))

CI with normal distribution

ggplot(d.mean, aes(x = age.group, y = mean, fill =condition )) +
      geom_bar(position = position_dodge(), stat = "identity") +  
      geom_errorbar(aes(ymin=mean-ci.n, ymax=mean+ci.n),
                  width=.2,   
                  position=position_dodge(.9))

CI with bootstrapping

ggplot(d.mean, aes(x = age.group, y = mean, fill =condition )) +
      geom_bar(position = position_dodge(), stat = "identity") +  
      geom_errorbar(aes(ymin=mean-ci.l, ymax=mean+ci.h),
                  width=.2,   
                  position=position_dodge(.9))

What do you conclude about confidence interval computation?

I conclude that it doesn’t really matter :)

Back to LMEMs

OK, now do a basic GLM over the entire data frame, including age.group and condition and their interaction. (If we were focusing on developmental issues, I would ask you to think about how to model age here, but let’s treat it as three discrete groups for now).

NOTE: this model is not appropriate, because it assumes that each subject’s observations are independent from one another. It’s still fine to do the analysis, though: it can tell you a lot about the data and is easy and fast to fit, as long as you know that you can’t trust the p-values!

summary(glm(correct ~ condition * age.group, family = "binomial", data=d ))
## 
## Call:
## glm(formula = correct ~ condition * age.group, family = "binomial", 
##     data = d)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7166  -0.7401  -0.5168   0.8250   2.0393  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2819     0.2020   1.395 0.162902    
## conditionNo Label             -1.4368     0.3130  -4.590 4.43e-06 ***
## age.group3                     0.6210     0.2961   2.098 0.035935 *  
## age.group4                     0.9312     0.3159   2.948 0.003198 ** 
## conditionNo Label:age.group3  -0.6210     0.4494  -1.382 0.167050    
## conditionNo Label:age.group4  -1.7221     0.5022  -3.429 0.000605 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 808.59  on 587  degrees of freedom
## Residual deviance: 648.74  on 582  degrees of freedom
## AIC: 660.74
## 
## Number of Fisher Scoring iterations: 4

Let’s now use dplyr to get data frames for by-items and by-subjects analyses. These should contain means for age and condition across items.

d.prop2 = d %>% 
  group_by(trial, condition, age.group) %>%
  summarise(prop = mean(correct))

Now do standard linear models on each of these.

NOTE: These are not strictly correct either because of the normal approximation on percent correct (model doesn’t know it’s 0 - 1 bounded and could give you standard error that goes above 1). Again, useful to do and see what happens.

summary(lm(prop ~ condition*age.group, data=d.prop2))
## 
## Call:
## lm(formula = prop ~ condition * age.group, data = d.prop2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21154 -0.07719  0.01042  0.07333  0.21154 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.57000    0.05485  10.392 4.93e-09 ***
## conditionNo Label            -0.33042    0.07757  -4.260 0.000471 ***
## age.group3                    0.14154    0.07757   1.825 0.084691 .  
## age.group4                    0.20083    0.07757   2.589 0.018518 *  
## conditionNo Label:age.group3 -0.14154    0.10970  -1.290 0.213292    
## conditionNo Label:age.group4 -0.31542    0.10970  -2.875 0.010068 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1097 on 18 degrees of freedom
## Multiple R-squared:  0.8752, Adjusted R-squared:  0.8405 
## F-statistic: 25.24 on 5 and 18 DF,  p-value: 1.52e-07
summary(lm(prop ~ condition*age.group, data=d.prop))
## 
## Call:
## lm(formula = prop ~ condition * age.group, data = d.prop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57000 -0.21154  0.01042  0.18000  0.51042 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.57000    0.04420  12.897  < 2e-16 ***
## conditionNo Label            -0.33042    0.06315  -5.232 5.95e-07 ***
## age.group3                    0.14154    0.06190   2.287 0.023714 *  
## age.group4                    0.20083    0.06315   3.180 0.001810 ** 
## conditionNo Label:age.group3 -0.14154    0.08889  -1.592 0.113560    
## conditionNo Label:age.group4 -0.31542    0.08977  -3.514 0.000594 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.221 on 141 degrees of freedom
## Multiple R-squared:  0.5737, Adjusted R-squared:  0.5586 
## F-statistic: 37.96 on 5 and 141 DF,  p-value: < 2.2e-16

Do ANOVA on these. Note that ANOVA doesn’t let you figure out what is going on with respect to age.

summary(aov(prop ~ condition*age.group, data=d.prop2))
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## condition            1 1.3982  1.3982 116.190 2.78e-09 ***
## age.group            2 0.0204  0.0102   0.846    0.446    
## condition:age.group  2 0.0998  0.0499   4.148    0.033 *  
## Residuals           18 0.2166  0.0120                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

On to linear mixed effect models. Create the maximal random effects model a la Barr et al. (2013). Does it converge? If not, what will you do to make it converge? (The internet can be your friend here).

HINT: make sure that you consider which random effects are appropriate. Consider which observations are within/between subjects. E.g. having a random coefficient for age by subject doesn’t make sense, because each subject has only one age.

summary(glmer(correct ~ condition*age.group + (condition*age.group|trial) + (1|subid), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## correct ~ condition * age.group + (condition * age.group | trial) +  
##     (1 | subid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    688.2    810.8   -316.1    632.2      560 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6159 -0.5593 -0.2649  0.6724  3.5931 
## 
## Random effects:
##  Groups Name                         Variance Std.Dev. Corr             
##  subid  (Intercept)                  0.10293  0.3208                    
##  trial  (Intercept)                  0.01422  0.1193                    
##         conditionNo Label            0.00984  0.0992    1.00            
##         age.group3                   0.49426  0.7030    1.00  1.00      
##         age.group4                   0.11935  0.3455    1.00  1.00  1.00
##         conditionNo Label:age.group3 1.48553  1.2188   -1.00 -1.00 -1.00
##         conditionNo Label:age.group4 0.08097  0.2846    1.00  1.00  1.00
##             
##             
##             
##             
##             
##             
##  -1.00      
##   1.00 -1.00
## Number of obs: 588, groups:  subid, 147; trial, 4
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2897     0.2232   1.298 0.194266    
## conditionNo Label             -1.4827     0.3410  -4.349 1.37e-05 ***
## age.group3                     0.7672     0.4854   1.581 0.113983    
## age.group4                     1.0085     0.3845   2.623 0.008717 ** 
## conditionNo Label:age.group3  -0.7789     0.7826  -0.995 0.319592    
## conditionNo Label:age.group4  -2.0467     0.6043  -3.387 0.000706 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndtNL ag.gr3 ag.gr4 cNL:.3
## conditnNLbl -0.577                            
## age.group3  -0.228  0.372                     
## age.group4  -0.411  0.397  0.581              
## cndtnNLb:.3  0.053 -0.517 -0.860 -0.509       
## cndtnNLb:.4  0.398 -0.478  0.013 -0.405  0.049

This doesn’t converge. I am going to limit my model to random effects that make sense to me. Specifically, I don’t see any reason to expect the random effect of trial to vary based on condition or age or its interaction, so I am going to reduce this model to two random intercepts.

summary(glmer(correct ~ condition*age.group + (1|trial) + (1|subid), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ condition * age.group + (1 | trial) + (1 | subid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    661.2    696.2   -322.6    645.2      580 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0905 -0.6050 -0.3480  0.6322  2.7987 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  subid  (Intercept) 0.05388  0.2321  
##  trial  (Intercept) 0.07494  0.2737  
## Number of obs: 588, groups:  subid, 147; trial, 4
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2905     0.2513   1.156 0.247698    
## conditionNo Label             -1.4786     0.3300  -4.481 7.43e-06 ***
## age.group3                     0.6393     0.3085   2.072 0.038261 *  
## age.group4                     0.9571     0.3292   2.907 0.003646 ** 
## conditionNo Label:age.group3  -0.6396     0.4655  -1.374 0.169441    
## conditionNo Label:age.group4  -1.7622     0.5195  -3.392 0.000693 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndtNL ag.gr3 ag.gr4 cNL:.3
## conditnNLbl -0.543                            
## age.group3  -0.566  0.417                     
## age.group4  -0.528  0.384  0.446              
## cndtnNLb:.3  0.375 -0.671 -0.663 -0.296       
## cndtnNLb:.4  0.333 -0.590 -0.287 -0.639  0.441

How do these coefficients compare with the independent coefficients linear model? What do you conclude?

It looks like these models are around the same, so there is an interaction of condition and age group and there seems to be somewhat little variance due to subject or trial (surprisingly!)

Which random effects make the most difference? Find out using ranef. Plot the random effects for subject and item.

mod2 = glmer(correct ~ condition*age.group + (1|trial), family = "binomial", data=d)
mod3 = glmer(correct ~ condition*age.group + (1|subid), family = "binomial", data=d)
plot(ranef(mod2))
## $trial

plot(ranef(mod3))
## $subid

It looks like the trial random effect is stronger. I concluded this by looking at the two intercept model and seeing that the trial intercept explained more variance. Plus, the random effects range is larger in the plots for trial (.2 to -.3) than for subject id (-.06 to .04).

Make the minimal random effects model with just a subject intecept. How does this compare?

summary(glmer(correct ~ condition*age.group + (1|subid), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ condition * age.group + (1 | subid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    662.7    693.4   -324.4    648.7      581 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8385 -0.5597 -0.3737  0.6337  2.6414 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  subid  (Intercept) 0.02648  0.1627  
## Number of obs: 588, groups:  subid, 147
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2837     0.2056   1.380 0.167593    
## conditionNo Label             -1.4456     0.3223  -4.485  7.3e-06 ***
## age.group3                     0.6248     0.3014   2.073 0.038187 *  
## age.group4                     0.9365     0.3220   2.908 0.003633 ** 
## conditionNo Label:age.group3  -0.6249     0.4561  -1.370 0.170658    
## conditionNo Label:age.group4  -1.7305     0.5104  -3.390 0.000698 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndtNL ag.gr3 ag.gr4 cNL:.3
## conditnNLbl -0.646                            
## age.group3  -0.675  0.419                     
## age.group4  -0.630  0.387  0.443              
## cndtnNLb:.3  0.446 -0.675 -0.661 -0.293       
## cndtnNLb:.4  0.395 -0.593 -0.282 -0.635  0.438

All of these look surprisingly alike.

Get significance values for the coefficient on the age*condition interaction using the anova, likelihood-ratio test approach, comparing between your semi-maximal model and one without an intercept.

#semimaximal model 
mod1 = glmer(correct ~ condition*age.group + (1|subid) +  (1|trial), family = "binomial", data=d)
mod2 = glmer(correct ~ condition*age.group +  (1|trial), family = "binomial", data=d)
mod3 = glmer(correct ~ condition*age.group + (1|subid), family = "binomial", data=d)

anova(mod1, mod2) # mod1 is not better than mod2
## Data: d
## Models:
## mod2: correct ~ condition * age.group + (1 | trial)
## mod1: correct ~ condition * age.group + (1 | subid) + (1 | trial)
##      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## mod2  7 659.33 689.97 -322.67   645.33                         
## mod1  8 661.24 696.25 -322.62   645.24 0.0958      1      0.757
anova(mod1, mod3)  #mod 2 is better than mod3
## Data: d
## Models:
## mod3: correct ~ condition * age.group + (1 | subid)
## mod1: correct ~ condition * age.group + (1 | subid) + (1 | trial)
##      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)  
## mod3  7 662.72 693.35 -324.36   648.72                           
## mod1  8 661.24 696.25 -322.62   645.24 3.4802      1    0.06211 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Looks like including random intercept of subject and trial is not necessary, but just trial does improve the model. thus, this leads me to the conclusion that the random effect of trial is the only one we should include in the model. That being said, the effects always look pretty similar, and that is that participants are correct more often in the label condition, and that difference is more significant among the age group of four relative to the age group of 2. Let me look at the linear effect using contrast coding:

contrasts(d$age.group) = cbind(c(-1,0,1), c(0,2,0))
summary(glmer(correct ~ condition*age.group +  (1|trial), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ condition * age.group + (1 | trial)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    659.3    690.0   -322.7    645.3      581 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0761 -0.6082 -0.3562  0.6320  2.8071 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  trial  (Intercept) 0.0728   0.2698  
## Number of obs: 588, groups:  trial, 4
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   0.75958    0.20885   3.637 0.000276 ***
## conditionNo Label            -2.33294    0.25413  -9.180  < 2e-16 ***
## age.group1                    0.47297    0.15924   2.970 0.002976 ** 
## age.group2                    0.07922    0.13499   0.587 0.557305    
## conditionNo Label:age.group1 -0.87246    0.25272  -3.452 0.000556 ***
## conditionNo Label:age.group2  0.12052    0.20570   0.586 0.557943    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndtNL ag.gr1 ag.gr2 cNL:.1
## conditnNLbl -0.482                            
## age.group1   0.139 -0.117                     
## age.group2  -0.449  0.368 -0.106              
## cndtnNLb:.1 -0.088  0.223 -0.631  0.066       
## cndtnNLb:.2  0.295 -0.610  0.070 -0.656 -0.135

Yes indeed, there is a significant difference in the linear effect. Looking at the graph, this can again be interpreted as kids get more correct as they get older, but only for the label condition, meaning they are learning to make inferences!