This is problem set #3, in which we want you to integrate your knowledge of data wrangling with some basic simulation skills and some linear modeling.

For ease of reading, please separate your answers from our text by marking our text with the > character (indicating quotes).

options(warn=-1)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(grid)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Part 1: Basic simulation and NHST

Let’s start by convincing ourselves that t-tests have the appropriate false positive rate. Run 10,000 t-tests with standard, normally-distributed data from a made up 30-person, single-measurement experiment (the command for sampling from a normal distribution is rnorm). What’s the mean number of “significant” results?

First do this using a for loop.

numSig = 0
for (i in 1:10000) {
  pval = t.test(rnorm(30))$p.value
  if (pval < .05) {
    numSig = numSig + 1
  }
}
print(numSig)
## [1] 486

Mean number of significant results = 486, i.e. false positive rate is 0.0486, which is close to what we expect.

Next, do this using the replicate function:

pvals = replicate(10000, t.test(rnorm(30))$p.value)
numSig = sum(pvals < .05)

print(numSig)
## [1] 496

Mean number of significant results = 496, i.e. false positive rate is 0.0496, which is once again close to what we expect.

Ok, that was a bit boring. Let’s try something moderately more interesting - let’s implement a p-value sniffing simulation, in the style of Simons, Nelson, & Simonsohn (2011).

Consider this scenario: you have done an experiment, again with 30 participants (one observation each, just for simplicity). The question is whether their performance is above chance. You aren’t going to check the p-value every trial, but let’s say you run 30 - then if the p-value is within the range p < .25 and p > .05, you optionally run 30 more and add those data, then test again. But if the original p value is < .05, you call it a day, and if the original is > .25, you also stop.

First, write a function that implements this sampling regime.

double.sample <- function () {
  data = rnorm(30)
  pval = t.test(data)$p.value
  if (pval > .05 & pval < .25) {
    data = c(data, rnorm(30))
    pval = t.test(data)$p.value
  }
  return(pval)
}

Now call this function 10k times and find out what happens.

pvals = replicate(10000, double.sample())
numSig = sum(pvals < .05)

print(numSig)
## [1] 720

Is there an inflation of false positives? How bad is it?

Mean number of significant results = 720, i.e. false positive rate is 0.072, which is more than what it should be. There is a noticeable inflation of the false positive rate. Comparing with .05 as our standard, the false positive rate inflated by a factor of 1.44.

Now modify this code so that you can investigate this “double the sample” rule in a bit more depth. Let’s see what happens when you double the sample ANY time p > .05 (not just when p < .25), or when you do it only if p < .5 or < .75. How do these choices affect the false positive rate?

HINT: Try to do this by making the function double.sample take the upper p value as an argument, so that you can pass this through dplyr.

HINT 2: You may need more samples. Find out by looking at how the results change from run to run.

double.sample <- function (upper_p) {
  data = rnorm(30)
  pval = t.test(data)$p.value
  if (pval > .05 & pval < upper_p) {
    data = c(data, rnorm(30))
    pval = t.test(data)$p.value
  }
  return(pval)
}
# Using a larger number of samples to get more accurate results

nsamples = 20000

# Trying upper p-value of .75

pvals = replicate(nsamples, double.sample(0.75))
numSig1 = sum(pvals < .05)
print(numSig1)
## [1] 1686
# Trying upper p-value of 1

pvals = replicate(nsamples, double.sample(1))
numSig2 = sum(pvals < .05)
print(numSig2)
## [1] 1698

What do you conclude on the basis of this simulation? How bad is this kind of data-dependent policy?

For doubling when .05 < p < .75, the false positive rate is 0.0843. For doubling when .05 < p < 1, the false positive rate is 0.0849. Once again there is a noticeable inflation of the p-value, so we can conclude that this data-dependent policy is not good.

Part 2: The Linear Model

2A: Basic Linear Modeling

Let’s use the ToothGrowth dataset, on guineapig teeth based on orange juice and vitamin C. This is super simple. (Try ?ToothGrowth).

First plot the data, we’ll use qplot to understand how len (tooth length) depends on dose (amount of Vitamin C) and supp (delivery method).

qplot(dose, y = len, color = supp, data = ToothGrowth)

So now you see what’s going on.

Next, always make a histogram of the DV before making a linear model! This reveals the distribution and can be helpful in choosing your model type.

# Possible histograms for the ToothGrowth data

hists = function() {
  h1 = qplot(len, fill = supp, geom = 'histogram', data = ToothGrowth)
  h2 = qplot(len, fill = factor(dose), geom = 'histogram', data = ToothGrowth)
  h3 = qplot(len, fill = supp, geom = 'histogram', data = ToothGrowth) + facet_grid(dose ~ .)
  h4 = qplot(len, fill = factor(dose), geom = 'histogram', data = ToothGrowth) + facet_grid(supp ~ .)
  h5 = qplot(len, fill = factor(dose), geom = 'histogram', data = ToothGrowth) + facet_grid(supp ~ dose)
  
  grid.arrange(h1, h2, ncol = 1)
  print(h3)
  print(h4)
  print(h5)
}
suppressMessages(hists())

Now make a linear model of tooth lengths using lm. Try making one with main effects and interactions and another with just main effects. Make sure to assign them to variables so that you can get them later.

# Main effects only (additive model)
lm1 = lm(len ~ dose + supp, data = ToothGrowth)
summary(lm1)
## 
## Call:
## lm(formula = len ~ dose + supp, data = ToothGrowth)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.600 -3.700  0.373  2.116  8.800 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.2725     1.2824   7.231 1.31e-09 ***
## dose          9.7636     0.8768  11.135 6.31e-16 ***
## suppVC       -3.7000     1.0936  -3.383   0.0013 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared:  0.7038, Adjusted R-squared:  0.6934 
## F-statistic: 67.72 on 2 and 57 DF,  p-value: 8.716e-16
# Main effects and interaction effects (multiplicative model)
lm2 = lm(len ~ dose * supp, data = ToothGrowth)
summary(lm2)
## 
## Call:
## lm(formula = len ~ dose * supp, data = ToothGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2264 -2.8462  0.0504  2.2893  7.9386 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.550      1.581   7.304 1.09e-09 ***
## dose           7.811      1.195   6.534 2.03e-08 ***
## suppVC        -8.255      2.236  -3.691 0.000507 ***
## dose:suppVC    3.904      1.691   2.309 0.024631 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared:  0.7296, Adjusted R-squared:  0.7151 
## F-statistic: 50.36 on 3 and 56 DF,  p-value: 6.521e-16

Now try taking out the intercept, using a -1 term in the formula. what does this do?

lm3 = lm(len ~ dose + supp - 1, data = ToothGrowth)
summary(lm3)
## 
## Call:
## lm(formula = len ~ dose + supp - 1, data = ToothGrowth)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.600 -3.700  0.373  2.116  8.800 
## 
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)    
## dose     9.7636     0.8768  11.135 6.31e-16 ***
## suppOJ   9.2725     1.2824   7.231 1.31e-09 ***
## suppVC   5.5725     1.2824   4.345 5.79e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared:  0.9586, Adjusted R-squared:  0.9564 
## F-statistic: 439.7 on 3 and 57 DF,  p-value: < 2.2e-16
lm4 = lm(len ~ dose * supp - 1, data = ToothGrowth)
summary(lm4)
## 
## Call:
## lm(formula = len ~ dose * supp - 1, data = ToothGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2264 -2.8463  0.0504  2.2893  7.9386 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## dose           7.811      1.195   6.534 2.03e-08 ***
## suppOJ        11.550      1.581   7.304 1.09e-09 ***
## suppVC         3.295      1.581   2.084   0.0418 *  
## dose:suppVC    3.904      1.691   2.309   0.0246 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared:  0.9622, Adjusted R-squared:  0.9595 
## F-statistic: 356.2 on 4 and 56 DF,  p-value: < 2.2e-16

When we remove the intercept, we have both OJ and VC appear as variables in our linear model, instead of just one of them. This is because the intercept serves as our baseline for comparison of one to the other; without the intercept we need to compensate for this by including both in the model.

Thought question: Take a moment to interpret the coefficients of the model. Q1 - What are the units?

Pretending that tooth length is measured in nanometers (wasn’t specified), then units of model = nm / (mg / day)

Q2 - How does the interaction relate to the plot?

The interaction explains why the points for OJ and VC lend themselves to best-fit lines that have different slopes, i.e. len increases with respect to daily dose at different rates for our two supps.

Q3 - Should there be an interaction in the model? What does it mean? How important is it?

Yes there should be an interaction in the model, and it is important because it is statistically significant (t = 2.309, p = .0246). This means that supp has a significant impact on the rate of increase of tooth length with respect to the daily dose quantity, and therefore should be taken into consideration in our model.

Now make predictions from the model you like the best. What should happen with doses of 0, 1.5, 2.5, and 10 under both supplements?

HINT: use the predict function …

HINT 2: you will have to make a dataframe to do the prediction with, so use something like data.frame(dose=...).

# An idea from http://stats.stackexchange.com/questions/6684/how-can-one-use-the-predict-function-on-a-lm-object-where-the-ivs-have-been-dyna

dosage = c(0, 1.5, 2.5, 10)
VC = expand.grid(dose = dosage, supp = 'VC')
OJ = expand.grid(dose = dosage, supp = 'OJ')
data.frame(dose = dosage, lenVC_prediction = predict(lm2, VC), lenOJ_prediction = predict(lm2, OJ))
##   dose lenVC_prediction lenOJ_prediction
## 1  0.0          3.29500         11.55000
## 2  1.5         20.86857         23.26714
## 3  2.5         32.58429         31.07857
## 4 10.0        120.45214         89.66429

Now plot the residuals from the original model. How do they look? HINT: ?resid

qplot(resid(lm2), fill = ToothGrowth$supp)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The residuals are for the most part centered around 0, but there is some slight skewing taking place (VC is slightly skewed to the right of 0, OJ is slightly skewed to the left). The distribution of the residuals isn’t perfect.

BONUS: test them for normality of distribution using a quantile-quantile plot.

HINT: ?qqplot and ?qqnorm

qqnorm(resid(lm2), xlab = 'Normal Scores', ylab = 'Residuals')
qqline(resid(lm2))

Distribution of residuals exhibit a decent fit with standard normal distribution of residuals.

2B: Exploratory Linear Modeling

What the heck is going on? Load data from Frank, Vul, Saxe (2011, Infancy), a study in which we measured infants’ looking to hands in moving scenes. There were infants from 3 months all the way to about two years, and there were two movie conditions (Faces_Medium, in which kids played on a white background, and Faces_Plus, in which the backgrounds were more complex and the people in the videos were both kids and adults). Forgive our bad naming conventions.

Try to figure out what the most reasonable linear model of the data is.

d <- read.csv("http://langcog.stanford.edu/FVS2011-hands.csv")

# Try both the additive and interactive model, and see which one fits better
lm1 = lm(hand.look ~ age + condition, data = d)
lm2 = lm(hand.look ~ age * condition, data = d)
summary(lm1)
## 
## Call:
## lm(formula = hand.look ~ age + condition, data = d)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.159908 -0.035120 -0.006159  0.028734  0.246199 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.0023491  0.0115374  -0.204  0.83884    
## age                  0.0051414  0.0008351   6.156  3.3e-09 ***
## conditionFaces_Plus  0.0204451  0.0072375   2.825  0.00515 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05512 on 229 degrees of freedom
## Multiple R-squared:  0.1667, Adjusted R-squared:  0.1595 
## F-statistic: 22.91 on 2 and 229 DF,  p-value: 8.488e-10
summary(lm2)
## 
## Call:
## lm(formula = hand.look ~ age * condition, data = d)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.190121 -0.036508 -0.001239  0.032091  0.239474 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)   
## (Intercept)              0.022394   0.015417   1.453  0.14772   
## age                      0.003143   0.001176   2.673  0.00807 **
## conditionFaces_Plus     -0.028440   0.021680  -1.312  0.19089   
## age:conditionFaces_Plus  0.003950   0.001653   2.389  0.01770 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05456 on 228 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1764 
## F-statistic: 17.49 on 3 and 228 DF,  p-value: 2.955e-10
anova(lm1, lm2)
## Analysis of Variance Table
## 
## Model 1: hand.look ~ age + condition
## Model 2: hand.look ~ age * condition
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)  
## 1    229 0.69573                             
## 2    228 0.67874  1  0.016992 5.7079 0.0177 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We will use the interactive model because it fits the data better.

Plot that model on the same plot as the data.

HINT: you can do this either using predict or (if you are feeling confident of your understanding of the models) using the built-in linear models in ggplot’s geom_smooth.

ggplot(d, aes(x = age, y = hand.look, col = condition)) +
  geom_jitter() + 
  geom_smooth(method = 'lm')

What do you conclude from this pattern of data?

We can conclude that age has a linear positive impact on infant’s hand-looking, and the degree of this increase is impacted by which movie condition the infants are in. The FacesPlus condition results in a higher rate of increase of hand.look with respect to age than does the FacesMedium contition That is to say that the interaction between age and movie condition is significant.

3: Linear Mixed Effect Models

The goal here is to learn to use LMEMs using lme4 and to compare them to standard by subject, by item LMs, as well as the standard (no repeated measures) fixed effects GLM.

The dataset here is from Stiller, Goodman, & Frank (2014), a paper on children’s pragmatic inferences. We saw the paradigm in the counterbalancing lecture: it’s three faces: a smiley, a smiley with glasses, and a smiley with a hat and glasses. When told “my friend has glasses” do kids pick the one with the glasses and no hat? age.group is the kids’ age group, condition is either “label,” described above, or “no label,” which was a control condition in which kids picked without hearing the term “glasses” at all.

library(lme4)
## Loading required package: Matrix
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
d <- read.csv("http://langcog.stanford.edu/scales_data.csv")
d$age.group <- factor(d$age.group)

Always begin with a histogram!

ggplot(d, aes(x = correct, fill = trial)) +
  geom_histogram(binwidth = .5, position = 'dodge') +
  facet_grid(age.group ~ condition)

Brief Confidence Interval Digression

Start out by setting up a function for a 95% CI using the normal approximation.

sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(length(x))}

ci95.norm <- function(x) {
  sem(x) * 1.96
} 

But the number of participants in a group is likely to be < 30, so let’s also compute this with a t distribution.

ci95.t <- function(x) {
  sem(x) * qt(.975, df = length(x))
} 

On the other hand, maybe we should use bootstrap CIs because these are actually proportions, and the normal/t approximations don’t know that they are 0/1 bounded.

library(boot)
library(bootstrap)

Take a look at ?boot and ?bootci. Note that the syntax for the boot library is terrible, so we’re going to use it to check some code that I use:

theta <- function(x,xdata,na.rm=T) {mean(xdata[x],na.rm=na.rm)}
ci.low <- function(x,na.rm=T) {
  mean(x,na.rm=na.rm) - 
    quantile(bootstrap(1:length(x),
                       10000,theta,x,na.rm=na.rm)$thetastar,.025,na.rm=na.rm)}
ci.high <- function(x,na.rm=T) {
  quantile(bootstrap(1:length(x),
                     10000,theta,x,na.rm=na.rm)$thetastar,.975,na.rm=na.rm) - 
    mean(x,na.rm=na.rm)}

Now make 95% confidence intervals across participants using all the methods above:

  • Normal
  • t
  • Bootstrap percentile using boot.ci
  • Bootstrap percentile using my code
# boot.ci example from http://www.r-bloggers.com/bootstrap-confidence-intervals/

d.cis = d %>%
  group_by(age.group, condition) %>% 
  summarize(
    mean = mean(correct),
    ci95.norm = ci95.norm(correct),
    ci95.t = ci95.t(correct),
    bootLow = boot.ci(boot(correct, function(x, i) mean(x[i]), R = 999))$basic[4],
    bootHigh = boot.ci(boot(correct, function(x, i) mean(x[i]), R = 999))$basic[5],
    codeLow = ci.low(correct),
    codeHigh = ci.high(correct)
  )

print(d.cis)
## Source: local data frame [6 x 9]
## Groups: age.group [?]
## 
##   age.group condition      mean  ci95.norm     ci95.t    bootLow  bootHigh
##      (fctr)    (fctr)     (dbl)      (dbl)      (dbl)      (dbl)     (dbl)
## 1         2     Label 0.5700000 0.09752369 0.09871644 0.48000000 0.6700000
## 2         2  No Label 0.2395833 0.08583183 0.08692594 0.14583333 0.3229167
## 3         3     Label 0.7115385 0.08749447 0.08852287 0.62500000 0.8076923
## 4         3  No Label 0.2395833 0.08583183 0.08692594 0.15625000 0.3229167
## 5         4     Label 0.7708333 0.08451820 0.08559556 0.68750000 0.8645833
## 6         4  No Label 0.1250000 0.06650485 0.06735259 0.05208333 0.1875000
## Variables not shown: codeLow (dbl), codeHigh (dbl)

Now plot the data by age and condition using dplyr and ggplot2. Plot some CIs on here - extra credit if you plot all of them and compare visually (you’ll need position = position_dodge() or some other way to offset them).

ggplot(d.cis, aes(age.group, mean, fill = condition)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  geom_errorbar(aes(ymin = mean - ci95.norm, ymax = mean + ci95.norm), width = .25, position = position_dodge(.1)) +
  geom_errorbar(aes(ymin = mean - ci95.t, ymax = mean + ci95.t), width = .25, position = position_dodge(.5)) +
  geom_errorbar(aes(ymin = bootLow, ymax = bootHigh), width = .25, position = position_dodge(1.0)) +
  geom_errorbar(aes(ymin = mean - codeLow, ymax = mean + codeHigh), width = .25, position = position_dodge(1.5)) +
  labs(x = 'Age Group', y = 'Mean Correct') +
  scale_fill_discrete(name = 'Condition')

What do you conclude about confidence interval computation?

The four methods of computing confidence intervals used here seem to be within very close proximity to one another in terms of precision.

Back to LMEMs

OK, now do a basic GLM over the entire data frame, including age.group and condition and their interaction. (If we were focusing on developmental issues, I would ask you to think about how to model age here, but let’s treat it as three discrete groups for now).

NOTE: this model is not appropriate, because it assumes that each subject’s observations are independent from one another. It’s still fine to do the analysis, though: it can tell you a lot about the data and is easy and fast to fit, as long as you know that you can’t trust the p-values!

glm1 = glm(correct ~ age.group * condition, data = d, family = 'binomial')
summary(glm1)
## 
## Call:
## glm(formula = correct ~ age.group * condition, family = "binomial", 
##     data = d)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7166  -0.7401  -0.5168   0.8250   2.0393  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2819     0.2020   1.395 0.162902    
## age.group3                     0.6210     0.2961   2.098 0.035935 *  
## age.group4                     0.9312     0.3159   2.948 0.003198 ** 
## conditionNo Label             -1.4368     0.3130  -4.590 4.43e-06 ***
## age.group3:conditionNo Label  -0.6210     0.4494  -1.382 0.167050    
## age.group4:conditionNo Label  -1.7221     0.5022  -3.429 0.000605 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 808.59  on 587  degrees of freedom
## Residual deviance: 648.74  on 582  degrees of freedom
## AIC: 660.74
## 
## Number of Fisher Scoring iterations: 4

Let’s now use dplyr to get data frames for by-items and by-subjects analyses. These should contain means for age and condition across items.

df1 = d %>%
  group_by(subid, age.group, condition) %>% 
  summarize(
    mean = mean(correct)
  )
df2 = d %>%
  group_by(age.group, condition, trial) %>% 
  summarize(
    mean = mean(correct)
  )
print(df1)
## Source: local data frame [147 x 4]
## Groups: subid, age.group [?]
## 
##     subid age.group condition  mean
##    (fctr)    (fctr)    (fctr) (dbl)
## 1      C1         4     Label  1.00
## 2     C10         3     Label  0.50
## 3     C11         4     Label  0.75
## 4     C12         3     Label  0.75
## 5     C13         4     Label  0.50
## 6     C14         4     Label  0.75
## 7     C15         3     Label  1.00
## 8     C16         3     Label  0.50
## 9     C17         3     Label  0.50
## 10    C18         4     Label  0.75
## ..    ...       ...       ...   ...
print(df2)
## Source: local data frame [24 x 4]
## Groups: age.group, condition [?]
## 
##    age.group condition  trial      mean
##       (fctr)    (fctr) (fctr)     (dbl)
## 1          2     Label   beds 0.6000000
## 2          2     Label  faces 0.6400000
## 3          2     Label houses 0.4800000
## 4          2     Label  pasta 0.5600000
## 5          2  No Label   beds 0.2916667
## 6          2  No Label  faces 0.2500000
## 7          2  No Label houses 0.1666667
## 8          2  No Label  pasta 0.2500000
## 9          3     Label   beds 0.9230769
## 10         3     Label  faces 0.6153846
## ..       ...       ...    ...       ...

Now do standard linear models on each of these.

NOTE: These are not strictly correct either because of the normal approximation on percent correct (model doesn’t know it’s 0 - 1 bounded and could give you standard error that goes above 1). Again, useful to do and see what happens.

lm1 = lm(mean ~ age.group * condition + as.numeric(subid), data = df1)
summary(lm1)
## 
## Call:
## lm(formula = mean ~ age.group * condition + as.numeric(subid), 
##     data = df1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56567 -0.21567  0.00747  0.18519  0.51100 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.5914309  0.0856187   6.908 1.59e-10 ***
## age.group3                    0.1273166  0.0788618   1.614   0.1087    
## age.group4                    0.1847851  0.0837998   2.205   0.0291 *  
## conditionNo Label            -0.3377500  0.0681355  -4.957 2.04e-06 ***
## as.numeric(subid)            -0.0001922  0.0006570  -0.293   0.7703    
## age.group3:conditionNo Label -0.1243128  0.1068584  -1.163   0.2467    
## age.group4:conditionNo Label -0.2930486  0.1181286  -2.481   0.0143 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2217 on 140 degrees of freedom
## Multiple R-squared:  0.574,  Adjusted R-squared:  0.5557 
## F-statistic: 31.44 on 6 and 140 DF,  p-value: < 2.2e-16
lm2 = lm(mean ~ age.group * condition + trial, data = df2)
summary(lm2)
## 
## Call:
## lm(formula = mean ~ age.group * condition + trial, data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17567 -0.03765 -0.00554  0.03459  0.17873 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.63109    0.05758  10.960 1.48e-08 ***
## age.group3                    0.14154    0.06649   2.129 0.050256 .  
## age.group4                    0.20083    0.06649   3.021 0.008605 ** 
## conditionNo Label            -0.33042    0.06649  -4.969 0.000168 ***
## trialfaces                   -0.08628    0.05429  -1.589 0.132837    
## trialhouses                  -0.14607    0.05429  -2.691 0.016772 *  
## trialpasta                   -0.01201    0.05429  -0.221 0.827920    
## age.group3:conditionNo Label -0.14154    0.09403  -1.505 0.153025    
## age.group4:conditionNo Label -0.31542    0.09403  -3.354 0.004346 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09403 on 15 degrees of freedom
## Multiple R-squared:  0.9236, Adjusted R-squared:  0.8828 
## F-statistic: 22.65 on 8 and 15 DF,  p-value: 4.833e-07

Do ANOVA on these. Note that ANOVA doesn’t let you figure out what is going on with respect to age.

# Using glm instead.

glm1 = glm(correct ~ age.group * condition + as.numeric(subid), data = d, family = 'binomial')
glm2 = glm(correct ~ age.group * condition + trial, data = d, family = 'binomial')
anova(glm1, glm2)
## Analysis of Deviance Table
## 
## Model 1: correct ~ age.group * condition + as.numeric(subid)
## Model 2: correct ~ age.group * condition + trial
##   Resid. Df Resid. Dev Df Deviance
## 1       581     648.66            
## 2       579     637.00  2   11.663

As we can see, model 2 (glm2) fits better.

On to linear mixed effect models. Create the maximal random effects model a la Barr et al. (2013). Does it converge? If not, what will you do to make it converge? (The internet can be your friend here).

HINT: make sure that you consider which random effects are appropriate. Consider which observations are within/between subjects. E.g. having a random coefficient for age by subject doesn’t make sense, because each subject has only one age.

mm1 = glmer(correct ~ age.group * condition + (1 | subid) + (1 | trial), data = d, family = 'binomial')
summary(mm1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ age.group * condition + (1 | subid) + (1 | trial)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    661.2    696.2   -322.6    645.2      580 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0905 -0.6050 -0.3480  0.6322  2.7987 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  subid  (Intercept) 0.05389  0.2321  
##  trial  (Intercept) 0.07494  0.2737  
## Number of obs: 588, groups:  subid, 147; trial, 4
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2905     0.2513   1.156 0.247703    
## age.group3                     0.6393     0.3085   2.072 0.038261 *  
## age.group4                     0.9571     0.3292   2.907 0.003646 ** 
## conditionNo Label             -1.4786     0.3300  -4.481 7.43e-06 ***
## age.group3:conditionNo Label  -0.6396     0.4655  -1.374 0.169438    
## age.group4:conditionNo Label  -1.7623     0.5195  -3.392 0.000693 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ag.gr3 ag.gr4 cndtNL a.3:NL
## age.group3  -0.566                            
## age.group4  -0.528  0.446                     
## conditnNLbl -0.543  0.417  0.384              
## ag.grp3:cNL  0.375 -0.663 -0.296 -0.671       
## ag.grp4:cNL  0.333 -0.287 -0.639 -0.590  0.441

The model converges! Both condition and age group have a significant positive effect (p < .001, and p = .038 and p = .0036 respectively), and there is a significant interaction with a negative coefficient between condition and age group 4 (p < .001).

How do these coefficients compare with the independent coefficients linear model? What do you conclude?

To recall, here is what model 2 looked like:

summary(glm2)
## 
## Call:
## glm(formula = correct ~ age.group * condition + trial, family = "binomial", 
##     data = d)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8845  -0.8221  -0.4751   0.8659   2.1155  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   0.63585    0.26690   2.382  0.01720 *  
## age.group3                    0.63728    0.29999   2.124  0.03364 *  
## age.group4                    0.95435    0.31987   2.984  0.00285 ** 
## conditionNo Label            -1.47274    0.31722  -4.643 3.44e-06 ***
## trialfaces                   -0.48536    0.27463  -1.767  0.07717 .  
## trialhouses                  -0.82902    0.27845  -2.977  0.00291 ** 
## trialpasta                   -0.07442    0.27284  -0.273  0.78502    
## age.group3:conditionNo Label -0.63728    0.45443  -1.402  0.16080    
## age.group4:conditionNo Label -1.75693    0.50702  -3.465  0.00053 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 808.59  on 587  degrees of freedom
## Residual deviance: 637.00  on 579  degrees of freedom
## AIC: 655
## 
## Number of Fisher Scoring iterations: 4

The slope coefficients of the mixed model are almost the same as those of the general linear model, with only a slight change in the intercept. In addition the results of our mixed model summary show us that very little of the variance is accounted for by random effects of subid and trial. Thus we can conclude that the fixed effects linear model fits these data fine, and we don’t need to account for random subid and trial effects.

Which random effects make the most difference? Find out using ranef. Plot the random effects for subject and item.

re = ranef(mm1)
qqnorm(re$subid[,1])
qqline(re$subid[,1])

qqnorm(re$trial[,1])
qqline(re$trial[,1])

From the plots of the random effects we can see that the random effect of tria makes a bigger difference than the random effect of subid.

Make the minimal random effects model with just a subject intecept. How does this compare?

mm2 = glmer(correct ~ age.group * condition + (1 | subid), data = d, family = 'binomial')
summary(mm2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ age.group * condition + (1 | subid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    662.7    693.4   -324.4    648.7      581 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8385 -0.5597 -0.3737  0.6337  2.6414 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  subid  (Intercept) 0.02648  0.1627  
## Number of obs: 588, groups:  subid, 147
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    0.2837     0.2056   1.380 0.167603    
## age.group3                     0.6248     0.3014   2.073 0.038190 *  
## age.group4                     0.9365     0.3220   2.908 0.003634 ** 
## conditionNo Label             -1.4456     0.3223  -4.485  7.3e-06 ***
## age.group3:conditionNo Label  -0.6249     0.4561  -1.370 0.170665    
## age.group4:conditionNo Label  -1.7305     0.5104  -3.390 0.000699 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ag.gr3 ag.gr4 cndtNL a.3:NL
## age.group3  -0.675                            
## age.group4  -0.630  0.443                     
## conditnNLbl -0.646  0.419  0.387              
## ag.grp3:cNL  0.446 -0.661 -0.293 -0.675       
## ag.grp4:cNL  0.395 -0.282 -0.635 -0.593  0.438

The coefficients look almost the same as in the maximal random effects model.

Get significance values for the coefficient on the age*condition interaction using the anova, likelihood-ratio test approach, comparing between your semi-maximal model and one without an intercept.

library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# No-intercept model
glm3 = glm(correct ~ age.group * condition - 1, data = d, family = 'binomial')
summary(glm3)
## 
## Call:
## glm(formula = correct ~ age.group * condition - 1, family = "binomial", 
##     data = d)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7166  -0.7401  -0.5168   0.8250   2.0393  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## age.group2                     0.2819     0.2020   1.395 0.162902    
## age.group3                     0.9029     0.2164   4.171 3.03e-05 ***
## age.group4                     1.2130     0.2428   4.995 5.87e-07 ***
## conditionNo Label             -1.4368     0.3130  -4.590 4.43e-06 ***
## age.group3:conditionNo Label  -0.6210     0.4494  -1.382 0.167050    
## age.group4:conditionNo Label  -1.7221     0.5022  -3.429 0.000605 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 815.14  on 588  degrees of freedom
## Residual deviance: 648.74  on 582  degrees of freedom
## AIC: 660.74
## 
## Number of Fisher Scoring iterations: 4
lrtest(glm3, mm1)
## Likelihood ratio test
## 
## Model 1: correct ~ age.group * condition - 1
## Model 2: correct ~ age.group * condition + (1 | subid) + (1 | trial)
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   6 -324.37                     
## 2   8 -322.62  2 3.5047     0.1734

We see that modeling subid and trial as random effects do not significantly improve the model ( \(\chi^2\)(2) = 3.5, p = .173), therefore we are happy with the original fixed effects model.