This is problem set #3, in which we want you to integrate your knowledge of data wrangling with some basic simulation skills and some linear modeling.
For ease of reading, please separate your answers from our text by marking our text with the
>character (indicating quotes).
Let’s start by convincing ourselves that t-tests have the appropriate false positive rate. Run 10,000 t-tests with standard, normally-distributed data from a made up 30-person, single-measurement experiment (the command for sampling from a normal distribution is
rnorm). What’s the mean number of “significant” results?
First do this using a
forloop.
sig = 0
for (i in 1:10000) {
n = rnorm(30)
if(t.test(n)$p.value < .05){
sig = sig +1
}
}
sig
## [1] 445
proportion = sig/10000; proportion
## [1] 0.0445
There were 445 significant results, which makes sense since there should be a 5% false positive rate, and that is 4.45% false positive.
Next, do this using the
replicatefunction:
table(replicate(1000, t.test(rnorm(30))$p.value < .05))
##
## FALSE TRUE
## 958 42
Again, we see that around 5% of the tests are false positives.
Ok, that was a bit boring. Let’s try something moderately more interesting - let’s implement a p-value sniffing simulation, in the style of Simons, Nelson, & Simonsohn (2011).
Consider this scenario: you have done an experiment, again with 30 participants (one observation each, just for simplicity). The question is whether their performance is above chance. You aren’t going to check the p-value every trial, but let’s say you run 30 - then if the p-value is within the range p < .25 and p > .05, you optionally run 30 more and add those data, then test again. But if the original p value is < .05, you call it a day, and if the original is > .25, you also stop.
First, write a function that implements this sampling regime.
double.sample <- function () {
n = rnorm(30)
if(t.test(n)$p.value < .05){
t.test(n)$p.value
}
else if (t.test(n)$p.value < .25) {
n= n +rnorm(30)
t.test(n)$p.value
}
else {
t.test(n)$p.value
}
}
Now call this function 10k times and find out what happens.
array1 = NULL
for (i in 1:10000) {
array1[i] = double.sample()
}
length(array1[array1<.05])
## [1] 705
Is there an inflation of false positives? How bad is it?
It looks like the inflation makes the false positive rate to rise to just around .07.
Now modify this code so that you can investigate this “double the sample” rule in a bit more depth. Let’s see what happens when you double the sample ANY time p > .05 (not just when p < .25), or when you do it only if p < .5 or < .75. How do these choices affect the false positive rate?
HINT: Try to do this by making the function
double.sampletake the upper p value as an argument, so that you can pass this through dplyr.
HINT 2: You may need more samples. Find out by looking at how the results change from run to run.
Below is the function
double.sample <- function (x) {
n = rnorm(30)
if(t.test(n)$p.value < .05){
t.test(n)$p.value
}
else if (t.test(n)$p.value < x) {
n= n +rnorm(30)
t.test(n)$p.value
}
else {
t.test(n)$p.value
}
}
Here is the false positive rate when repeating when p is less than .50
array1 = NULL
for (i in 1:10000) {
array1[i] = double.sample(.50)
}
length(array1[array1<.05])
## [1] 763
The false positive rate is around 7.8 to 8%.
Here is the false positive rate when repeating when p is less than .75
array1 = NULL
for (i in 1:10000) {
array1[i] = double.sample(.75)
}
length(array1[array1<.05])
## [1] 808
Around 8% to 8.3%.
Here is the false positive rate when doubling when p is not < .05.
array1 = NULL
for (i in 1:10000) {
array1[i] = double.sample(1)
}
length(array1[array1<.05])
## [1] 872
Around 8 to 8.6 false positive rates.
What do you conclude on the basis of this simulation? How bad is this kind of data-dependent policy?
It is surprising and sad how a simple and common research heursitic like, “let’s douple the sample size” can actually increase the false positive rates to up to ~9%! This has opened my eyes to a problem.
Let’s use the
ToothGrowthdataset, on guineapig teeth based on orange juice and vitamin C. This is super simple. (Try?ToothGrowth).
First plot the data, we’ll use
qplotto understand howlen(tooth length) depends ondose(amount of Vitamin C) andsupp(delivery method).
ToothGrowth$doseF = as.factor(ToothGrowth$dose)
qplot(doseF, len,
facets = ~supp,
data=ToothGrowth)
tg.mean = ToothGrowth %>%
group_by(doseF, supp) %>%
summarise(len = mean(len))
ggplot(tg.mean, aes(x = doseF, y=len, group=supp, fill=supp))+
geom_bar(position = position_dodge(), stat = "identity")
So now you see what’s going on.
Next, always make a histogram of the DV before making a linear model! This reveals the distribution and can be helpful in choosing your model type.
qplot(ToothGrowth$len)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(len, data=ToothGrowth)+
facet_grid(~supp)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Now make a linear model of tooth lengths using
lm. Try making one with main effects and interactions and another with just main effects. Make sure to assign them to variables so that you can get them later.
mainef = lm(len ~ dose + supp, data=ToothGrowth); summary(mainef)
##
## Call:
## lm(formula = len ~ dose + supp, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.600 -3.700 0.373 2.116 8.800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.2725 1.2824 7.231 1.31e-09 ***
## dose 9.7636 0.8768 11.135 6.31e-16 ***
## suppVC -3.7000 1.0936 -3.383 0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared: 0.7038, Adjusted R-squared: 0.6934
## F-statistic: 67.72 on 2 and 57 DF, p-value: 8.716e-16
inter = lm(len ~ dose * supp, data=ToothGrowth); summary(inter)
##
## Call:
## lm(formula = len ~ dose * supp, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2264 -2.8462 0.0504 2.2893 7.9386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.550 1.581 7.304 1.09e-09 ***
## dose 7.811 1.195 6.534 2.03e-08 ***
## suppVC -8.255 2.236 -3.691 0.000507 ***
## dose:suppVC 3.904 1.691 2.309 0.024631 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared: 0.7296, Adjusted R-squared: 0.7151
## F-statistic: 50.36 on 3 and 56 DF, p-value: 6.521e-16
contrasts(ToothGrowth$supp) = c(1,0)
contrasts(ToothGrowth$supp) = c(0,1)
Now try taking out the intercept, using a -1 term in the formula. what does this do?
Removing the intercept forces it to be at the origin and gives us an extra degree of freedom to measure the effect of each supplement (not relative to each other). It tells us the incrememental effect of each supplement and dose.
summary(lm(len ~ dose + supp -1, data=ToothGrowth))
##
## Call:
## lm(formula = len ~ dose + supp - 1, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.600 -3.700 0.373 2.116 8.800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## dose 9.7636 0.8768 11.135 6.31e-16 ***
## suppOJ 9.2725 1.2824 7.231 1.31e-09 ***
## suppVC 5.5725 1.2824 4.345 5.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared: 0.9586, Adjusted R-squared: 0.9564
## F-statistic: 439.7 on 3 and 57 DF, p-value: < 2.2e-16
summary(lm(len ~ dose * supp -1, data=ToothGrowth))
##
## Call:
## lm(formula = len ~ dose * supp - 1, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2264 -2.8463 0.0504 2.2893 7.9386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## dose 7.811 1.195 6.534 2.03e-08 ***
## suppOJ 11.550 1.581 7.304 1.09e-09 ***
## suppVC 3.295 1.581 2.084 0.0418 *
## dose:supp1 3.904 1.691 2.309 0.0246 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared: 0.9622, Adjusted R-squared: 0.9595
## F-statistic: 356.2 on 4 and 56 DF, p-value: < 2.2e-16
Thought question: Take a moment to interpret the coefficients of the model. Q1 - What are the units?
For dose: The units are increase in length of tooth per 1 milligram increase of dose. When there is an intercept and interaction, this is interpreted at the supplement that is dummy coded 0.
For supplement: the units are increase in length of tooth compared to the dummy coded 0 supplement at dose = 0. However, when there is no intercept, we look at the effects assuming an intercept of 0. for example, when we are not controlling for dose, each beta represents the mean tooth length for each supplement.
Q2 - How does the interaction relate to the plot?
the interaction suggests that the effect of dose on tooth length is depedent on supplement.
Q3 - Should there be an interaction in the model? What does it mean? How important is it?
Looking at the graph, yes there should be an interaction. It looks like the effect of dose matters more for the VC supplement. And the model confirms that the interaction is useful, since it is signifcant. Looking at simple effects, the effect of dose is indeed stronger (b = 11.72) for VC compared to OJ (b = 7.81).
Now make predictions from the model you like the best. What should happen with doses of 0, 1.5, 2.5, and 10 under both supplements?
HINT: use the
predictfunction …
HINT 2: you will have to make a dataframe to do the prediction with, so use something like
data.frame(dose=...).
mod1 = lm(len ~ dose*supp, data=ToothGrowth)
m1 = data.frame(dose = cbind(c(0,1.5,2.5,10,0,1.5,2.5,10)), supp=c(rep("OJ",4), rep("VC",4)))
m1$pred = predict(mod1, m1)
Now plot the residuals from the original model. How do they look? HINT:
?resid
qplot(dose, resid(mod1),
facets = ~supp,
data=ToothGrowth)
It looks like the residuals are not even based on dosage. Specifically, for both supplements, the model underestimates dosage 1. this makes me think that mayne the main effect of dose is curvilinear. Let’s check.
summary(lm(len ~ poly(dose,2) * supp, data=ToothGrowth))
##
## Call:
## lm(formula = len ~ poly(dose, 2) * supp, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.20 -2.72 -0.27 2.65 8.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.6633 0.6630 31.166 < 2e-16 ***
## poly(dose, 2)1 37.7328 5.1356 7.347 1.12e-09 ***
## poly(dose, 2)2 -18.6217 5.1356 -3.626 0.000638 ***
## supp1 -3.7000 0.9376 -3.946 0.000231 ***
## poly(dose, 2)1:supp1 18.8595 7.2628 2.597 0.012101 *
## poly(dose, 2)2:supp1 8.8088 7.2628 1.213 0.230460
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.631 on 54 degrees of freedom
## Multiple R-squared: 0.7937, Adjusted R-squared: 0.7746
## F-statistic: 41.56 on 5 and 54 DF, p-value: < 2.2e-16
Yep!
What the heck is going on? Load data from Frank, Vul, Saxe (2011, Infancy), a study in which we measured infants’ looking to hands in moving scenes. There were infants from 3 months all the way to about two years, and there were two movie conditions (
Faces_Medium, in which kids played on a white background, andFaces_Plus, in which the backgrounds were more complex and the people in the videos were both kids and adults). Forgive our bad naming conventions.
Try to figure out what the most reasonable linear model of the data is.
d <- read.csv("http://langcog.stanford.edu/FVS2011-hands.csv")
Plot that model on the same plot as the data.
HINT: you can do this either using
predictor (if you are feeling confident of your understanding of the models) using the built-in linear models inggplot’sgeom_smooth.
contrasts(d$condition)
## Faces_Plus
## Faces_Medium 0
## Faces_Plus 1
summary(lm(hand.look ~ age*condition, data=d))
##
## Call:
## lm(formula = hand.look ~ age * condition, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.190121 -0.036508 -0.001239 0.032091 0.239474
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.022394 0.015417 1.453 0.14772
## age 0.003143 0.001176 2.673 0.00807 **
## conditionFaces_Plus -0.028440 0.021680 -1.312 0.19089
## age:conditionFaces_Plus 0.003950 0.001653 2.389 0.01770 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05456 on 228 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1764
## F-statistic: 17.49 on 3 and 228 DF, p-value: 2.955e-10
summary(lm(hand.look ~ poly(age,3)*condition, data=d))
##
## Call:
## lm(formula = hand.look ~ poly(age, 3) * condition, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.169897 -0.041130 -0.001441 0.030575 0.222312
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.061331 0.004793 12.795 < 2e-16
## poly(age, 3)1 0.208384 0.073444 2.837 0.004967
## poly(age, 3)2 0.080984 0.073214 1.106 0.269858
## poly(age, 3)3 -0.040238 0.072934 -0.552 0.581700
## conditionFaces_Plus 0.020477 0.006779 3.021 0.002814
## poly(age, 3)1:conditionFaces_Plus 0.259394 0.103260 2.512 0.012708
## poly(age, 3)2:conditionFaces_Plus -0.141970 0.103258 -1.375 0.170535
## poly(age, 3)3:conditionFaces_Plus -0.349895 0.103248 -3.389 0.000829
##
## (Intercept) ***
## poly(age, 3)1 **
## poly(age, 3)2
## poly(age, 3)3
## conditionFaces_Plus **
## poly(age, 3)1:conditionFaces_Plus *
## poly(age, 3)2:conditionFaces_Plus
## poly(age, 3)3:conditionFaces_Plus ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05162 on 224 degrees of freedom
## Multiple R-squared: 0.2851, Adjusted R-squared: 0.2627
## F-statistic: 12.76 on 7 and 224 DF, p-value: 9.09e-14
ggplot(d, aes(x= age, y = hand.look, group=condition, color=condition)) +
geom_point() +
geom_smooth(method="lm", formula = y ~ x)
What do you conclude from this pattern of data?
It looks like there is an interaction of condition and age on looking time for hands. Specifically, it looks like children are always more likely to look at hands as they get older, but this effect is more pronounced for the condition labeled “faces_plus.”
The goal here is to learn to use LMEMs using
lme4and to compare them to standard by subject, by item LMs, as well as the standard (no repeated measures) fixed effects GLM.
The dataset here is from Stiller, Goodman, & Frank (2014), a paper on children’s pragmatic inferences. We saw the paradigm in the counterbalancing lecture: it’s three faces: a smiley, a smiley with glasses, and a smiley with a hat and glasses. When told “my friend has glasses” do kids pick the one with the glasses and no hat?
age.groupis the kids’ age group,conditionis either “label,” described above, or “no label,” which was a control condition in which kids picked without hearing the term “glasses” at all.
d <- read.csv("http://langcog.stanford.edu/scales_data.csv")
d$age.group <- factor(d$age.group)
Always begin with a histogram!
qplot(correct, data=d)+
facet_grid(~age.group)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(correct, data=d)+
facet_grid(~condition)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Start out by setting up a function for a 95% CI using the normal approximation.
ci95.norm <- function(x) {
sd(x, na.rm=TRUE) / sqrt(length(x)) * qnorm(.975)
}
But the number of participants in a group is likely to be < 30, so let’s also compute this with a t distribution.
ci95.t <- function(x) {
sd(x, na.rm=TRUE) / sqrt(length(x)) * qt(.975,length(x)-1)
}
On the other hand, maybe we should use bootstrap CIs because these are actually proportions, and the normal/t approximations don’t know that they are 0/1 bounded.
library(boot)
library(bootstrap)
We’re going to use it to check some code that I use:
theta <- function(x,xdata,na.rm=T) {mean(xdata[x],na.rm=na.rm)}
ci.low <- function(x,na.rm=T) {
mean(x,na.rm=na.rm) -
quantile(bootstrap(1:length(x),
10000,theta,x,na.rm=na.rm)$thetastar,.025,na.rm=na.rm)}
ci.high <- function(x,na.rm=T) {
quantile(bootstrap(1:length(x),
10000,theta,x,na.rm=na.rm)$thetastar,.975,na.rm=na.rm) -
mean(x,na.rm=na.rm)}
Now make 95% confidence intervals across participants using all the methods above:
- Normal
- t
- Bootstrap percentile using my code
d.prop = d %>%
group_by(subid, condition, age.group) %>%
summarise(prop = mean(correct))
ci95.norm(d.prop$prop)
## [1] 0.05377146
ci95.t(d.prop$prop)
## [1] 0.05422089
ci.low(d.prop$prop)
## 2.5%
## 0.05272109
ci.high(d.prop$prop)
## 97.5%
## 0.05442177
Now plot the data by age and condition using
dplyrandggplot2. Plot some CIs on here - extra credit if you plot all of them and compare visually (you’ll needposition = position_dodge()or some other way to offset them).
d.mean= d.prop %>%
group_by(age.group, condition) %>%
summarise(mean = mean(prop), ci.t = ci95.t(prop), ci.n = ci95.norm(prop), ci.l= ci.low(prop), ci.h =ci.high(prop))
CI with t distribution
ggplot(d.mean, aes(x = age.group, y = mean, fill =condition )) +
geom_bar(position = position_dodge(), stat = "identity") +
geom_errorbar(aes(ymin=mean-ci.t, ymax=mean+ci.t),
width=.2,
position=position_dodge(.9))
CI with normal distribution
ggplot(d.mean, aes(x = age.group, y = mean, fill =condition )) +
geom_bar(position = position_dodge(), stat = "identity") +
geom_errorbar(aes(ymin=mean-ci.n, ymax=mean+ci.n),
width=.2,
position=position_dodge(.9))
CI with bootstrapping
ggplot(d.mean, aes(x = age.group, y = mean, fill =condition )) +
geom_bar(position = position_dodge(), stat = "identity") +
geom_errorbar(aes(ymin=mean-ci.l, ymax=mean+ci.h),
width=.2,
position=position_dodge(.9))
What do you conclude about confidence interval computation?
I conclude that it doesn’t really matter :)
OK, now do a basic GLM over the entire data frame, including
age.groupandconditionand their interaction. (If we were focusing on developmental issues, I would ask you to think about how to model age here, but let’s treat it as three discrete groups for now).
NOTE: this model is not appropriate, because it assumes that each subject’s observations are independent from one another. It’s still fine to do the analysis, though: it can tell you a lot about the data and is easy and fast to fit, as long as you know that you can’t trust the p-values!
summary(glm(correct ~ condition * age.group, family = "binomial", data=d ))
##
## Call:
## glm(formula = correct ~ condition * age.group, family = "binomial",
## data = d)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7166 -0.7401 -0.5168 0.8250 2.0393
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2819 0.2020 1.395 0.162902
## conditionNo Label -1.4368 0.3130 -4.590 4.43e-06 ***
## age.group3 0.6210 0.2961 2.098 0.035935 *
## age.group4 0.9312 0.3159 2.948 0.003198 **
## conditionNo Label:age.group3 -0.6210 0.4494 -1.382 0.167050
## conditionNo Label:age.group4 -1.7221 0.5022 -3.429 0.000605 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 808.59 on 587 degrees of freedom
## Residual deviance: 648.74 on 582 degrees of freedom
## AIC: 660.74
##
## Number of Fisher Scoring iterations: 4
Let’s now use
dplyrto get data frames for by-items and by-subjects analyses. These should contain means for age and condition across items.
d.prop2 = d %>%
group_by(trial, condition, age.group) %>%
summarise(prop = mean(correct))
Now do standard linear models on each of these.
NOTE: These are not strictly correct either because of the normal approximation on percent correct (model doesn’t know it’s 0 - 1 bounded and could give you standard error that goes above 1). Again, useful to do and see what happens.
summary(lm(prop ~ condition*age.group, data=d.prop2))
##
## Call:
## lm(formula = prop ~ condition * age.group, data = d.prop2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.21154 -0.07719 0.01042 0.07333 0.21154
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.57000 0.05485 10.392 4.93e-09 ***
## conditionNo Label -0.33042 0.07757 -4.260 0.000471 ***
## age.group3 0.14154 0.07757 1.825 0.084691 .
## age.group4 0.20083 0.07757 2.589 0.018518 *
## conditionNo Label:age.group3 -0.14154 0.10970 -1.290 0.213292
## conditionNo Label:age.group4 -0.31542 0.10970 -2.875 0.010068 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1097 on 18 degrees of freedom
## Multiple R-squared: 0.8752, Adjusted R-squared: 0.8405
## F-statistic: 25.24 on 5 and 18 DF, p-value: 1.52e-07
summary(lm(prop ~ condition*age.group, data=d.prop))
##
## Call:
## lm(formula = prop ~ condition * age.group, data = d.prop)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.57000 -0.21154 0.01042 0.18000 0.51042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.57000 0.04420 12.897 < 2e-16 ***
## conditionNo Label -0.33042 0.06315 -5.232 5.95e-07 ***
## age.group3 0.14154 0.06190 2.287 0.023714 *
## age.group4 0.20083 0.06315 3.180 0.001810 **
## conditionNo Label:age.group3 -0.14154 0.08889 -1.592 0.113560
## conditionNo Label:age.group4 -0.31542 0.08977 -3.514 0.000594 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.221 on 141 degrees of freedom
## Multiple R-squared: 0.5737, Adjusted R-squared: 0.5586
## F-statistic: 37.96 on 5 and 141 DF, p-value: < 2.2e-16
Do ANOVA on these. Note that ANOVA doesn’t let you figure out what is going on with respect to age.
summary(aov(prop ~ condition*age.group, data=d.prop2))
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 1.3982 1.3982 116.190 2.78e-09 ***
## age.group 2 0.0204 0.0102 0.846 0.446
## condition:age.group 2 0.0998 0.0499 4.148 0.033 *
## Residuals 18 0.2166 0.0120
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
On to linear mixed effect models. Create the maximal random effects model a la Barr et al. (2013). Does it converge? If not, what will you do to make it converge? (The internet can be your friend here).
HINT: make sure that you consider which random effects are appropriate. Consider which observations are within/between subjects. E.g. having a random coefficient for age by subject doesn’t make sense, because each subject has only one age.
summary(glmer(correct ~ condition*age.group + (condition*age.group|trial) + (1|subid), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## correct ~ condition * age.group + (condition * age.group | trial) +
## (1 | subid)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 688.2 810.8 -316.1 632.2 560
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6159 -0.5593 -0.2649 0.6724 3.5931
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subid (Intercept) 0.10293 0.3208
## trial (Intercept) 0.01422 0.1193
## conditionNo Label 0.00984 0.0992 1.00
## age.group3 0.49426 0.7030 1.00 1.00
## age.group4 0.11935 0.3455 1.00 1.00 1.00
## conditionNo Label:age.group3 1.48553 1.2188 -1.00 -1.00 -1.00
## conditionNo Label:age.group4 0.08097 0.2846 1.00 1.00 1.00
##
##
##
##
##
##
## -1.00
## 1.00 -1.00
## Number of obs: 588, groups: subid, 147; trial, 4
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2897 0.2232 1.298 0.194266
## conditionNo Label -1.4827 0.3410 -4.349 1.37e-05 ***
## age.group3 0.7672 0.4854 1.581 0.113983
## age.group4 1.0085 0.3845 2.623 0.008717 **
## conditionNo Label:age.group3 -0.7789 0.7826 -0.995 0.319592
## conditionNo Label:age.group4 -2.0467 0.6043 -3.387 0.000706 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtNL ag.gr3 ag.gr4 cNL:.3
## conditnNLbl -0.577
## age.group3 -0.228 0.372
## age.group4 -0.411 0.397 0.581
## cndtnNLb:.3 0.053 -0.517 -0.860 -0.509
## cndtnNLb:.4 0.398 -0.478 0.013 -0.405 0.049
This doesn’t converge. I am going to limit my model to random effects that make sense to me. Specifically, I don’t see any reason to expect the random effect of trial to vary based on condition or age or its interaction, so I am going to reduce this model to two random intercepts.
summary(glmer(correct ~ condition*age.group + (1|trial) + (1|subid), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ condition * age.group + (1 | trial) + (1 | subid)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 661.2 696.2 -322.6 645.2 580
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0905 -0.6050 -0.3480 0.6322 2.7987
##
## Random effects:
## Groups Name Variance Std.Dev.
## subid (Intercept) 0.05388 0.2321
## trial (Intercept) 0.07494 0.2737
## Number of obs: 588, groups: subid, 147; trial, 4
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2905 0.2513 1.156 0.247698
## conditionNo Label -1.4786 0.3300 -4.481 7.43e-06 ***
## age.group3 0.6393 0.3085 2.072 0.038261 *
## age.group4 0.9571 0.3292 2.907 0.003646 **
## conditionNo Label:age.group3 -0.6396 0.4655 -1.374 0.169441
## conditionNo Label:age.group4 -1.7622 0.5195 -3.392 0.000693 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtNL ag.gr3 ag.gr4 cNL:.3
## conditnNLbl -0.543
## age.group3 -0.566 0.417
## age.group4 -0.528 0.384 0.446
## cndtnNLb:.3 0.375 -0.671 -0.663 -0.296
## cndtnNLb:.4 0.333 -0.590 -0.287 -0.639 0.441
How do these coefficients compare with the independent coefficients linear model? What do you conclude?
It looks like these models are around the same, so there is an interaction of condition and age group and there seems to be somewhat little variance due to subject or trial (surprisingly!)
Which random effects make the most difference? Find out using
ranef. Plot the random effects for subject and item.
mod2 = glmer(correct ~ condition*age.group + (1|trial), family = "binomial", data=d)
mod3 = glmer(correct ~ condition*age.group + (1|subid), family = "binomial", data=d)
plot(ranef(mod2))
## $trial
plot(ranef(mod3))
## $subid
It looks like the trial random effect is stronger. I concluded this by looking at the two intercept model and seeing that the trial intercept explained more variance. Plus, the random effects range is larger in the plots for trial (.2 to -.3) than for subject id (-.06 to .04).
Make the minimal random effects model with just a subject intecept. How does this compare?
summary(glmer(correct ~ condition*age.group + (1|subid), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ condition * age.group + (1 | subid)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 662.7 693.4 -324.4 648.7 581
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.8385 -0.5597 -0.3737 0.6337 2.6414
##
## Random effects:
## Groups Name Variance Std.Dev.
## subid (Intercept) 0.02648 0.1627
## Number of obs: 588, groups: subid, 147
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2837 0.2056 1.380 0.167593
## conditionNo Label -1.4456 0.3223 -4.485 7.3e-06 ***
## age.group3 0.6248 0.3014 2.073 0.038187 *
## age.group4 0.9365 0.3220 2.908 0.003633 **
## conditionNo Label:age.group3 -0.6249 0.4561 -1.370 0.170658
## conditionNo Label:age.group4 -1.7305 0.5104 -3.390 0.000698 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtNL ag.gr3 ag.gr4 cNL:.3
## conditnNLbl -0.646
## age.group3 -0.675 0.419
## age.group4 -0.630 0.387 0.443
## cndtnNLb:.3 0.446 -0.675 -0.661 -0.293
## cndtnNLb:.4 0.395 -0.593 -0.282 -0.635 0.438
All of these look surprisingly alike.
Get significance values for the coefficient on the age*condition interaction using the anova, likelihood-ratio test approach, comparing between your semi-maximal model and one without an intercept.
#semimaximal model
mod1 = glmer(correct ~ condition*age.group + (1|subid) + (1|trial), family = "binomial", data=d)
mod2 = glmer(correct ~ condition*age.group + (1|trial), family = "binomial", data=d)
mod3 = glmer(correct ~ condition*age.group + (1|subid), family = "binomial", data=d)
anova(mod1, mod2) # mod1 is not better than mod2
## Data: d
## Models:
## mod2: correct ~ condition * age.group + (1 | trial)
## mod1: correct ~ condition * age.group + (1 | subid) + (1 | trial)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## mod2 7 659.33 689.97 -322.67 645.33
## mod1 8 661.24 696.25 -322.62 645.24 0.0958 1 0.757
anova(mod1, mod3) #mod 2 is better than mod3
## Data: d
## Models:
## mod3: correct ~ condition * age.group + (1 | subid)
## mod1: correct ~ condition * age.group + (1 | subid) + (1 | trial)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## mod3 7 662.72 693.35 -324.36 648.72
## mod1 8 661.24 696.25 -322.62 645.24 3.4802 1 0.06211 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Looks like including random intercept of subject and trial is not necessary, but just trial does improve the model. thus, this leads me to the conclusion that the random effect of trial is the only one we should include in the model. That being said, the effects always look pretty similar, and that is that participants are correct more often in the label condition, and that difference is more significant among the age group of four relative to the age group of 2. Let me look at the linear effect using contrast coding:
contrasts(d$age.group) = cbind(c(-1,0,1), c(0,2,0))
summary(glmer(correct ~ condition*age.group + (1|trial), family = "binomial", data=d))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ condition * age.group + (1 | trial)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 659.3 690.0 -322.7 645.3 581
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0761 -0.6082 -0.3562 0.6320 2.8071
##
## Random effects:
## Groups Name Variance Std.Dev.
## trial (Intercept) 0.0728 0.2698
## Number of obs: 588, groups: trial, 4
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.75958 0.20885 3.637 0.000276 ***
## conditionNo Label -2.33294 0.25413 -9.180 < 2e-16 ***
## age.group1 0.47297 0.15924 2.970 0.002976 **
## age.group2 0.07922 0.13499 0.587 0.557305
## conditionNo Label:age.group1 -0.87246 0.25272 -3.452 0.000556 ***
## conditionNo Label:age.group2 0.12052 0.20570 0.586 0.557943
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtNL ag.gr1 ag.gr2 cNL:.1
## conditnNLbl -0.482
## age.group1 0.139 -0.117
## age.group2 -0.449 0.368 -0.106
## cndtnNLb:.1 -0.088 0.223 -0.631 0.066
## cndtnNLb:.2 0.295 -0.610 0.070 -0.656 -0.135
Yes indeed, there is a significant difference in the linear effect. Looking at the graph, this can again be interpreted as kids get more correct as they get older, but only for the label condition, meaning they are learning to make inferences!