This is problem set #3, in which we want you to integrate your knowledge of data wrangling with some basic simulation skills and some linear modeling.
For ease of reading, please separate your answers from our text by marking our text with the
>character (indicating quotes).
options(warn=-1)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(grid)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
Let’s start by convincing ourselves that t-tests have the appropriate false positive rate. Run 10,000 t-tests with standard, normally-distributed data from a made up 30-person, single-measurement experiment (the command for sampling from a normal distribution is
rnorm). What’s the mean number of “significant” results?
First do this using a
forloop.
numSig = 0
for (i in 1:10000) {
pval = t.test(rnorm(30))$p.value
if (pval < .05) {
numSig = numSig + 1
}
}
print(numSig)
## [1] 486
Mean number of significant results = 486, i.e. false positive rate is 0.0486, which is close to what we expect.
Next, do this using the
replicatefunction:
pvals = replicate(10000, t.test(rnorm(30))$p.value)
numSig = sum(pvals < .05)
print(numSig)
## [1] 496
Mean number of significant results = 496, i.e. false positive rate is 0.0496, which is once again close to what we expect.
Ok, that was a bit boring. Let’s try something moderately more interesting - let’s implement a p-value sniffing simulation, in the style of Simons, Nelson, & Simonsohn (2011).
Consider this scenario: you have done an experiment, again with 30 participants (one observation each, just for simplicity). The question is whether their performance is above chance. You aren’t going to check the p-value every trial, but let’s say you run 30 - then if the p-value is within the range p < .25 and p > .05, you optionally run 30 more and add those data, then test again. But if the original p value is < .05, you call it a day, and if the original is > .25, you also stop.
First, write a function that implements this sampling regime.
double.sample <- function () {
data = rnorm(30)
pval = t.test(data)$p.value
if (pval > .05 & pval < .25) {
data = c(data, rnorm(30))
pval = t.test(data)$p.value
}
return(pval)
}
Now call this function 10k times and find out what happens.
pvals = replicate(10000, double.sample())
numSig = sum(pvals < .05)
print(numSig)
## [1] 720
Is there an inflation of false positives? How bad is it?
Mean number of significant results = 720, i.e. false positive rate is 0.072, which is more than what it should be. There is a noticeable inflation of the false positive rate. Comparing with .05 as our standard, the false positive rate inflated by a factor of 1.44.
Now modify this code so that you can investigate this “double the sample” rule in a bit more depth. Let’s see what happens when you double the sample ANY time p > .05 (not just when p < .25), or when you do it only if p < .5 or < .75. How do these choices affect the false positive rate?
HINT: Try to do this by making the function
double.sampletake the upper p value as an argument, so that you can pass this through dplyr.
HINT 2: You may need more samples. Find out by looking at how the results change from run to run.
double.sample <- function (upper_p) {
data = rnorm(30)
pval = t.test(data)$p.value
if (pval > .05 & pval < upper_p) {
data = c(data, rnorm(30))
pval = t.test(data)$p.value
}
return(pval)
}
# Using a larger number of samples to get more accurate results
nsamples = 20000
# Trying upper p-value of .75
pvals = replicate(nsamples, double.sample(0.75))
numSig1 = sum(pvals < .05)
print(numSig1)
## [1] 1686
# Trying upper p-value of 1
pvals = replicate(nsamples, double.sample(1))
numSig2 = sum(pvals < .05)
print(numSig2)
## [1] 1698
What do you conclude on the basis of this simulation? How bad is this kind of data-dependent policy?
For doubling when .05 < p < .75, the false positive rate is 0.0843. For doubling when .05 < p < 1, the false positive rate is 0.0849. Once again there is a noticeable inflation of the p-value, so we can conclude that this data-dependent policy is not good.
Let’s use the
ToothGrowthdataset, on guineapig teeth based on orange juice and vitamin C. This is super simple. (Try?ToothGrowth).
First plot the data, we’ll use
qplotto understand howlen(tooth length) depends ondose(amount of Vitamin C) andsupp(delivery method).
qplot(dose, y = len, color = supp, data = ToothGrowth)
So now you see what’s going on.
Next, always make a histogram of the DV before making a linear model! This reveals the distribution and can be helpful in choosing your model type.
# Possible histograms for the ToothGrowth data
hists = function() {
h1 = qplot(len, fill = supp, geom = 'histogram', data = ToothGrowth)
h2 = qplot(len, fill = factor(dose), geom = 'histogram', data = ToothGrowth)
h3 = qplot(len, fill = supp, geom = 'histogram', data = ToothGrowth) + facet_grid(dose ~ .)
h4 = qplot(len, fill = factor(dose), geom = 'histogram', data = ToothGrowth) + facet_grid(supp ~ .)
h5 = qplot(len, fill = factor(dose), geom = 'histogram', data = ToothGrowth) + facet_grid(supp ~ dose)
grid.arrange(h1, h2, ncol = 1)
print(h3)
print(h4)
print(h5)
}
suppressMessages(hists())
Now make a linear model of tooth lengths using
lm. Try making one with main effects and interactions and another with just main effects. Make sure to assign them to variables so that you can get them later.
# Main effects only (additive model)
lm1 = lm(len ~ dose + supp, data = ToothGrowth)
summary(lm1)
##
## Call:
## lm(formula = len ~ dose + supp, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.600 -3.700 0.373 2.116 8.800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.2725 1.2824 7.231 1.31e-09 ***
## dose 9.7636 0.8768 11.135 6.31e-16 ***
## suppVC -3.7000 1.0936 -3.383 0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared: 0.7038, Adjusted R-squared: 0.6934
## F-statistic: 67.72 on 2 and 57 DF, p-value: 8.716e-16
# Main effects and interaction effects (multiplicative model)
lm2 = lm(len ~ dose * supp, data = ToothGrowth)
summary(lm2)
##
## Call:
## lm(formula = len ~ dose * supp, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2264 -2.8462 0.0504 2.2893 7.9386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.550 1.581 7.304 1.09e-09 ***
## dose 7.811 1.195 6.534 2.03e-08 ***
## suppVC -8.255 2.236 -3.691 0.000507 ***
## dose:suppVC 3.904 1.691 2.309 0.024631 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared: 0.7296, Adjusted R-squared: 0.7151
## F-statistic: 50.36 on 3 and 56 DF, p-value: 6.521e-16
Now try taking out the intercept, using a -1 term in the formula. what does this do?
lm3 = lm(len ~ dose + supp - 1, data = ToothGrowth)
summary(lm3)
##
## Call:
## lm(formula = len ~ dose + supp - 1, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.600 -3.700 0.373 2.116 8.800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## dose 9.7636 0.8768 11.135 6.31e-16 ***
## suppOJ 9.2725 1.2824 7.231 1.31e-09 ***
## suppVC 5.5725 1.2824 4.345 5.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.236 on 57 degrees of freedom
## Multiple R-squared: 0.9586, Adjusted R-squared: 0.9564
## F-statistic: 439.7 on 3 and 57 DF, p-value: < 2.2e-16
lm4 = lm(len ~ dose * supp - 1, data = ToothGrowth)
summary(lm4)
##
## Call:
## lm(formula = len ~ dose * supp - 1, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2264 -2.8463 0.0504 2.2893 7.9386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## dose 7.811 1.195 6.534 2.03e-08 ***
## suppOJ 11.550 1.581 7.304 1.09e-09 ***
## suppVC 3.295 1.581 2.084 0.0418 *
## dose:suppVC 3.904 1.691 2.309 0.0246 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.083 on 56 degrees of freedom
## Multiple R-squared: 0.9622, Adjusted R-squared: 0.9595
## F-statistic: 356.2 on 4 and 56 DF, p-value: < 2.2e-16
When we remove the intercept, we have both OJ and VC appear as variables in our linear model, instead of just one of them. This is because the intercept serves as our baseline for comparison of one to the other; without the intercept we need to compensate for this by including both in the model.
Thought question: Take a moment to interpret the coefficients of the model. Q1 - What are the units?
Pretending that tooth length is measured in nanometers (wasn’t specified), then units of model = nm / (mg / day)
Q2 - How does the interaction relate to the plot?
The interaction explains why the points for OJ and VC lend themselves to best-fit lines that have different slopes, i.e. len increases with respect to daily dose at different rates for our two supps.
Q3 - Should there be an interaction in the model? What does it mean? How important is it?
Yes there should be an interaction in the model, and it is important because it is statistically significant (t = 2.309, p = .0246). This means that supp has a significant impact on the rate of increase of tooth length with respect to the daily dose quantity, and therefore should be taken into consideration in our model.
Now make predictions from the model you like the best. What should happen with doses of 0, 1.5, 2.5, and 10 under both supplements?
HINT: use the
predictfunction …
HINT 2: you will have to make a dataframe to do the prediction with, so use something like
data.frame(dose=...).
# An idea from http://stats.stackexchange.com/questions/6684/how-can-one-use-the-predict-function-on-a-lm-object-where-the-ivs-have-been-dyna
dosage = c(0, 1.5, 2.5, 10)
VC = expand.grid(dose = dosage, supp = 'VC')
OJ = expand.grid(dose = dosage, supp = 'OJ')
data.frame(dose = dosage, lenVC_prediction = predict(lm2, VC), lenOJ_prediction = predict(lm2, OJ))
## dose lenVC_prediction lenOJ_prediction
## 1 0.0 3.29500 11.55000
## 2 1.5 20.86857 23.26714
## 3 2.5 32.58429 31.07857
## 4 10.0 120.45214 89.66429
Now plot the residuals from the original model. How do they look? HINT:
?resid
qplot(resid(lm2), fill = ToothGrowth$supp)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The residuals are for the most part centered around 0, but there is some slight skewing taking place (VC is slightly skewed to the right of 0, OJ is slightly skewed to the left). The distribution of the residuals isn’t perfect.
BONUS: test them for normality of distribution using a quantile-quantile plot.
HINT:
?qqplotand?qqnorm
qqnorm(resid(lm2), xlab = 'Normal Scores', ylab = 'Residuals')
qqline(resid(lm2))
Distribution of residuals exhibit a decent fit with standard normal distribution of residuals.
What the heck is going on? Load data from Frank, Vul, Saxe (2011, Infancy), a study in which we measured infants’ looking to hands in moving scenes. There were infants from 3 months all the way to about two years, and there were two movie conditions (
Faces_Medium, in which kids played on a white background, andFaces_Plus, in which the backgrounds were more complex and the people in the videos were both kids and adults). Forgive our bad naming conventions.
Try to figure out what the most reasonable linear model of the data is.
d <- read.csv("http://langcog.stanford.edu/FVS2011-hands.csv")
# Try both the additive and interactive model, and see which one fits better
lm1 = lm(hand.look ~ age + condition, data = d)
lm2 = lm(hand.look ~ age * condition, data = d)
summary(lm1)
##
## Call:
## lm(formula = hand.look ~ age + condition, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.159908 -0.035120 -0.006159 0.028734 0.246199
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0023491 0.0115374 -0.204 0.83884
## age 0.0051414 0.0008351 6.156 3.3e-09 ***
## conditionFaces_Plus 0.0204451 0.0072375 2.825 0.00515 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05512 on 229 degrees of freedom
## Multiple R-squared: 0.1667, Adjusted R-squared: 0.1595
## F-statistic: 22.91 on 2 and 229 DF, p-value: 8.488e-10
summary(lm2)
##
## Call:
## lm(formula = hand.look ~ age * condition, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.190121 -0.036508 -0.001239 0.032091 0.239474
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.022394 0.015417 1.453 0.14772
## age 0.003143 0.001176 2.673 0.00807 **
## conditionFaces_Plus -0.028440 0.021680 -1.312 0.19089
## age:conditionFaces_Plus 0.003950 0.001653 2.389 0.01770 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05456 on 228 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1764
## F-statistic: 17.49 on 3 and 228 DF, p-value: 2.955e-10
anova(lm1, lm2)
## Analysis of Variance Table
##
## Model 1: hand.look ~ age + condition
## Model 2: hand.look ~ age * condition
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 229 0.69573
## 2 228 0.67874 1 0.016992 5.7079 0.0177 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We will use the interactive model because it fits the data better.
Plot that model on the same plot as the data.
HINT: you can do this either using
predictor (if you are feeling confident of your understanding of the models) using the built-in linear models inggplot’sgeom_smooth.
ggplot(d, aes(x = age, y = hand.look, col = condition)) +
geom_jitter() +
geom_smooth(method = 'lm')
What do you conclude from this pattern of data?
We can conclude that age has a linear positive impact on infant’s hand-looking, and the degree of this increase is impacted by which movie condition the infants are in. The FacesPlus condition results in a higher rate of increase of hand.look with respect to age than does the FacesMedium contition That is to say that the interaction between age and movie condition is significant.
The goal here is to learn to use LMEMs using
lme4and to compare them to standard by subject, by item LMs, as well as the standard (no repeated measures) fixed effects GLM.
The dataset here is from Stiller, Goodman, & Frank (2014), a paper on children’s pragmatic inferences. We saw the paradigm in the counterbalancing lecture: it’s three faces: a smiley, a smiley with glasses, and a smiley with a hat and glasses. When told “my friend has glasses” do kids pick the one with the glasses and no hat?
age.groupis the kids’ age group,conditionis either “label,” described above, or “no label,” which was a control condition in which kids picked without hearing the term “glasses” at all.
library(lme4)
## Loading required package: Matrix
library(lmerTest)
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
d <- read.csv("http://langcog.stanford.edu/scales_data.csv")
d$age.group <- factor(d$age.group)
Always begin with a histogram!
ggplot(d, aes(x = correct, fill = trial)) +
geom_histogram(binwidth = .5, position = 'dodge') +
facet_grid(age.group ~ condition)
Start out by setting up a function for a 95% CI using the normal approximation.
sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(length(x))}
ci95.norm <- function(x) {
sem(x) * 1.96
}
But the number of participants in a group is likely to be < 30, so let’s also compute this with a t distribution.
ci95.t <- function(x) {
sem(x) * qt(.975, df = length(x))
}
On the other hand, maybe we should use bootstrap CIs because these are actually proportions, and the normal/t approximations don’t know that they are 0/1 bounded.
library(boot)
library(bootstrap)
Take a look at
?bootand?bootci. Note that the syntax for thebootlibrary is terrible, so we’re going to use it to check some code that I use:
theta <- function(x,xdata,na.rm=T) {mean(xdata[x],na.rm=na.rm)}
ci.low <- function(x,na.rm=T) {
mean(x,na.rm=na.rm) -
quantile(bootstrap(1:length(x),
10000,theta,x,na.rm=na.rm)$thetastar,.025,na.rm=na.rm)}
ci.high <- function(x,na.rm=T) {
quantile(bootstrap(1:length(x),
10000,theta,x,na.rm=na.rm)$thetastar,.975,na.rm=na.rm) -
mean(x,na.rm=na.rm)}
Now make 95% confidence intervals across participants using all the methods above:
- Normal
- t
- Bootstrap percentile using
boot.ci- Bootstrap percentile using my code
# boot.ci example from http://www.r-bloggers.com/bootstrap-confidence-intervals/
d.cis = d %>%
group_by(age.group, condition) %>%
summarize(
mean = mean(correct),
ci95.norm = ci95.norm(correct),
ci95.t = ci95.t(correct),
bootLow = boot.ci(boot(correct, function(x, i) mean(x[i]), R = 999))$basic[4],
bootHigh = boot.ci(boot(correct, function(x, i) mean(x[i]), R = 999))$basic[5],
codeLow = ci.low(correct),
codeHigh = ci.high(correct)
)
print(d.cis)
## Source: local data frame [6 x 9]
## Groups: age.group [?]
##
## age.group condition mean ci95.norm ci95.t bootLow bootHigh
## (fctr) (fctr) (dbl) (dbl) (dbl) (dbl) (dbl)
## 1 2 Label 0.5700000 0.09752369 0.09871644 0.48000000 0.6700000
## 2 2 No Label 0.2395833 0.08583183 0.08692594 0.14583333 0.3229167
## 3 3 Label 0.7115385 0.08749447 0.08852287 0.62500000 0.8076923
## 4 3 No Label 0.2395833 0.08583183 0.08692594 0.15625000 0.3229167
## 5 4 Label 0.7708333 0.08451820 0.08559556 0.68750000 0.8645833
## 6 4 No Label 0.1250000 0.06650485 0.06735259 0.05208333 0.1875000
## Variables not shown: codeLow (dbl), codeHigh (dbl)
Now plot the data by age and condition using
dplyrandggplot2. Plot some CIs on here - extra credit if you plot all of them and compare visually (you’ll needposition = position_dodge()or some other way to offset them).
ggplot(d.cis, aes(age.group, mean, fill = condition)) +
geom_bar(position = 'dodge', stat = 'identity') +
geom_errorbar(aes(ymin = mean - ci95.norm, ymax = mean + ci95.norm), width = .25, position = position_dodge(.1)) +
geom_errorbar(aes(ymin = mean - ci95.t, ymax = mean + ci95.t), width = .25, position = position_dodge(.5)) +
geom_errorbar(aes(ymin = bootLow, ymax = bootHigh), width = .25, position = position_dodge(1.0)) +
geom_errorbar(aes(ymin = mean - codeLow, ymax = mean + codeHigh), width = .25, position = position_dodge(1.5)) +
labs(x = 'Age Group', y = 'Mean Correct') +
scale_fill_discrete(name = 'Condition')
What do you conclude about confidence interval computation?
The four methods of computing confidence intervals used here seem to be within very close proximity to one another in terms of precision.
OK, now do a basic GLM over the entire data frame, including
age.groupandconditionand their interaction. (If we were focusing on developmental issues, I would ask you to think about how to model age here, but let’s treat it as three discrete groups for now).
NOTE: this model is not appropriate, because it assumes that each subject’s observations are independent from one another. It’s still fine to do the analysis, though: it can tell you a lot about the data and is easy and fast to fit, as long as you know that you can’t trust the p-values!
glm1 = glm(correct ~ age.group * condition, data = d, family = 'binomial')
summary(glm1)
##
## Call:
## glm(formula = correct ~ age.group * condition, family = "binomial",
## data = d)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7166 -0.7401 -0.5168 0.8250 2.0393
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2819 0.2020 1.395 0.162902
## age.group3 0.6210 0.2961 2.098 0.035935 *
## age.group4 0.9312 0.3159 2.948 0.003198 **
## conditionNo Label -1.4368 0.3130 -4.590 4.43e-06 ***
## age.group3:conditionNo Label -0.6210 0.4494 -1.382 0.167050
## age.group4:conditionNo Label -1.7221 0.5022 -3.429 0.000605 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 808.59 on 587 degrees of freedom
## Residual deviance: 648.74 on 582 degrees of freedom
## AIC: 660.74
##
## Number of Fisher Scoring iterations: 4
Let’s now use
dplyrto get data frames for by-items and by-subjects analyses. These should contain means for age and condition across items.
df1 = d %>%
group_by(subid, age.group, condition) %>%
summarize(
mean = mean(correct)
)
df2 = d %>%
group_by(age.group, condition, trial) %>%
summarize(
mean = mean(correct)
)
print(df1)
## Source: local data frame [147 x 4]
## Groups: subid, age.group [?]
##
## subid age.group condition mean
## (fctr) (fctr) (fctr) (dbl)
## 1 C1 4 Label 1.00
## 2 C10 3 Label 0.50
## 3 C11 4 Label 0.75
## 4 C12 3 Label 0.75
## 5 C13 4 Label 0.50
## 6 C14 4 Label 0.75
## 7 C15 3 Label 1.00
## 8 C16 3 Label 0.50
## 9 C17 3 Label 0.50
## 10 C18 4 Label 0.75
## .. ... ... ... ...
print(df2)
## Source: local data frame [24 x 4]
## Groups: age.group, condition [?]
##
## age.group condition trial mean
## (fctr) (fctr) (fctr) (dbl)
## 1 2 Label beds 0.6000000
## 2 2 Label faces 0.6400000
## 3 2 Label houses 0.4800000
## 4 2 Label pasta 0.5600000
## 5 2 No Label beds 0.2916667
## 6 2 No Label faces 0.2500000
## 7 2 No Label houses 0.1666667
## 8 2 No Label pasta 0.2500000
## 9 3 Label beds 0.9230769
## 10 3 Label faces 0.6153846
## .. ... ... ... ...
Now do standard linear models on each of these.
NOTE: These are not strictly correct either because of the normal approximation on percent correct (model doesn’t know it’s 0 - 1 bounded and could give you standard error that goes above 1). Again, useful to do and see what happens.
lm1 = lm(mean ~ age.group * condition + as.numeric(subid), data = df1)
summary(lm1)
##
## Call:
## lm(formula = mean ~ age.group * condition + as.numeric(subid),
## data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56567 -0.21567 0.00747 0.18519 0.51100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5914309 0.0856187 6.908 1.59e-10 ***
## age.group3 0.1273166 0.0788618 1.614 0.1087
## age.group4 0.1847851 0.0837998 2.205 0.0291 *
## conditionNo Label -0.3377500 0.0681355 -4.957 2.04e-06 ***
## as.numeric(subid) -0.0001922 0.0006570 -0.293 0.7703
## age.group3:conditionNo Label -0.1243128 0.1068584 -1.163 0.2467
## age.group4:conditionNo Label -0.2930486 0.1181286 -2.481 0.0143 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2217 on 140 degrees of freedom
## Multiple R-squared: 0.574, Adjusted R-squared: 0.5557
## F-statistic: 31.44 on 6 and 140 DF, p-value: < 2.2e-16
lm2 = lm(mean ~ age.group * condition + trial, data = df2)
summary(lm2)
##
## Call:
## lm(formula = mean ~ age.group * condition + trial, data = df2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.17567 -0.03765 -0.00554 0.03459 0.17873
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.63109 0.05758 10.960 1.48e-08 ***
## age.group3 0.14154 0.06649 2.129 0.050256 .
## age.group4 0.20083 0.06649 3.021 0.008605 **
## conditionNo Label -0.33042 0.06649 -4.969 0.000168 ***
## trialfaces -0.08628 0.05429 -1.589 0.132837
## trialhouses -0.14607 0.05429 -2.691 0.016772 *
## trialpasta -0.01201 0.05429 -0.221 0.827920
## age.group3:conditionNo Label -0.14154 0.09403 -1.505 0.153025
## age.group4:conditionNo Label -0.31542 0.09403 -3.354 0.004346 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09403 on 15 degrees of freedom
## Multiple R-squared: 0.9236, Adjusted R-squared: 0.8828
## F-statistic: 22.65 on 8 and 15 DF, p-value: 4.833e-07
Do ANOVA on these. Note that ANOVA doesn’t let you figure out what is going on with respect to age.
# Using glm instead.
glm1 = glm(correct ~ age.group * condition + as.numeric(subid), data = d, family = 'binomial')
glm2 = glm(correct ~ age.group * condition + trial, data = d, family = 'binomial')
anova(glm1, glm2)
## Analysis of Deviance Table
##
## Model 1: correct ~ age.group * condition + as.numeric(subid)
## Model 2: correct ~ age.group * condition + trial
## Resid. Df Resid. Dev Df Deviance
## 1 581 648.66
## 2 579 637.00 2 11.663
As we can see, model 2 (glm2) fits better.
On to linear mixed effect models. Create the maximal random effects model a la Barr et al. (2013). Does it converge? If not, what will you do to make it converge? (The internet can be your friend here).
HINT: make sure that you consider which random effects are appropriate. Consider which observations are within/between subjects. E.g. having a random coefficient for age by subject doesn’t make sense, because each subject has only one age.
mm1 = glmer(correct ~ age.group * condition + (1 | subid) + (1 | trial), data = d, family = 'binomial')
summary(mm1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ age.group * condition + (1 | subid) + (1 | trial)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 661.2 696.2 -322.6 645.2 580
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0905 -0.6050 -0.3480 0.6322 2.7987
##
## Random effects:
## Groups Name Variance Std.Dev.
## subid (Intercept) 0.05389 0.2321
## trial (Intercept) 0.07494 0.2737
## Number of obs: 588, groups: subid, 147; trial, 4
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2905 0.2513 1.156 0.247703
## age.group3 0.6393 0.3085 2.072 0.038261 *
## age.group4 0.9571 0.3292 2.907 0.003646 **
## conditionNo Label -1.4786 0.3300 -4.481 7.43e-06 ***
## age.group3:conditionNo Label -0.6396 0.4655 -1.374 0.169438
## age.group4:conditionNo Label -1.7623 0.5195 -3.392 0.000693 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ag.gr3 ag.gr4 cndtNL a.3:NL
## age.group3 -0.566
## age.group4 -0.528 0.446
## conditnNLbl -0.543 0.417 0.384
## ag.grp3:cNL 0.375 -0.663 -0.296 -0.671
## ag.grp4:cNL 0.333 -0.287 -0.639 -0.590 0.441
The model converges! Both condition and age group have a significant positive effect (p < .001, and p = .038 and p = .0036 respectively), and there is a significant interaction with a negative coefficient between condition and age group 4 (p < .001).
How do these coefficients compare with the independent coefficients linear model? What do you conclude?
To recall, here is what model 2 looked like:
summary(glm2)
##
## Call:
## glm(formula = correct ~ age.group * condition + trial, family = "binomial",
## data = d)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8845 -0.8221 -0.4751 0.8659 2.1155
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.63585 0.26690 2.382 0.01720 *
## age.group3 0.63728 0.29999 2.124 0.03364 *
## age.group4 0.95435 0.31987 2.984 0.00285 **
## conditionNo Label -1.47274 0.31722 -4.643 3.44e-06 ***
## trialfaces -0.48536 0.27463 -1.767 0.07717 .
## trialhouses -0.82902 0.27845 -2.977 0.00291 **
## trialpasta -0.07442 0.27284 -0.273 0.78502
## age.group3:conditionNo Label -0.63728 0.45443 -1.402 0.16080
## age.group4:conditionNo Label -1.75693 0.50702 -3.465 0.00053 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 808.59 on 587 degrees of freedom
## Residual deviance: 637.00 on 579 degrees of freedom
## AIC: 655
##
## Number of Fisher Scoring iterations: 4
The slope coefficients of the mixed model are almost the same as those of the general linear model, with only a slight change in the intercept. In addition the results of our mixed model summary show us that very little of the variance is accounted for by random effects of subid and trial. Thus we can conclude that the fixed effects linear model fits these data fine, and we don’t need to account for random subid and trial effects.
Which random effects make the most difference? Find out using
ranef. Plot the random effects for subject and item.
re = ranef(mm1)
qqnorm(re$subid[,1])
qqline(re$subid[,1])
qqnorm(re$trial[,1])
qqline(re$trial[,1])
From the plots of the random effects we can see that the random effect of tria makes a bigger difference than the random effect of subid.
Make the minimal random effects model with just a subject intecept. How does this compare?
mm2 = glmer(correct ~ age.group * condition + (1 | subid), data = d, family = 'binomial')
summary(mm2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ age.group * condition + (1 | subid)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 662.7 693.4 -324.4 648.7 581
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.8385 -0.5597 -0.3737 0.6337 2.6414
##
## Random effects:
## Groups Name Variance Std.Dev.
## subid (Intercept) 0.02648 0.1627
## Number of obs: 588, groups: subid, 147
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2837 0.2056 1.380 0.167603
## age.group3 0.6248 0.3014 2.073 0.038190 *
## age.group4 0.9365 0.3220 2.908 0.003634 **
## conditionNo Label -1.4456 0.3223 -4.485 7.3e-06 ***
## age.group3:conditionNo Label -0.6249 0.4561 -1.370 0.170665
## age.group4:conditionNo Label -1.7305 0.5104 -3.390 0.000699 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ag.gr3 ag.gr4 cndtNL a.3:NL
## age.group3 -0.675
## age.group4 -0.630 0.443
## conditnNLbl -0.646 0.419 0.387
## ag.grp3:cNL 0.446 -0.661 -0.293 -0.675
## ag.grp4:cNL 0.395 -0.282 -0.635 -0.593 0.438
The coefficients look almost the same as in the maximal random effects model.
Get significance values for the coefficient on the age*condition interaction using the anova, likelihood-ratio test approach, comparing between your semi-maximal model and one without an intercept.
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# No-intercept model
glm3 = glm(correct ~ age.group * condition - 1, data = d, family = 'binomial')
summary(glm3)
##
## Call:
## glm(formula = correct ~ age.group * condition - 1, family = "binomial",
## data = d)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7166 -0.7401 -0.5168 0.8250 2.0393
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## age.group2 0.2819 0.2020 1.395 0.162902
## age.group3 0.9029 0.2164 4.171 3.03e-05 ***
## age.group4 1.2130 0.2428 4.995 5.87e-07 ***
## conditionNo Label -1.4368 0.3130 -4.590 4.43e-06 ***
## age.group3:conditionNo Label -0.6210 0.4494 -1.382 0.167050
## age.group4:conditionNo Label -1.7221 0.5022 -3.429 0.000605 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 815.14 on 588 degrees of freedom
## Residual deviance: 648.74 on 582 degrees of freedom
## AIC: 660.74
##
## Number of Fisher Scoring iterations: 4
lrtest(glm3, mm1)
## Likelihood ratio test
##
## Model 1: correct ~ age.group * condition - 1
## Model 2: correct ~ age.group * condition + (1 | subid) + (1 | trial)
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 6 -324.37
## 2 8 -322.62 2 3.5047 0.1734
We see that modeling subid and trial as random effects do not significantly improve the model ( \(\chi^2\)(2) = 3.5, p = .173), therefore we are happy with the original fixed effects model.