This is problem set #4, in which we want you to integrate your knowledge of data wrangling with some basic simulation skills. It’s a short problem set to help consolidate your ggplot2 skills and then help you get your feet wet in testing statistical concepts through “making up data” rather than consulting a textbook or doing math.
For ease of reading, please separate your answers from our text by marking our text with the > character (indicating quotes).
This part is a warmup, it should be relatively straightforward ggplot2 practice.
Load data from Frank, Vul, Saxe (2011, Infancy), a study in which we measured infants’ looking to hands in moving scenes. There were infants from 3 months all the way to about two years, and there were two movie conditions (Faces_Medium, in which kids played on a white background, and Faces_Plus, in which the backgrounds were more complex and the people in the videos were both kids and adults). An eye-tracker measured children’s attention to faces. This version of the dataset only gives two conditions and only shows the amount of looking at hands (other variables were measured as well).
library(tidyverse)
theme_set(theme_minimal())
fvs <- read_csv("data/FVS2011-hands.csv")
## Parsed with column specification:
## cols(
## subid = col_double(),
## age = col_double(),
## condition = col_character(),
## hand.look = col_double()
## )
First, use ggplot to plot a histogram of the ages of children in the study. NOTE: this is a repeated measures design, so you can’t just take a histogram of every measurement.
fvs %>%
group_by(subid) %>%
summarize(age = age[1]) %>%
ggplot(aes(x = age)) +
geom_histogram(binwidth = 1) +
scale_x_continuous(breaks = seq(1, max(fvs$age))) +
labs(x = "Age", y = "Frequency")
Second, make a scatter plot showing hand looking as a function of age and condition. Add appropriate smoothing lines. Take the time to fix the axis labels and make the plot look nice.
fvs %>%
ggplot(aes(x = age, y = hand.look, color = condition)) +
geom_point(alpha = .3) +
geom_smooth(se=F, size = .2) +
geom_smooth(method = "lm", linetype="dotted") +
scale_x_continuous(breaks = seq(1, max(fvs$age))) +
labs(x = "Age", y = "Hand looking time", color = "Condition")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Looking at difference between conditions within subjects
fvs %>%
ggplot(aes(x = age, y = hand.look, color = factor(subid))) +
geom_line() +
theme(legend.position = "none") +
scale_x_continuous(breaks = seq(1, max(fvs$age))) +
labs(x = "Age", y = "Hand looking time", color = "Condition")
#
# lmer(hand.look ~ age*condition + (1|subid), fvs) %>%
# summary
What do you conclude from this pattern of data?
As children get older, they look at hands more. This effect is more pronounced in complex scenes.
What statistical analyses would you perform here to quantify these differences?
I model the interaction of age and condition with the subject ID as a random intercept. There are not enough data to include any random slopes.
Let’s start by convincing ourselves that t-tests have the appropriate false positive rate. Run 10,000 t-tests with standard, normally-distributed data from a made up 30-person, single-measurement experiment (the command for sampling from a normal distribution is rnorm).
The goal of these t-tests are to determine, based on 30 observations, whether the underlying distribution (in this case a normal distribution with mean 0 and standard deviation 1) has a mean that is different from 0. In reality, the mean is not different from 0 (we sampled it using rnorm), but sometimes the 30 observations we get in our experiment will suggest that the mean is higher or lower. In this case, we’ll get a “significant” result and incorrectly reject the null hypothesis of mean 0.
What’s the proportion of “significant” results (\(p < .05\)) that you see?
Setting constants and useful functions
REPS <- 10000
SAMP_SZ <- 30
prop_positive <- function(x, alpha = .05){
sum(x < (alpha/2) | x > (1-(alpha/2)))/length(x)
}
First do this using a for loop.
p_vals <- numeric(REPS)
t1 <- Sys.time()
for(i in 1:length(p_vals)){
data <- rnorm(SAMP_SZ)
p_vals[i] <- t.test(data)$p.value
}
t2 <- Sys.time()
t2 - t1
## Time difference of 0.675077 secs
prop_positive(p_vals)
## [1] 0.0502
Next, do this using the replicate function:
t1 <- Sys.time()
p_vals <- replicate(REPS, t.test(rnorm(SAMP_SZ))$p.value)
t2 <- Sys.time()
t2 - t1
## Time difference of 0.920558 secs
prop_positive(p_vals)
## [1] 0.0512
How does this compare to the intended false-positive rate of \(\alpha=0.05\)?
It looks approximately correct.
Ok, that was a bit boring. Let’s try something more interesting - let’s implement a p-value sniffing simulation, in the style of Simons, Nelson, & Simonsohn (2011).
Consider this scenario: you have done an experiment, again with 30 participants (one observation each, just for simplicity). The question is whether the true mean is different from 0. You aren’t going to check the p-value every trial, but let’s say you run 30 - then if the p-value is within the range p < .25 and p > .05, you optionally run 30 more and add those data, then test again. But if the original p value is < .05, you call it a day, and if the original is > .25, you also stop.
First, write a function that implements this sampling regime.
# original recursive version
# too memory inefficient for the very expensive simulations later on :(
# double.sample <- function(dat = numeric(0),
# samp_sz = SAMP_SZ,
# alpha = .05,
# upper_threshold = .25){
# dat <- c(dat, rnorm(samp_sz))
# p_val <- t.test(dat)$p.value
# if(p_val < alpha | p_val > upper_threshold){
# return(p_val)
# } else(
# double.sample(dat, samp_sz, alpha, upper_threshold)
# )
# }
#' double.sample
#'
#' @param samp_sz size of sample that gets added after each test
#' @param block_sz size of block to tack on when adding more participants
#' @param alpha alpha threshold for significance
#' @param upper_threshold upper threshold for when to stop adding participants
#' @param killswitch how many participants to collect before abandoning experiment
#'
#' @return a list with the data as the first element and the p value as the second
double.sample <- function(samp_sz = SAMP_SZ,
block_sz = SAMP_SZ,
alpha = .05,
upper_threshold = .25,
killswitch = Inf){
dat <- rnorm(samp_sz)
p_val <- t.test(dat)$p.value
while(p_val > alpha && p_val < upper_threshold){
dat <- c(dat, rnorm(block_sz))
p_val <- t.test(dat)$p.value
if(killswitch <= length(dat)) break
}
return(list(data=dat, pvalue=p_val))
}
Now call this function 10k times and find out what happens.
# false positive rate for optional stopping with 30 subs
ss30 <- replicate(REPS, list(double.sample()))
ss30 %>%
map(~.$pvalue) %>%
unlist %>%
prop_positive
## [1] 0.0706
# what does the distribution of sample sizes look like
ss30 %>%
map(~length(.$data)) %>%
unlist %>%
table
## .
## 30 60 90 120 150 180 210 240 270 300 360 390 420 480 510
## 8018 1328 368 140 70 34 12 12 9 2 2 2 1 1 1
# Highly suspect p-curve
ss30 %>%
map(~.$pvalue) %>%
unlist %>%
data.frame(pvalue=.) %>%
ggplot(aes(x = pvalue, y = ..count..)) +
stat_bin(geom="line", binwidth = .01) +
scale_x_continuous(breaks=seq(0,1,.05))
# optional stopping with additions of 2 participants at a time
ss30_b2 <- replicate(REPS, list(double.sample(block_sz = 2)))
ss30_b2 %>%
map(~.$pvalue) %>%
unlist %>%
prop_positive
## [1] 0.0541
# distribution of sample sizes
ss30_b2 %>%
map(~length(.$data)) %>%
unlist %>%
table
## .
## 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
## 8030 548 334 245 182 141 103 92 57 51 42 29 23 21 17
## 60 62 64 66 68 70 72 74 76 78 80 82 84 86 90
## 14 9 11 5 8 6 1 4 5 2 3 2 2 2 2
## 94 96 98 100 108 112 124
## 1 2 2 1 1 1 1
# optional stopping with a reduced significance threshold
ss30_b5_a005 <- replicate(REPS, list(double.sample(samp_sz = 30, block_sz = 5, alpha = .005)))
ss30_b5_a005 %>%
map(~.$pvalue) %>%
unlist %>%
prop_positive(alpha = .05)
## [1] 0.0527
ss30_b5_a005 %>%
map(~.$pvalue) %>%
unlist %>%
data.frame(pvalue=.) %>%
ggplot(aes(x = pvalue, y = ..count..)) +
stat_bin(geom="line", binwidth = .01) +
scale_x_continuous(breaks=seq(0,1,.05))
# distribution of sample sizes
ss30_b5_a005 %>%
map(~length(.$data)) %>%
unlist %>%
table
## .
## 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
## 7465 691 398 278 175 147 123 85 67 46 51 40 47 34 29
## 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175
## 23 16 20 17 17 15 10 13 13 7 6 13 10 7 10
## 180 185 190 195 200 205 210 215 220 225 230 235 240 245 250
## 6 2 9 2 3 3 1 1 6 2 6 1 5 2 2
## 255 260 265 270 275 285 290 295 300 305 310 315 325 330 335
## 1 2 5 2 4 2 1 3 2 1 2 2 1 1 2
## 340 345 350 360 365 375 380 395 400 405 410 435 440 445 450
## 5 2 2 2 1 3 1 1 1 1 2 1 1 1 1
## 455 470 520 530 540 600 615 635 650 785 805 1075 1190 1240 2300
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2495 2600 3295 4485 6135
## 1 1 1 1 1
Is there an inflation of false positives? How bad is it?
If you run participants in blocks of 30, you increase your false positive rate by about 1.5%. If you run participants in blocks of 2, you increase your false positive rate by about 2.5%. However, if you lower your significance threshold to .005, it appears to balance out to a false positive rate of .05 :)
Now modify this code so that you can investigate this “double the sample” rule in a bit more depth. In the previous question, the researcher doubles the sample only when they think they got “close” to a significant result, i.e. when their not-significant p is less than 0.25. What if the researcher was more optimistic? See what happens in these 3 other scenarios:
How do these choices affect the false positive rate?
HINT: Try to do this by making the function double.sample take the upper p value as an argument, so that you can pass this through dplyr.
HINT 2: You may need more samples. Find out by looking at how the results change from run to run.
u5 <- replicate(REPS*5, list(double.sample(upper_threshold = .5)))
u5 %>%
map(~.$pvalue) %>%
unlist %>%
prop_positive()
## [1] 0.09022
u75 <- replicate(REPS*5, list(double.sample(upper_threshold = .75)))
u75 %>%
map(~.$pvalue) %>%
unlist %>%
prop_positive
## [1] 0.13864
u75 %>%
map(~length(.$data)) %>%
unlist %>%
table
## .
## 30 60 90 120 150 180 210 240 270 300 330 360
## 14951 9830 6235 4142 2859 2005 1565 1152 889 727 628 503
## 390 420 450 480 510 540 570 600 630 660 690 720
## 407 339 318 263 247 213 192 171 137 131 111 114
## 750 780 810 840 870 900 930 960 990 1020 1050 1080
## 100 92 76 76 70 74 46 55 50 51 48 36
## 1110 1140 1170 1200 1230 1260 1290 1320 1350 1380 1410 1440
## 53 37 32 35 27 33 37 28 25 20 20 24
## 1470 1500 1530 1560 1590 1620 1650 1680 1710 1740 1770 1800
## 19 16 21 19 13 22 25 14 12 17 14 16
## 1830 1860 1890 1920 1950 1980 2010 2040 2070 2100 2130 2160
## 15 9 9 10 11 16 5 16 8 9 9 5
## 2190 2220 2250 2280 2310 2340 2370 2400 2430 2460 2490 2520
## 8 9 11 8 7 11 7 4 7 6 6 3
## 2550 2580 2610 2640 2670 2700 2730 2760 2790 2820 2850 2880
## 3 3 4 5 3 6 6 1 4 1 7 5
## 2910 2940 2970 3000 3030 3060 3090 3120 3150 3180 3210 3240
## 4 7 5 3 3 3 4 1 3 4 1 1
## 3270 3300 3330 3360 3390 3420 3450 3480 3510 3540 3570 3600
## 1 1 5 5 1 3 6 2 2 2 2 4
## 3630 3660 3690 3720 3750 3780 3810 3840 3870 3900 3930 3960
## 4 4 2 1 2 2 1 3 3 3 2 3
## 3990 4020 4050 4080 4110 4170 4200 4230 4260 4320 4350 4380
## 1 2 2 1 1 1 2 2 2 2 1 1
## 4410 4440 4470 4530 4590 4620 4680 4710 4740 4770 4830 4860
## 1 3 2 3 2 1 1 2 4 1 3 3
## 4920 4980 5070 5100 5130 5190 5220 5280 5310 5340 5370 5400
## 2 3 3 1 3 1 1 1 1 1 1 1
## 5430 5460 5550 5580 5640 5670 5700 5730 5760 5790 5820 6120
## 2 1 2 1 1 1 1 1 1 1 3 2
## 6150 6420 6570 6720 6780 6870 6930 6990 7050 7320 7350 7470
## 1 2 1 1 1 2 1 1 1 1 2 1
## 7560 7590 7740 7770 7860 7950 8160 8190 8640 8700 9000 9420
## 1 1 1 1 1 1 1 1 1 2 1 1
## 9480 9660 9690 9720 9750 10020 10290 10320 10380 10410 10440 10500
## 1 2 1 1 1 1 1 1 1 1 2 1
## 10560 10800 10890 11040 11070 11160 11310 11610 11850 11880 11970 13260
## 1 1 1 1 1 1 1 1 1 1 1 1
## 13680 15030 16020 17700 19020 19290 19500 19590 20010 20640 22470 26370
## 1 1 1 1 1 1 1 1 1 1 1 1
## 30360 31110 32280 35550 35910 43320 53820 55140 60480 74520 75150 85050
## 1 1 1 1 1 1 1 1 1 1 1 1
u75 %>%
map(~length(.$data)) %>%
unlist %>%
qplot(binwidth = 10) +
scale_x_continuous(limits = c(0,1000))
## Warning: Removed 1232 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
# This simulation is too computationally expensive to ever finish so I added a "killswitch"
# i.e. simulation will stop collecting participants if it exceeds that number
u1_k1000 <- replicate(REPS*3, list(double.sample(upper_threshold = 1, killswitch = 1000)))
u1_k1000 %>%
map(~.$pvalue) %>%
unlist %>%
prop_positive
## [1] 0.0993
# resulting sample sizes
u1_k1000 %>%
map(~length(.$data)) %>%
unlist %>%
table
## .
## 30 60 90 120 150 180 210 240 270 300 330 360
## 1493 1018 712 572 482 365 350 297 267 243 232 224
## 390 420 450 480 510 540 570 600 630 660 690 720
## 187 188 169 151 142 125 139 121 107 103 99 98
## 750 780 810 840 870 900 930 960 990 1020
## 102 88 88 84 89 73 73 87 67 21365
# another very sketchy p curve
u1_k1000 %>%
map(~.$pvalue) %>%
unlist %>%
data.frame(pvalue=.) %>%
ggplot(aes(x = pvalue, y = ..count..)) +
stat_bin(geom="line", binwidth = .01) +
scale_x_continuous(breaks=seq(0,1,.05))
What do you conclude on the basis of this simulation? How bad is this kind of data-dependent policy?
Optional stopping increases your false positive rate. The effect is particularly pronounced as you increase your upper threshold for calling it quits. Although, if you abide by an upper threshold of .25 and lower your alpha to .005, it seems to balance out to a false positive rate of about .05.