Imagine that in Study 1 you test a simple effect of showing an ad (yes vs. no), with 50 per cell (N = 100 total). Now, in Study 2, you want to test an interaction with whether the ad is in English vs. French. You think the effect in French is only 1/3 as big as the effect in English, and that is indeed your central prediction for this study, so you want to be properly powered to detect that interaction. What sample size do you need per cell in Study 2 so that the interaction has the same power as the simple effect did in Study 1?
Hint: in class we used simulation to validate the general prescription to double the sample size per cell under full attenuation. You can tweak that simulation through trial and error to figure out when you get the same power when, rather than fully attenuating, the effect drops by 'only' 2/3 (instead of by 3/3).
#Study 1 simulation
rm(list = ls()); set.seed(123)
library(tidyverse) # attached once; tibble, %>%, dplyr verbs, and ggplot are used below
n = 100 # total N in Study 1 (50 per cell)
#Effect sizes from 0 to 2 in steps of .25
es = seq(0, 2, by = .25)
pw = c() # analytic power of the simple effect with n = 50 per cell
for (i in es){pw = append(pw, pwr::pwr.t.test(n = 50, d = i)$power)}
Now let's simulate Study 2.
es2 = es / 3
es
## [1] 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00
es2
## [1] 0.00000000 0.08333333 0.16666667 0.25000000 0.33333333 0.41666667 0.50000000
## [8] 0.58333333 0.66666667
ns = c(50, 75, 100, 125, 150, 175, 200)
p1 = p2 = c()
po1 = po2 = c()
for (n in 1:length(ns)) {
  for (j in 1:length(es)) {
    for (i in 1:1000) {
      #English cells: control and ad, full effect
      x1 = rnorm(n = ns[n], mean = 0, sd = 1)
      x2 = rnorm(n = ns[n], mean = 0 + es[j], sd = 1)
      #French cells: control and ad, effect attenuated to 1/3
      x3 = rnorm(n = ns[n], mean = 0, sd = 1)
      x4 = rnorm(n = ns[n], mean = 0 + es[j] / 3, sd = 1)
      data =
        tibble(value = c(x1, x2, x3, x4),
               English = rep(c(1, 0), each = ns[n] * 2),
               Ad = rep(c(0, 1, 0, 1), each = ns[n]))
      #Simple effect (t-test on the English cells) vs. the interaction term
      p1[i] = t.test(x1, x2)$p.value
      p2[i] = lm(value ~ Ad * English, data = data) %>%
        broom::tidy() %>%
        dplyr::filter(term == "Ad:English") %>%
        pull(p.value)
    }
    po1 = append(po1, mean(p1 < .05))
    po2 = append(po2, mean(p2 < .05))
  }
}
# Power for the interaction (lm) rises more slowly with n than for the simple effect (t)
bind_rows(
  po1 %>% as_tibble %>% gather() %>%
    mutate(test = "t", n = rep(ns, each = length(es)), es = rep(es, length(ns))),
  po2 %>% as_tibble %>% gather() %>%
    mutate(test = "lm", n = rep(ns, each = length(es)), es = rep(es, length(ns)))
) %>%
  ggplot(aes(n, value, col = test)) +
  geom_line(aes(group = test)) +
  facet_wrap(~es) +
  #horizontal benchmarks: Study 1 analytic power for each effect size
  geom_hline(aes(yintercept = value), data = enframe(pw) %>% mutate(es = es)) +
  Ben::theme_ang()
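Trial and error aside, here is an analytic sketch of the answer (assuming equal variances across cells, as in the simulation): the interaction estimate is a difference of two differences, so its standard error is 2σ/√n, twice the simple effect's σ√(2/n). Matching noncentralities gives n2 = 2 · n1 / a², where a is the interaction expressed as a fraction of d; here a = 1 − 1/3 = 2/3, and a = 1 recovers the doubling rule from the hint.
# Analytic sketch (equal-variance assumption, not part of the simulation above)
n2 = 2 * 50 / (2/3)^2
n2 # 225 per cell; note the ns grid above stops at 200, just short of the answer
# Cross-check with pwr: the interaction behaves like a t-test on d * a / sqrt(2)
d = .5 # any nonzero d gives the same match
pwr::pwr.t.test(n = 50, d = d)$power # simple effect in Study 1, ~.70
pwr::pwr.t.test(n = 225, d = d * (2/3) / sqrt(2))$power # interaction, also ~.70
So Study 2 needs 225 per cell, 4.5 times Study 1's 50 per cell, to match its power.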
In class we computed, via resampling, the standard error of the mean for this sample: c(2, 2, 5, 5, 10, 15, 20, 50)
a) What is the standard error for the median?
5.310235. Calculated as the standard deviation of the 1000 bootstrapped medians.
set.seed(123);rm(list = ls())
x = c(2,2,5,5,10, 15,20,50)
nboot = 1000
n = length(x)
md.boot = c()
for (i in 1:nboot){
x.boot = sample(x, size=n, replace=TRUE)
md.boot[i] = median(x.boot)
}
md.boot %>% hist()
sd(md.boot)
## [1] 5.310235
b) You will compute the standard error of the max and min below, but before you do, write down if you think they will be about the same, if the SE of the min will be noticeably larger, or if the SE of the max will be noticeably larger.
In our sample the min, 2, appears twice, while the max, 50, appears only once. Bootstrapped datasets will therefore contain a min of 2 more often than a max of 50 (a single value is far easier to miss than a pair). The gaps matter too: when the 50 is missed, the max falls all the way to 20, whereas when both 2s are missed, the min only rises to 5. As such, I expect the max to have a noticeably higher standard error; there will be more variation in it across our bootstrapped samples.
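A quick back-of-the-envelope check on that intuition (simple resampling arithmetic, not required by the question):
(7/8)^8 # ~.34 chance a resample of size 8 omits the single 50, dropping the max to 20
(6/8)^8 # ~.10 chance a resample omits both 2s, raising the min only to 5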
set.seed(123);rm(list = ls())
x = c(2,2,5,5,10, 15,20,50)
nboot = 1000
n = length(x)
min.boot = c()
max.boot = c()
for (i in 1:nboot){
x.boot = sample(x, size=n, replace=TRUE)
min.boot[i] = min(x.boot)
max.boot[i] = max(x.boot)
}
print(paste("The SE of the max is",sd(max.boot)))
## [1] "The SE of the max is 15.2569451193342"
print(paste("The SE of the min is",sd(min.boot)))
## [1] "The SE of the min is 0.913227783734696"
c) What is the standard error for the max?
The SE of the max is 15.2569451193342
d) What is the standard error for the min?
The SE of the min is 0.913227783734696
e) Propose a new vector of 8 values where the opposite relationship between the SEs holds (so if in this sample SE(max) < SE(min), then in the sample you propose SE(max) > SE(min)).
We just need to mirror the vector about the midpoint of its range (x.rev = 52 - x), which swaps the roles of min and max while preserving all the gaps:
set.seed(123);rm(list = ls())
x = c(2,2,5,5,10, 15,20,50)
nboot = 1000
x.rev = 52-x
n = length(x.rev)
min.boot = c()
max.boot = c()
for (i in 1:nboot){
x.boot = sample(x.rev, size=n, replace=TRUE)
min.boot[i] = min(x.boot)
max.boot[i] = max(x.boot)
}
print(paste("Using the vector",paste(x.rev,collapse = ", ")))
## [1] "Using the vector 50, 50, 47, 47, 42, 37, 32, 2"
print(paste("The SE of the max is",sd(max.boot)))
## [1] "The SE of the max is 0.913227783734696"
print(paste("The SE of the min is",sd(min.boot)))
## [1] "The SE of the min is 15.2569451193342"
A psychology article runs a 2x2x3 between-subject experiment with 169 participants. The conditions are:
- Color of folder providing the instructions: Blue, White, Red
- Type of prime provided: Exemplar (e.g., Einstein) vs Stereotype (e.g., Professor)
- Dimension of prime: Intelligent (e.g., Professor) vs Unintelligent (e.g., Model)
The researchers then measure performance in a Trivial Pursuit task, and the results come out as expected: participants in the six cells predicted to perform better did perform better, and those in the cells predicted to perform worse did indeed perform worse.
Here is the writeup of the results:
The means seem similar across conditions. Are they too similar? (Note: the article does not report all 12 means and SDs; you can ballpark the means from the figure above and fill in the missing SDs by assuming they are similar across cells.)
set.seed(123);rm(list = ls())
n = 169
data = expand_grid(color = c("blue", "white", "red"),
                   type = c("exemplar", "stereotype"),
                   dimension = c("intelligent", "dumb")) %>%
  rownames_to_column("id") %>%
  mutate(M = NA, SD = NA)
#Intelligent Stereotype
data$M[7] = 12.07
data$SD[7] = 2.78
# Dumb Stereotype
data$M[8] = 9.78
data$SD[8] = 2.66
#Dumb Exemplar
data$M[6] = 11.71
data$SD[6] = 2.87
# Intelligent Exemplar
data$M[5] = 9.56
data$SD[5] = 2.83
# Blue
# Intelligent
data$M[c(1,3)] = 11.93
data$SD[c(1,3)] = 2.98
# Dumb
data$M[c(2,4)] = 9.53
data$SD[c(2,4)] = 2.87
# Red
# Intelligent
data$M[c(9,11)] = 9.25
data$SD[c(9,11)] = 2.77
# Dumb
data$M[c(10,12)] = 11.59
data$SD[c(10,12)] = 2.86
data %>% ggplot(aes(type,M,fill = dimension))+geom_col(position = "dodge")+facet_wrap(~color)
We have replicated the data as closely as possible.
Let's see whether SD differences this small are to be expected across groups, assuming the data come from distributions with the same SD.
sdsd.sim = c()
sddata = data %>%
  mutate(n = c(rep(14, 11), 15)) %>% # 169 participants split across the 12 cells
  select(M, SD, n) %>%
  group_by(M, SD) %>% # collapse cells that were assigned the same M and SD
  summarise(n = sum(n))
## `summarise()` has grouped output by 'M'. You can override using the `.groups` argument.
sdsd = sddata$SD %>% sd # observed SD of the cell SDs
for (i in 1:10000) {
  sdsd.sim[i] =
    sddata %>%
    mutate(globalsd = mean(SD)) %>% # under the null, every cell shares one SD
    mutate(replicas = pmap(list(M, globalsd, n), function(x, y, n) {
      rnorm(n = n, mean = x, sd = y)
    }),
    sd = map_dbl(replicas, sd)) %>%
    pull(sd) %>%
    sd() # SD of the simulated cell SDs
}
Let's see our results. They suggest the SDs are too similar across groups: an SD-of-SDs as small as the observed one came up only 2 times in 10,000 simulations (p ≈ .0002).
sdsd.sim %>% enframe %>% ggplot(aes(value))+geom_histogram()+geom_vline(xintercept = sdsd)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
sum(sdsd.sim<sdsd)
## [1] 2
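Expressed as a probability (the same count divided by the 10,000 simulations):
mean(sdsd.sim < sdsd) # 2/10000 = 2e-04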
Let's talk about the means. Let's assume that conditions with similar means actually have identical true means.
data = read_csv("2.1 Prepare for 3/data with different means.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## id = col_double(),
## color = col_character(),
## type = col_character(),
## dimension = col_character(),
## M = col_double(),
## SD = col_double()
## )
gm.color = data %>% group_by(color) %>% summarise(M = mean(M)) %>% pull(M)
gm.type = data %>% group_by(type) %>% summarise(M = mean(M))%>% pull(M)
gm.dim = data %>% group_by(dimension) %>% summarise(M = mean(M))%>% pull(M)
Color has three groups, so let's measure similarity as the SD of the means across conditions.
sd.obs.color = sd(gm.color) # observed SD of the means
m.sim.color = mean(gm.color) # overall mean; under the null, all three groups share it
sd.sim.color = c()
for (i in 1:1000){
x1 = rnorm(56,mean = m.sim.color,sd = mean(data$SD))
x2 = rnorm(56,mean = m.sim.color,sd = mean(data$SD))
x3 = rnorm(57,mean = m.sim.color,sd = mean(data$SD))
sd.sim.color[i] = sd(c(mean(x1),mean(x2),mean(x3)))
}
mean(sd.sim.color < sd.obs.color) # it is not uncommon to observe means this similar
## [1] 0.251
sd.sim.color %>% enframe %>% ggplot(aes(value))+geom_histogram()+geom_vline(xintercept = sd.obs.color)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Type has two groups, so let's measure similarity as the SD of the means across conditions (we could equivalently use the difference).
sd.obs.type = sd(gm.type) # observed SD of the means
m.sim.type = mean(gm.type) # overall mean; under the null, both groups share it
sd.sim.type = c()
for (i in 1:1000){
x1 = rnorm(84,mean = m.sim.type,sd = mean(data$SD))
x2 = rnorm(85,mean = m.sim.type,sd = mean(data$SD))
sd.sim.type[i] = sd(c(mean(x1),mean(x2)))
}
mean(sd.sim.type < sd.obs.type) # it is uncommon to observe means this similar (~.04 chance)
## [1] 0.04
sd.sim.type %>% enframe %>% ggplot(aes(value))+geom_histogram()+geom_vline(xintercept = sd.obs.type,col = "red")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Dimension also has two groups, so again let's measure similarity as the SD of the means across conditions.
sd.obs.dim = sd(gm.dim) # observed SD of the means
m.sim.dim = mean(gm.dim) # overall mean; under the null, both groups share it
sd.sim.dim = c()
for (i in 1:1000){
  x1 = rnorm(84, mean = m.sim.dim, sd = mean(data$SD))
  x2 = rnorm(85, mean = m.sim.dim, sd = mean(data$SD))
  sd.sim.dim[i] = sd(c(mean(x1), mean(x2))) # dimension has only two groups
}
mean(sd.sim.dim < sd.obs.dim) # it is uncommon to observe means this similar (~.01 chance)
## [1] 0.011
sd.sim.dim %>% enframe %>% ggplot(aes(value))+geom_histogram()+geom_vline(xintercept = sd.obs.dim)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
If I had to guess, I would say the data are fake. Replicating our in-class analysis of SDs, the SDs in the data are far more similar to one another than would be observed by chance when drawing comparable samples from populations with equal SDs.
Moreover, the similarity of the means across conditions is also telling. Means this similar across color are not that unlikely (p = 0.251), but that is not the case for the other two independent variables: for both type (p = 0.04) and dimension (p = 0.011), means this similar would rarely arise by chance.
Even if the data were not fake, I still wouldn't believe the results. A three-way interaction with 169 participants (possibly fewer, judging by the degrees of freedom in the ANOVAs) would be tremendously underpowered!
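To put a rough, hypothetical number on that claim (assuming a medium simple effect of d = .5, ~14 per cell, and borrowing the equal-variance logic from the first problem, where each added interaction layer divides the effective d by sqrt(2)):
pwr::pwr.t.test(n = 14, d = .5)$power # even a simple two-cell effect: power ~.24
pwr::pwr.t.test(n = 14, d = .5 / 2)$power # three-way interaction sketch: power ~.09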