7.53 Experiment resizing. At a startup company running a new weather app, an engineering team generally runs experiments where a random sample of 1% of the app’s visitors is in the control group and another 1% is in the treatment group to test each new feature. The team’s core goal is to increase a metric called daily visitors, which is essentially the number of visitors to the app each day. They track this metric in each experiment arm and as their core experiment metric. In their most recent experiment, the team tested including a new animation when the app started, and the number of daily visitors in this experiment stabilized at +1.2% with a 95% confidence interval of (-0.2%, +2.6%). This means if this new app start animation was launched, the team thinks they might lose as many as 0.2% of daily visitors or gain as many as 2.6% more daily visitors. Suppose you are consulting as the team’s data scientist, and after discussing with the team, you and they agree that they should run another experiment that is bigger. You also agree that this new experiment should be able to detect a gain in the daily visitors metric of 1.0% or more with 80% power. Now they turn to you and ask, "How big of an experiment do we need to run to ensure we can detect this effect?"
#Z score for 80% Power
power_80 <- qnorm(.8)
#Z score for alpha of .05 (95% confidence, two-sided)
con_95 <- 1.96
#Calculating the SE necessary to detect a 1% difference
temp <- power_80 + con_95
#The detectable effect equals temp * SE, so set temp * SE = 1
SE <- 1/temp
#SE needed to detect a 1.0% difference with 80% power = .36
#Margin of error of the first experiment's 95% interval: ME = Z * SE
#ME = 2.6 - 1.2 = 1.4
#Z = 1.96
SE_b <- 1.4/1.96
#Standard error for the first experiment = .71
#To reduce the SE by a factor of x, the sample size n must be multiplied by x squared
#SE = SD/n**(1/2)
#SE/x = SD/(n*x**2)**(1/2)
#Test with illustrative values
sd <- 1.5
n <- 40
x <- 2.03
#The two calculations below are equal, confirming the rule above
(sd/n**(1/2))/x
## [1] 0.1168329
sd/(n*(x**2))**(1/2)
## [1] 0.1168329
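#Following the logic confirmed above, the sample size must grow by the square of the ratio of the two standard errors
#SE_b/SE is about 2.00 when computed from the unrounded values
(SE_b/SE)**2
## [1] 4.004633
#The initial experiment sampled 1% of the weather app's visitors for each of the treatment and control groups. To be able to detect the 1% difference with 80% power, the recommendation should be to run an experiment about 4 times as large, i.e. roughly 4 percent of visitors in each group.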
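
The steps above can be collected into one small function, which makes it easier to try other targets or power levels. This is only a sketch: the function name `resize_multiplier` and its argument names are hypothetical, and it assumes a normal approximation, a symmetric interval, and that the standard error shrinks in proportion to one over the square root of the sample size.

```r
# Hypothetical helper (not part of the exercise): returns the factor by which
# the sample size must grow so that `target_effect` is detectable with the
# requested power, given the estimate and upper CI bound of a past experiment.
resize_multiplier <- function(ci_upper, estimate, target_effect,
                              power = 0.80, conf_level = 0.95) {
  z_alpha <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  z_power <- qnorm(power)                     # quantile for the desired power
  se_current <- (ci_upper - estimate) / z_alpha      # SE implied by the reported CI
  se_needed  <- target_effect / (z_alpha + z_power)  # SE required for the target
  (se_current / se_needed)^2                  # SE scales with 1/sqrt(n)
}

multiplier <- resize_multiplier(ci_upper = 2.6, estimate = 1.2, target_effect = 1.0)
round(multiplier, 1)   # about 4

# Percent of visitors needed in each arm, starting from 1% per arm
1 * multiplier
```

Because the function uses `qnorm(0.975)` rather than the rounded 1.96, the result differs from the hand calculation only in later decimal places; both point to an experiment roughly four times as large.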