The most question that statisticians are receiving from researchers is how many sample size will be appropriate for the experiment. At the same time, finding the right sample size is one of the most challenging tasks for statisticians. When the sample size is too small, result would be dubious. If there are two patients are treated with drug A and two are treated with B, although we observe drug A is 50 % better than drug B, it is hard to conclude that two drugs are having the difference since it can be happened by chance.
If we have 4000 patients who are participated in the experiment, we can say for sure that the two drug are having different efficacy. However, if the difference between A and B are only 5% then, statistically it allows rejection to the hypothesis with pvalue <0.05, but in the real experiment 5% is not a big effect. Therefore, we should be able to choose the appropriate sample size which enables us to detect the minimum difference that clinical trial requires.
We want both Type I and II error to be small, which means we want a small significance level and larger power.
library(pwr)
library(dplyr)##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
delta <- 0.07
sigma <- sqrt(0.23)
d <- delta/sigma
b<- pwr.t.test(d=d, sig.level=0.05, power=0.90, type = c('two.sample', 'one.sample', 'paired'))
plot(b)## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
this <- data1 %>%
group_by(Date) %>%
summarise(var = var(value))
this %>%
summarise(var=mean(var))## # A tibble: 1 x 1
## var
## <dbl>
## 1 6.68
data1 %>%
group_by(Tank) %>%
summarise(mean = mean(value))## # A tibble: 3 x 2
## Tank mean
## <chr> <dbl>
## 1 RTB1 10.6
## 2 RTB2 10.4
## 3 RTB4 9.82
delta <- 0.15
sigma <- sqrt(4.205775)delta <- 0.07
sigma <- 0.4795832
d <- delta/sigma
c<- pwr.t2n.test(n1 = 40, n2= 45, d = d, sig.level = 0.1, power = NULL,alternative = "two.sided")
plot(c)Typically we want 80% of power. Power represents our ability to reject the null hypothosesis when it is false. 80% of the time we do this correctly. Converse of this, 20% of the time we risk not rejecting the null when we really should be rejecting the null when we should be rejecting the null.
k: number of groups n: sample size in each group f = effect size, degree to null is false size of the difference between your null hypothesis and the alternative hypothesis that you hope to detect. ex) dog shampoo, 25% shinier than other product, 25 is effect size ex) biology, 25% more autism in one group in order to have a high chance of seeing a significant difference
library(pwr)
cohen.ES(test = "anov", size = "small")##
## Conventional effect size from Cohen (1982)
##
## test = anov
## size = small
## effect.size = 0.1
cohen.ES(test = "anov", size = "medium")##
## Conventional effect size from Cohen (1982)
##
## test = anov
## size = medium
## effect.size = 0.25
cohen.ES(test = "anov", size = "large")##
## Conventional effect size from Cohen (1982)
##
## test = anov
## size = large
## effect.size = 0.4
a <-pwr.anova.test(k = 2, f = 0.4, sig.level = 0.05, power = 0.8)
plot(a)ptab<-cbind(NULL, NULL) # initalize ptab
for (i in c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2)){
pwrt<-pwr.t.test(d=i,power=.8,sig.level=.05,type="two.sample",alternative="two.sided")
ptab<-rbind(ptab, cbind(pwrt$d, pwrt$n))
}
par(bg = "Corn Silk")
plot(ptab[,1],ptab[,2],type="b",xlab="Effect Size",ylab="Sample Size", font=2,lwd=2,col="Dark Olive Green")
abline(h=seq(0,500,50), v=seq(0,1.2, 0.1), lty=3, col="gray",pch=1, font=2)sig <- c(0.05,0.1,0.2) #significance level
np <- length(sig)
groupmeans <- c(550, 598, 598, 646)
n <- c(seq(2,10,by=1),seq(12,20,by=2),seq(25,50,by=5)) #sample size
p=list()
for (i in 1:np){
p[[i]] <- power.anova.test(groups = length(groupmeans),
between.var = var(groupmeans), within.var = 6400,
power=NULL, sig.level=sig[i],n=n)
}
par(bg = "grey95")
plot(n,p[[1]]$power*100,ylab="Power = 1-beta",lwd=2,xlab="Sample Size",font=2)
lines(n,p[[1]]$power*100,type="b",col="red",lty=2,lwd=2)
lines(n,p[[2]]$power*100,type="b",col="blue",lty=2,lwd=2)
lines(n,p[[3]]$power*100,type="b",col="black",lty=2,lwd=2)
legend("bottomright",title="significance level",legend=c("0.05", "0.1","0.2"),col=c("red","blue","black"),lwd=2)sample.size.table = function(p=0.5, margin=.05, population) {
z.val=c(0.8416212, 1.036433,1.281551565545, 1.644853626951,1.9599640,
2.326347874041)
ss = (z.val^2 * p * (1-p))/(margin^2)
p.ss = ss/(population-1+ss)*population
c.level = c("60%","70%","80%","90%","95%","98%")
results = data.frame(c.level, round(p.ss, digits = 0))
names(results) = c("Confidence Level", "Sample Size")
METHOD = c("Suggested sample sizes at different confidence levels")
moe = paste((margin*100), "%", sep="")
resp.dist = paste((p*100),"%", sep="")
pre = structure(list(Population=population,
"Margin of error" = moe,
"Response distribution" = resp.dist,
method = METHOD),
class = "power.htest")
print(pre)
print(results)
}
library(kableExtra)
kable(sample.size.table(0.5 ,0.05, 200000))##
## Suggested sample sizes at different confidence levels
##
## Population = 2e+05
## Margin of error = 5%
## Response distribution = 50%
##
## Confidence Level Sample Size
## 1 60% 71
## 2 70% 107
## 3 80% 164
## 4 90% 270
## 5 95% 383
## 6 98% 540
| Confidence Level | Sample Size |
|---|---|
| 60% | 71 |
| 70% | 107 |
| 80% | 164 |
| 90% | 270 |
| 95% | 383 |
| 98% | 540 |
#confidence interval = 0.05,,, range of values that is likely to contain an unknown population parameter.
#confidence level = probability that value falls within a specified range of values, we can be 60% certain#sample size = N/(1+Ne2)
N=200000
e2=0.05^2
sample_size = N/(1+N*e2)
sample_size## [1] 399.2016