The problem: Find the minimum sample size needed to be 99% confident that the sample’s variance is within 1% of the population’s variance.
This question is exceedingly tricky, since 1) \(\chi^2\) is not symmetrical, but we can at least find one interval that meets the mission. But remember, the sample variance is the best estimate of the population variance, so we have some cancellation! The confidence interval follows
\[ \frac{(n-1)s^2}{\chi^2_{df. \alpha/2}}<\sigma^2<\frac{(n-1)s^2}{\chi^2_{df. 1-\alpha/2}} \] In our case, we can divide through by \(\sigma^2\) to have the following.
\[ \frac{(n-1)}{\chi^2_{df. \alpha/2}}<1<\frac{(n-1)}{\chi^2_{df. 1-\alpha/2}} \] So we need this interval to within 1% of the population’s variance. So the upper bound minus the lower bound must be within \(0.01\times \sigma^2=0.01 \times 1=0.01\).
\[ \frac{(n-1)}{\chi^2_{df. \alpha/2}}-\frac{(n-1)}{\chi^2_{df. 1-\alpha/2}}=.01 \] Since both the numerator and the \(\chi^2\) random variable are dependent on the degrees of freedom (df=n-1), we have two moving parts at once. We could just use a table or some quick optimization, right? Or we could use brute force estimation.
Here, we successively iterate our bounds until we find numbers appearing. (I changed both the sequence interval and the step to do so.) Solutions for within 0.01 (what was asked in the problem), 0.10, and 1.00 are provided.
# You can verify the bottom two with a table or
#https://stats.stackexchange.com/questions/7004/calculating-required-sample-size-precision-of-variance-estimate
#Within 1% is 530805. Correct Answer
for (i in seq(530800, 530806,by=1)){
t1=(i-1)/qchisq(.005,df=i-1, lower.tail=F)
t2=(i-1)/qchisq(.995, df=i-1, lower.tail =F)
ifelse(t2-t1<.01, print(i), print(''))
}
## [1] ""
## [1] ""
## [1] ""
## [1] ""
## [1] ""
## [1] 530805
## [1] 530806
#Within 10% is 5321
for (i in seq(5315, 5325,by=1)){
t1=(i-1)/qchisq(.005,df=i-1, lower.tail=F)
t2=(i-1)/qchisq(.995, df=i-1, lower.tail =F)
ifelse(t2-t1<.1, print(i), print(''))
}
## [1] ""
## [1] ""
## [1] ""
## [1] ""
## [1] ""
## [1] ""
## [1] 5321
## [1] 5322
## [1] 5323
## [1] 5324
## [1] 5325
#Within 100% is 65
for (i in seq(60,66,by=1)){
t1=(i-1)/qchisq(.005,df=i-1, lower.tail=F)
t2=(i-1)/qchisq(.995, df=i-1, lower.tail =F)
ifelse(t2-t1<1, print(i), print(''))
}
## [1] ""
## [1] ""
## [1] ""
## [1] ""
## [1] ""
## [1] 65
## [1] 66
We can find the root function as follows.
myf=function(i)(i-1)/qchisq(.995, df=i-1, lower.tail =F)-(i-1)/qchisq(.005,df=i-1, lower.tail=F)-.01
ceiling(uniroot(myf, c(2, 531000), extendInt='yes', maxiter=100000)$root)
## [1] 530805
myf=function(i)(i-1)/qchisq(.995, df=i-1, lower.tail =F)-(i-1)/qchisq(.005,df=i-1, lower.tail=F)-.1
ceiling(uniroot(myf, c(2,5000), extendInt='yes', maxiter=100000)$root)
## [1] 5321
myf=function(i)(i-1)/qchisq(.995, df=i-1, lower.tail =F)-(i-1)/qchisq(.005,df=i-1, lower.tail=F)-1
ceiling(uniroot(myf, c(2,100), extendInt='yes', maxiter=100000)$root)
## [1] 65