library(gt)
| percent | z_value |
|---|---|
| 99% | 2.58 |
| 95% | 1.96 |
| 90% | 1.65 |
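# Confidence interval for a population mean: mean +/- t_value * sd / sqrt(n),
# printed as "(lower,upper)" rounded to `places` decimal places.
conf_interval <- function(mean,sd,t_value,n,places){
  places <- as.numeric(places)
  margin <- t_value*sd/sqrt(n)  # half-width of the interval
  lower <- mean-margin
  upper <- mean+margin
  print(
    paste0(
      "(",round(lower,places),",",round(upper,places),")"
    )
  )
}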
The next function only requires the number of sample members with the attribute of interest (\(Y\)) and the sample size (\(n\)).
z is the z-value.
places is the number of decimal places to round to.
library(dipsaus)
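# Despite its name, this computes the confidence interval for a population
# proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
CIF_unk_pop_mean <- function(Y,n,z,places) {
  p_hat <- Y/n  # sample proportion
  # %+-% (from dipsaus) gives both p_hat + margin and p_hat - margin
  interval <- p_hat%+-%(z*sqrt((p_hat*(1-p_hat))/n))
  print(
    paste0(
      "(",round(min(interval),places),",",round(max(interval),places),")"
    )
  )
}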
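# Sample size needed to estimate a population mean to within SE,
# given z-value z and standard deviation s; rounded up to a whole unit.
samp_size_pop_mean <- function(z,s,SE){
  n <- ((z*s)/SE)^2
  print(ceiling(n))
}
# Sample size needed to estimate a population proportion to within SE,
# given z-value z and an assumed proportion p; rounded up to a whole unit.
samp_size_pop_prop <- function(z,p,SE){
  print(
    ceiling(((z^2)*p*(1-p))/(SE^2))
  )
}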
\[ \begin{align} \alpha&=1-0.95\\ \frac{\alpha}{2}&=\frac{.05}{2}\\ z_\frac{\alpha}{2}&=z_{.025}\\ P(Z>z_{.025})&=.025\\ 1-P(Z<z_{.025})&=.025\\ P(Z<z_{.025})&=.975 \end{align} \]
qnorm(.975,mean=0,sd=1)
## [1] 1.959964
\[ \begin{align} \alpha&=1-0.90\\ \frac{\alpha}{2}&=\frac{.1}{2}\\ z_\frac{\alpha}{2}&=z_{.05}\\ P(Z>z_{.05})&=.05\\ 1-P(Z<z_{.05})&=.05\\ P(Z<z_{.05})&=.95 \end{align} \]
qnorm(0.95,mean=0,sd=1)
## [1] 1.644854
\[ \begin{align} \alpha&=1-0.99\\ \frac{\alpha}{2}&=\frac{.01}{2}\\ z_\frac{\alpha}{2}&=z_{.005}\\ P(Z>z_{.005})&=.005\\ 1-P(Z<z_{.005})&=.005\\ P(Z<z_{.005})&=.995 \end{align} \]
qnorm(0.995,mean=0,sd=1)
## [1] 2.575829
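The three derivations above follow the same pattern, \(P(Z<z_{\alpha/2})=1-\frac{\alpha}{2}\), so they can be collapsed into a single call. A minimal sketch, assuming a helper name `z_for_conf` that is hypothetical and not part of the original notes:

```r
# Hypothetical helper: critical value z_{alpha/2} for a two-sided
# confidence level given as a proportion (e.g. 0.95).
z_for_conf <- function(conf_level) {
  alpha <- 1 - conf_level
  qnorm(1 - alpha / 2, mean = 0, sd = 1)
}

# Vectorised over the three levels used in the table:
# roughly 1.645, 1.960 and 2.576.
z_values <- z_for_conf(c(0.90, 0.95, 0.99))
print(round(z_values, 6))
```

The values agree with the separate `qnorm` calls above.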
\(100(1-\alpha)\%\) confidence interval for the unknown population proportion \[ \hat{p}\pm z_\frac{\alpha}{2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
CAUTION: This only applies if \(n\hat{p}\geq 15\) and \(n(1-\hat{p})\geq 15\).
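The caution above can be checked before computing an interval. A minimal sketch, assuming a hypothetical helper `large_sample_ok` (not part of the original notes); it uses the fact that \(n\hat{p}=Y\) and \(n(1-\hat{p})=n-Y\):

```r
# Hypothetical helper: TRUE when both n * p_hat (= Y) and
# n * (1 - p_hat) (= n - Y) reach the threshold from the caution (15).
large_sample_ok <- function(Y, n, threshold = 15) {
  (Y >= threshold) && ((n - Y) >= threshold)
}

ok_large <- large_sample_ok(313, 1000)  # counts 313 and 687: condition holds
ok_small <- large_sample_ok(5, 1000)    # only 5 successes: condition fails
print(c(ok_large, ok_small))
```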
\[ n=\left(\frac{z_\frac{\alpha}{2}\times\sigma}{\text{SE}}\right)^2 \]
The \(\text{SE}\) formula comes from
the confidence interval for the unknown population proportion \(\hat{p}\pm z_\frac{\alpha}{2}
\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
Objective: the distance between the sample proportion and the population
proportion should be no more than \(\text{SE}\) with the desired probability.
\(z_\frac{\alpha}{2}\) accounts for the
desired probability (the confidence level).
\[
\begin{align}
\text{SE}&=z_\frac{\alpha}{2}\sqrt{\frac{p(1-p)}{n}}\\
(\text{SE})^2&=\left(z_\frac{\alpha}{2}\sqrt{\frac{p(1-p)}{n}}\right)^2\\
&=\frac{\left(z_\frac{\alpha}{2}\right)^2p(1-p)}{n}\\
n&=\frac{\left(z_\frac{\alpha}{2}\right)^2p(1-p)}{(\text{SE})^2}
\end{align}
\]
p <- seq(from=0,to=1,by=.001)
plot(p,p*(1-p))
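The plot shows \(p(1-p)\) peaking in the middle of the interval, which is why \(p=.5\) gives the largest (most conservative) sample size. A minimal sketch that locates the peak numerically with `optimize` from base R; the variable name `best` is only illustrative:

```r
# Numerically locate the maximum of p * (1 - p) on [0, 1].
best <- optimize(function(p) p * (1 - p), interval = c(0, 1), maximum = TRUE)

print(best$maximum)    # location of the maximum, close to 0.5
print(best$objective)  # value at the maximum, close to 0.25
```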
\(n=2,200 \text{ eligible voters}\) have been asked about their voting preferences in an election with two candidates, \(A\) and \(B\). 471 of the 2,200 will vote for \(A\). Let \(p\) be the proportion of people in the population who will vote for candidate \(A\). What is the 95% confidence interval for \(p\)?
\[ \begin{align} A&=?\\ \hat{A}&=471\\ n&=2200\\ \hat{p}&=\frac{\hat{A}}{n}\\ &=\frac{471}{2200}\\ &=\color{green}{0.2140909}\\ \end{align} \]
\(\hat{A}\) follows a binomial distribution, so
\[ \begin{align} \text{SD}(\hat{A})&=\sqrt{np(1-p)}\\ &\approx \sqrt{n\hat{p}(1-\hat{p})}\\ &\approx\sqrt{2200\times 0.2140909 \times 0.7859091}\\ &\approx 19.23963\\ \text{SD}(\hat{p})&=\frac{\text{SD}(\hat{A})}{n}\\ &=\frac{19.23963}{2200}\\ &=\color{red}{0.008745286} \end{align} \]
“The expected value of \(\hat{p}\)
is \(p\)”
Remember that \(p\) and \(\hat{p}\) are proportions, so the
proportions should be the same
\(\hat{p}\) approximately follows
a normal distribution (because \(n\) is
sufficiently large), so with \(z_{\frac{\alpha}{2}}=1.96\),
\[ -z_\frac{\alpha}{2}<\frac{\bar{X}-\mu}{\sigma_{\bar{X}}}<z_\frac{\alpha}{2}\\ -1.96<\frac{\hat{p}-p}{\color{red}{.0087}}<1.96\\ -1.96<\frac{\color{green}{0.2140909}-p}{\color{red}{.0087}}<1.96\\ -1.96\times \color{red}{.0087}<\color{green}{0.2140909}-p<1.96\times \color{red}{.0087}\\ (-1.96\times \color{red}{.0087})-\color{green}{0.2140909}<-p<(1.96\times \color{red}{.0087})-\color{green}{0.2140909}\\ -0.2311429<-p<-0.1970389\\ 0.1970389<p<0.2311429\\ (0.197,0.231) \]
So, we have 95% confidence that the population fraction of people voting for \(A\) is between 19.7% and 23.1%.
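The hand calculation above can be cross-checked in R without rounding the intermediate values. A minimal sketch; the variable names are illustrative and `qnorm(0.975)` is used in place of the rounded 1.96:

```r
n <- 2200
A_hat <- 471

p_hat <- A_hat / n                      # sample proportion
se_p <- sqrt(p_hat * (1 - p_hat) / n)   # estimated SD of p_hat
z <- qnorm(0.975)                       # z_{alpha/2} for 95% confidence

ci <- p_hat + c(-1, 1) * z * se_p       # lower and upper bounds
print(round(ci, 3))
```

Rounded to three places, this gives the same interval, (0.197, 0.231).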
Suppose that \(n=10000\) and \(\hat{A}=4700\). What is the 95% confidence interval for the proportion of votes that \(A\) will receive?
\[ \begin{align} A&=?\\ \hat{A}&=4700\\ n&=\color{blue}{10000}\\ \hat{p}&=\frac{\hat{A}}{n}\\ &=\frac{4700}{10000}\\ &=\color{red}{.47} \end{align} \]
\(n\) is sufficiently large, so the binomial distribution is approximately normal
\[ \begin{align} \hat{p}\pm z_\frac{\alpha}{2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}&= \color{red}{.47}\pm 1.96 \sqrt{\frac{\color{red}{.47}(1-\color{red}{.47})}{\color{blue}{10000}}}\\ &=.47\pm.009782344\\ &\approx .47\pm.01\\ &=(.46,.48) \end{align} \]
So, we have 95% confidence that the proportion of votes \(A\) will receive is between 46% and 48%.
Suppose that we would like to estimate the average wage in an industry. How large a sample is needed to be 90% sure that the distance between the sample mean and the population mean is no more than .5? Suppose that \(\sigma=\$4.00\).
\[ \begin{align} z_\frac{\alpha}{2}\frac{4}{\sqrt{n}}&=.5\\ \\ \alpha&=1-.9\\ &=.1\\ z_\frac{\alpha}{2}&=z_{.05}\\ \text{qnorm}(1-.05,0,1)&=1.644854\\ z_{.05}&=1.644854 \end{align} \]
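The setup above stops at the z-value. Solving \(z_\frac{\alpha}{2}\frac{4}{\sqrt{n}}=.5\) for \(n\) gives \(n=\left(\frac{z_\frac{\alpha}{2}\times 4}{.5}\right)^2\). A minimal sketch of that last step, with illustrative variable names:

```r
sigma <- 4            # population standard deviation (dollars)
margin <- 0.5         # maximum distance between sample and population mean
z <- qnorm(1 - 0.05)  # z_{.05} for 90% confidence

n_raw <- (z * sigma / margin)^2  # about 173.15
n_needed <- ceiling(n_raw)       # round up to a whole observation
print(n_needed)
```

Under these inputs the required sample size is 174.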
A food-products company conducted a market study by randomly sampling and interviewing 1,000 consumers to determine which brand of breakfast cereal they prefer. In this sample 313 consumers were found to prefer the company’s brand. Estimate the true proportion of consumers who prefer the company’s brand using a 95% confidence interval.
\[ \begin{align} \hat{p}\pm z_{\alpha /2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}&= .313\pm 1.96\sqrt{\frac{.313(1-.313)}{1000}}\\ &=(0.284,0.342) \end{align} \]
CIF_unk_pop_mean(313,1000,1.96,3)
## [1] "(0.284,0.342)"
In a marketing study the objective is to estimate the average household income in a population. The researchers want to be 95% confident that the difference between the real population mean and the sample mean is no more than $500. A small pilot study resulted in a sample standard deviation of $2,500. How large a sample is necessary to achieve the above objective?
\[ \begin{align} n&=\left(\frac{z_\frac{\alpha}{2}\times\sigma}{\text{SE}}\right)^2\\ &=\left(\frac{1.96\times 2500}{500}\right)^2\\ &=96.04\\ \lceil 96.04\rceil&=97 \end{align} \]
samp_size_pop_mean(1.96,2500,500)
## [1] 97
A manufacturer of boxes of candy is concerned about the proportion of imperfect boxes–those containing cracked, broken, or otherwise damaged candies.
samp_size_pop_prop(2.58,.5,.015)
## [1] 7397
\[ \begin{align} n&=\frac{\left(z_\frac{\alpha}{2}\right)^2p(1-p)}{(\text{SE})^2}\\ &=\frac{(2.58)^2\times .5 \times .5}{(.015)^2}\\ &\approx 7397 \end{align} \]
Substitute .1 for \(p\), since it maximizes \(p(1-p)\) in \((.005,.1)\)
\[ \begin{align} n&=\frac{\left(z_\frac{\alpha}{2}\right)^2p(1-p)}{(\text{SE})^2}\\ &=\frac{(2.57)^2\times .1 \times .9}{(.015)^2}\\ &\approx 2642 \end{align} \]
samp_size_pop_prop(2.57,.1,.015)
## [1] 2642
Knowing that the true proportion is below \(.1\) reduced the required sample size from 7397 to 2642. (The lower bound \(.005\) played no role, because \(p(1-p)\) increases with \(p\) on this interval.)
The USPS reports that 95% of first-class mail within the same city is delivered on time (i.e. within 2 days of the time of mailing). To gauge the USPS performance, Price Waterhouse monitored the delivery of first-class mail items between Dec. 10 and Mar. 3–the most difficult delivery season due to bad weather conditions and holidays. In a sample of 332,000 items, Price Waterhouse determined that 282,200 were delivered on time. Comment on the performance of USPS first-class mail service over this time period.
\[ \begin{align} n&=332000\\ Y&=282200\\ \\ \hat{p}\pm z_\frac{\alpha}{2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}&=.85\pm 1.96 \sqrt{\frac{.85(1-.85)}{332000}} \end{align} \]
CIF_unk_pop_mean(282200,332000,1.96,3)
## [1] "(0.849,0.851)"
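The exercise asks for a comment on USPS performance. One way to read the interval, offered as an interpretation rather than as the original solution, is to compare it with the reported on-time rate of 95%. A minimal sketch with illustrative variable names:

```r
n <- 332000
Y <- 282200
claimed <- 0.95   # on-time rate reported by the USPS

p_hat <- Y / n
margin <- 1.96 * sqrt(p_hat * (1 - p_hat) / n)
ci <- c(p_hat - margin, p_hat + margin)

# Does the 95% interval contain the reported rate?
covers <- (claimed >= ci[1]) && (claimed <= ci[2])
print(round(ci, 3))
print(covers)
```

The whole interval lies well below 0.95, so the sample suggests that on-time delivery during this period fell short of the reported rate.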
It costs more to produce defective items–because they must be scrapped or reworked–than it does to produce non-defective items. This simple fact suggests that manufacturers should ensure the quality of their products by perfecting their production processes rather than through inspection of finished products (Out of the Crisis, Deming, 1986). In order to better understand a particular metal-stamping process, a manufacturer wishes to estimate the mean length of items produced by the process during the past 24 hours.
\[ \begin{align} \sigma&=2\\ \text{SE}&=.1\\ z_{\alpha /2}=z_{.05}&=1.644854\\ n=\left(\frac{z_{\alpha /2}\times \sigma}{\text{SE}}\right)^2&=\left(\frac{1.644854\times 2}{.1}\right)^2\\ &\approx 1083 \end{align} \]
samp_size_pop_mean(1.644854,2,.1)
## [1] 1083
Time permits the use of a sample size no larger than 100. If a
90% confidence interval for \(\mu\) is
constructed using \(n=100\), will it be
wider or narrower than would have been obtained using the sample size
determined in part a? Explain.
Wider, since the standard error \(\frac{\sigma}{\sqrt{n}}\) will be larger when \(n\) is smaller.
Consider: \(\frac{\sigma}{\sqrt{n}}\)
\[
\frac{2}{\sqrt{100}} \text{ vs. }\frac{2}{\sqrt{1083}}\\
0.2\text{ vs. }0.06077371
\]
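The comparison above carries over to the full interval width, \(2\,z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\). A minimal sketch, assuming a hypothetical helper `ci_width` that is not part of the original notes:

```r
sigma <- 2
z <- qnorm(0.95)  # z_{.05} for a 90% confidence interval

# Hypothetical helper: full width of the interval for a given sample size.
ci_width <- function(n) 2 * z * sigma / sqrt(n)

w_100 <- ci_width(100)    # about 0.658
w_1083 <- ci_width(1083)  # about 0.200
print(c(w_100, w_1083))
```

With \(n=100\) the interval is more than three times as wide as with \(n=1083\).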
If management requires that \(\mu\) be estimated to within .1 mm and that a sample size of no more than 100 be used, what is (approximately) the maximum confidence level that could be attained for a confidence interval that meets management’s specifications?
\[ \begin{align} \sigma&=2\\ \text{SE}&=.1\\ n&=100\\ \text{SE}&=z_{\alpha /2}\frac{\sigma}{\sqrt{n}}\\ .1&=z_{\alpha /2}\frac{2}{\sqrt{100}}\\ z_{\alpha /2}&=.5\\ \end{align} \]
Then, find the area to the left of the z-score
pnorm(.5)
## [1] 0.6914625
The cumulative probability of 0.6914625 represents the proportion of
the distribution that falls below \(z_{\alpha
/2} = .5\), i.e. \(P(Z<0.5)=0.6914625\)
So, \[
\begin{align}
P(Z>0.5)&=1-P(Z<0.5)\\
&=1-\text{pnorm}(.5,0,1)\\
&=1-0.6914625\\
&=0.3085375\\
\frac{\alpha}{2}&=0.3085375\\
\alpha&=0.3085375\times 2\\
&=0.617075\\
\text{confidence level}&=1-\alpha\\
&=1-0.617075\\
&\approx 0.383
\end{align}
\]
Going backwards: \[ \begin{align} \alpha&=1-0.383\\ \frac{\alpha}{2}&=\frac{0.617}{2}\\ z_\frac{\alpha}{2}&=z_{.3085}\\ P(Z>z_{.3085})&=.3085\\ 1-P(Z<z_{.3085})&=.3085\\ P(Z<z_{.3085})&=1-.3085\\ &=0.6915 \end{align} \]
qnorm(.6915,0,1)
## [1] 0.5001066
\[ \therefore z_{.3085}=z_\frac{\alpha}{2}=0.5001066 \]
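The steps above (solve for \(z_{\alpha/2}\), take the upper-tail area, double it, subtract from 1) simplify to \(2P(Z<z_{\alpha/2})-1\). A minimal sketch, assuming a hypothetical helper `max_conf_level` that is not part of the original notes:

```r
# Hypothetical helper: largest two-sided confidence level attainable when
# the mean must be estimated to within `margin` using n observations.
max_conf_level <- function(margin, sigma, n) {
  z <- margin * sqrt(n) / sigma  # from margin = z * sigma / sqrt(n)
  2 * pnorm(z) - 1               # same as 1 - 2 * P(Z > z)
}

level <- max_conf_level(margin = 0.1, sigma = 2, n = 100)
print(round(level, 3))  # about 0.383
```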