A point estimate is a single value used to estimate a population parameter.
For example, the sample proportion \(\hat p\) is the best point estimate (also called unbiased estimator) of the population proportion \(p\).
A point estimate is a single value used to estimate a population parameter.
For example, the sample proportion \(\hat p\) is the best point estimate (also called unbiased estimator) of the population proportion \(p\).
A point estimate provides a single plausible value for a parameter. However, a point estimate is rarely perfect; usually there is some error in the estimate. In addition to supplying a point estimate of a parameter, a next logical step would be to provide a plausible range of values for the parameter.
A confidence interval or (interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated as CI.
The confidence level is the probability \(1-\alpha\) (such as \(0.95\), or \(95\%\)) that the confidence interval actually does contain the population parameter, assuming that the estimate process is repeated a large number of times. (The confidence level is also called the degree of confidence, or the confidence coefficient.)
Constructing a \(95\%\) confidence interval
When the sampling distribution of a point estimate can reasonably be modeled as normal, the point estimate we observe will be within \(1.96\) standard errors of the true value of interest about \(95\%\) of the time. Thus, a \(95\%\) confidence interval for such a point estimate can be constructed:
\(\text {point estimate} \pm 1.96 \times SE\)
Fifty samples of size \(n = 300\) were simulated when \(p = 0.30\). For each sample, a confidence interval was created to capture the true proportion \(p\). How many did not capture \(p = 0.30?\)
If the point estimate follows the normal model with standard error \(SE\), then a confidence interval for the population parameter is
\[ \text {point estimate} \pm z{^{\star}_{\alpha/2}} \times SE \]
where \(z^\star\) , the critical value, depends on the confidence level (\(CL\)) selected, and \(\alpha = 1-CL\)
The heart patients who receive stents are \(9\%\) more likely to suffer stroke from usage of the stent than those who do not have it. The estimate's standard error \((SE)\) is \(0.028\). Construct a \(95\%\) confidence interval for the change in strole rates from the usage of stent.
\[ \begin{align} \text {95% Confidence Interval} &= \text {point estimate} \pm 1.96 \times SE \\ &= 0.090 \pm 1.96 \times 0.028 \\ &=(0.035, 0.145) \end{align} \]
\[ \begin{align} \text {90% Confidence Interval} &= \text {point estimate} \pm 1.645 \times SE \\ &= 0.090 \pm 1.645 \times 0.028 \\ &=(0.044, 0.136) \end{align} \]
The margin of error (ME) is the distance between the point estimate and the lower or upper bound of a confidence interval.
\[ \begin{align} \text{confidence interval} &= \text {point estimate} \pm z{^{\star}_{\alpha/2}} \times SE \\ &= \text {point estimate} \pm \text{margin of error} \end{align} \]
A pilot study showed that \(0.5\%\) of credit card offers in the mail end up with the person signing up. To be within \(0.1\%\) of the true rate with \(95\%\) confidence, how big does the test mailing have to be?
\[ \begin{align} ME &= z{^{\star}_{\alpha/2}} \times SE \\ ME &= z{^{\star}_{\alpha/2}} \times \sqrt \frac{\hat p \hat q}{n} \\ 0.001 &= 1.96 \times \sqrt \frac{(0.005)(0.995)}{n} \\ (0.001)^2 &= (1.96)^2 \times \frac{(0.005)(0.995)}{n} \\ n &= (1.96)^2 \times \frac{(0.005)(0.995)}{(0.001)^2} \\ n &= 19112 \end{align} \]
It is extremely rare that we want to estimate an unknown value of a population mean \(\mu\) but we somehow know the value of the population standard deviation \(\sigma\). If we somehow do know the value of \(\sigma\), the confidence interval is constructed using the standard normal distribution instead of the Student \(t\) distribution.
\[ ME = z_{\alpha/2}.\frac{\sigma}{\sqrt{n}} \text { ; use with known } \sigma \] \[ n = 15, \bar x = 30.9, \sigma = 2.9, CL = 95\% \\ ME = 1.96.\frac{2.9}{\sqrt{15}} = 1.46760 \\ \bar x - ME < \mu < \bar x + ME \\ 30.9 - 1.46760 < \mu < 30.9 + 1.46760 \\ 29.4 < \mu < 32.4 \]
\(\text {The sample mean } \bar x \text { is the best estimate of the population mean } \mu.\)
In this case, standard error to calculate CI about \(\bar x\) is estimated from the sample mean. The distribution used here is called \(\text{Student } t \text{ Distribution}.\)
According to the Central Limit Theorem, the sampling distribution of a statistic (e.g. sample mean) will follow a normal distribution, as long as the sample size is sufficiently large (\(n>30\)).
But sample sizes are sometimes small, and often we do not know the standard deviation of the population. When either of these problems occur, statisticians rely on the distribution of the \(\text {t-statistic (t-score)}\) with the degrees of freedom \((n-1)\),
\(t = \frac {\bar x - \mu}{s/\sqrt n}\)
\(\text{Degrees of Freedom} = n - 1\)
\(\text {t-distribution }\) is determined by its degrees of freedom. The degrees of freedom refers to the number of independent observations in a set of data.
Properties of the \(\text {t-Distribution:}\)
Find a \(95\%\) confidence interval for mirex concentrations in salmon.
\[ \begin{align} n &= 150 \\ \bar x &= 0.0913 \space ppm \\ s &= 0.0495 \space ppm \\ df &= 150 - 1 = 149 \\ \\ SE(\bar x) &= \frac{0.0495}{\sqrt{150}} = 0.0040 \\ t^*_{149} &= 1.976 \\ \\ \text {Confidence Interval of } \bar x: \\ &\bar x \pm t^*_{149} \times SE(\bar x) \\ &= 0.0913 \pm 1.976 \times 0.0040 \\ &= 0.0913 \pm 0.0079 \\ &= (0.0834, 0.0992) \end{align} \]
Should you buy a movie download accelerator? To test the download times, find the minimum number of downloads that you need to do during a free trial period to obtain a \(95\%\) CL with a ME < \(8\) minutes. Given, \(\sigma = 10 \space min\)
First, calculate \(n\) using \(z-score\)
\[ \begin{align} ME &< 8 \\ \\ \text {with } z^* &= 2 \text { at 95% CL} \\ \\ 2 \times \frac {10}{\sqrt n} &< 8 \\ n &> 6.25 \\ n &\approx 7 \\ \\ \end{align} \]
Sample size turns out to be too small for normal approximation. Therefore, re-esimate \(n\) using \(t-score\).
\[ \begin{align} df &= 7 - 1 = 6 \\ \\ \text {with } t^*_6 &= 2.447 \text { at 95% CL} \\ \\ 2.447 \times \frac {10}{\sqrt n} &< 8 \\ n &> 9.36 \\ n &\approx 10 \\ \\ \end{align} \]
Hence, at least 10 trial downloads need to be run to make sure ME remains less than \(8\) min.
\[ \begin{array}{l|c} conditions & \text{method} \\ \hline \sigma \text{ not known; normally distributed population or n > 30} & t \\ \hline \sigma \text{ known; normally distributed population or n >30} & z \end{array} \]
This section presents methods for using a sample standard deviation \(s\) (or a sample variance \(s^2\)) to estimate the value of the corresponding population standard deviation \(\sigma\) (or population variance \(\sigma^2\)).
Point Estimate: The sample variance \(s^2\) is the best point estimator of the population variance \(\sigma^2\). The sample standard deviation \(s\) is commonly used as a point estimate of \(\sigma\), even though it is a biased estimator.
Confidence Interval: When constructing a confidence interval estimate of a population standard deviation (or population variance), we construct the confidence interval using the \(\chi^2 \text{ distribution}\).
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]
Confidence Interval for the Population Varaince \(\sigma^2\)
\[ \frac{(n-1)s^2}{\chi^2_L} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_R} \]
Confidence Interval for the Population Variance \(\sigma\)
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_L}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_R}} \]
\[ \begin{align} &s = 14.29263 \\ &n = 22 \\ &CL = 95\% \\ \\ &\frac{(n-1)s^2}{\chi^2_L} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_R} \\ \\ &\frac{21.(14.29263)^2}{10.283^2} < \sigma^2 < \frac{21.(14.29263)^2}{35.479^2} \\ \\ &120.9 < \sigma^2 < 417.2 \\ &11.0 < \sigma < 20.4 \end{align} \]