What is interval estimation?
2025-06-09
Interval estimation is a method of using sample data to calculate a range of values that is likely to contain a population parameter. This interval is centered around a sample statistic and is paired with a confidence level, which reflects how often the intervals would contain the true parameter if the process were repeated many times.
In this presentation, I will use a population consisting of the mean annual temperatures, in degrees Fahrenheit, in New Haven, Connecticut, from 1912 to 1971. Because every value in this population is known, we already know the parameters we will estimate from the sample, so we can judge the interval estimate against the true population values.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   47.90   50.58   51.20   51.16   51.90   54.60
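The description and summary above match R's built-in `nhtemp` dataset. Assuming that is the source (the original code for this step is not shown), a minimal sketch for building the population vector could look like this. The names `population_size` and `population_mean` are illustrative.

```r
# Assumption: the population is R's built-in `nhtemp` time series
# (mean annual temperature in New Haven, Connecticut, 1912 to 1971).
population <- as.numeric(nhtemp)

# Minimum, quartiles, median, mean, and maximum of the population
summary(population)

# Keep the population size and mean for later comparison with the sample
population_size <- length(population)
population_mean <- mean(population)
```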
The code below draws a random sample of 20 values from the population without replacement. We will use the statistics of this sample to make inferences about the parameters of the population.
sampleSet <- sample(population, 20, replace = FALSE)
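Because `sample()` is random, each run draws a different sample. One way to make the draw repeatable is to set a seed first. This is a sketch: the seed value is arbitrary and not taken from the original analysis, and `population` is assumed to come from `nhtemp`.

```r
# Assumption: `population` holds the 60 annual means from R's built-in `nhtemp`.
population <- as.numeric(nhtemp)

# Fix the random seed so the same 20 values are drawn on every run.
# The value 42 is arbitrary and not part of the original analysis.
set.seed(42)
sampleSet <- sample(population, 20, replace = FALSE)

# Without replacement, no single year can be drawn twice
length(sampleSet)
```

With a different seed, or no seed at all, the sample and every statistic computed from it will change.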
The next slide shows the sample’s density plotted against the population’s density. Notice that the two distributions have similar shapes, and note how each mean relates to the shape of its density curve.
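The plotting code is not shown here. A rough sketch of such a comparison with base R graphics, assuming the `nhtemp` population and an arbitrary seed, might be:

```r
# Assumptions: `nhtemp` is the population; the seed is arbitrary.
population <- as.numeric(nhtemp)
set.seed(42)
sampleSet <- sample(population, 20, replace = FALSE)

# Kernel density estimates for the population and the sample
pop_density <- density(population)
sample_density <- density(sampleSet)

# Population as a solid curve, sample as a dashed curve
plot(pop_density,
     main = "Sample vs population density",
     xlab = "Mean annual temperature (degrees F)",
     ylim = range(pop_density$y, sample_density$y))
lines(sample_density, lty = 2)

# Vertical lines mark the two means
abline(v = mean(population))
abline(v = mean(sampleSet), lty = 2)
legend("topright", legend = c("Population", "Sample"), lty = c(1, 2))
```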
There are two ways to calculate a confidence interval for the mean, depending on whether the population standard deviation is known. If it is known, we can use the formula:
\[
\bar{x} \pm z^* \left( \frac{\sigma}{\sqrt{n}} \right)
\]
- \(\bar{x}\): Sample mean
- \(z^*\): Critical z-value corresponding to confidence level
- \(\sigma\): Population standard deviation
- \(n\): Sample size
- \(\pm\): Plus-minus symbol indicating interval range
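Since this population is fully known, the z-based formula can be tried directly. The sketch below assumes the `nhtemp` population and an arbitrary seed. Note that `sd()` divides by \(n - 1\), so the population standard deviation is computed by hand.

```r
# Assumptions: `nhtemp` is the population; the seed is arbitrary.
population <- as.numeric(nhtemp)
set.seed(42)
sampleSet <- sample(population, 20, replace = FALSE)

# Population standard deviation: divide by N, not N - 1 as sd() does
sigma <- sqrt(mean((population - mean(population))^2))

# Critical z-value for a two-sided 95% interval (about 1.96)
confidence_level <- 0.95
critical_z <- qnorm(1 - (1 - confidence_level) / 2)

# Margin of error and interval bounds
margin_z <- critical_z * sigma / sqrt(length(sampleSet))
z_interval <- c(lower = mean(sampleSet) - margin_z,
                upper = mean(sampleSet) + margin_z)
z_interval
```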
In practice, the population parameters are usually unknown, which is why we estimate them with confidence intervals. Since the population standard deviation is rarely known, we replace it with the sample standard deviation and use the t-distribution instead:
\[
\bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right)
\]
- \(\bar{x}\): Sample mean
- \(t^*\): Critical t-value corresponding to confidence level
- \(s\): Sample standard deviation
- \(n\): Sample size
- \(\pm\): Plus-minus symbol indicating interval range
## Calculate the sample mean
sample_mean <- mean(sampleSet)
## Calculate the critical t-val with 95% confidence
critical_t <- qt(0.975, 19)
## Calculate the sample standard deviation
sample_stDev <- sd(sampleSet)
## Calculate the sqrt of the sample size
sampleSize_sqrt <- sqrt(length(sampleSet))
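The arguments `0.975` and `19` in the `qt()` call come from the confidence level and the sample size. The following sketch shows how they can be derived; the variable names are illustrative.

```r
confidence_level <- 0.95
n <- 20

# A two-sided interval splits 1 - 0.95 = 0.05 evenly between the two tails,
# so the upper cutoff is the 0.975 quantile
upper_quantile <- 1 - (1 - confidence_level) / 2

# The t-distribution for a sample mean has n - 1 degrees of freedom
critical_t <- qt(upper_quantile, df = n - 1)

# The matching z-value, for comparison
critical_z <- qnorm(upper_quantile)

# t is larger than z (about 2.09 vs 1.96), which widens the interval
c(t = critical_t, z = critical_z)
```

The larger t-value accounts for the extra uncertainty of estimating the standard deviation from only 20 observations.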
With these values, we can calculate the lower and upper bounds of the confidence interval and make an inference about the population’s mean.
Lower-bound
## [1] 50.83961
Upper-bound
## [1] 51.90039
We can now say, with 95% confidence, that the population’s mean lies between 50.84 and 51.90 degrees Fahrenheit. The 95% describes the procedure rather than this one interval: if we repeatedly drew samples of 20 and built an interval from each, about 95% of those intervals would contain the true mean. Here the known population mean, 51.16, does fall inside the interval. The sample’s density function closely resembles the population’s, so it makes sense that an interval built from the sample captures the true parameter. As the sample size increases, the standard error \(s/\sqrt{n}\) shrinks and the interval becomes narrower.
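The repeated-sampling meaning of the confidence level can be checked with a short simulation. This is a sketch under assumptions: `nhtemp` is the population, and the seed and the number of repetitions are arbitrary.

```r
# Assumption: `nhtemp` is the population.
population <- as.numeric(nhtemp)
true_mean <- mean(population)

set.seed(42)          # arbitrary seed, for repeatability
n <- 20               # sample size used throughout
repetitions <- 2000   # arbitrary number of repeated samples

# For each repetition: draw a sample, build the 95% t interval,
# and record whether it contains the true population mean
covers <- replicate(repetitions, {
  s <- sample(population, n, replace = FALSE)
  margin <- qt(0.975, n - 1) * sd(s) / sqrt(n)
  (mean(s) - margin <= true_mean) && (true_mean <= mean(s) + margin)
})

# Share of intervals that contained the true mean
coverage <- mean(covers)
coverage
```

Because each sample takes 20 of only 60 values without replacement, the sample means vary less than the formula assumes, so the observed coverage may come out somewhat above 95%.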