HW3.knit

2025-06-09

Introduction to Interval Estimation

What is interval estimation?

Interval estimation is a method of using sample data to calculate a range of values that is likely to contain a population parameter. This interval is centered around a sample statistic and is paired with a confidence level, which reflects how often the intervals would contain the true parameter if the process were repeated many times.

Introducing the Population

In this presentation, I will use a population consisting of the mean annual temperatures, in degrees Fahrenheit, in New Haven, Connecticut, from 1912 to 1971. Because we already know the parameter values we are trying to estimate from the sample, we can judge the interval estimate against the true population parameters.

Characteristics of the Population

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   47.90   50.58   51.20   51.16   51.90   54.60
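The slides do not show how `population` was created. The summary above is consistent with R's built-in `nhtemp` dataset, so the sketch below assumes that is the source. The names `population_mean`, `N`, and `population_sd` are hypothetical helpers introduced only for illustration.

```r
# Assumption: the population is R's built-in `nhtemp` time series
# (mean annual temperature in New Haven, CT, 1912-1971).
population <- as.numeric(nhtemp)

# Five-number summary plus the mean, as printed on the slide
summary(population)

# Population parameters that the interval estimate can be compared against
population_mean <- mean(population)

# sd() divides by n - 1, so rescale to get the divide-by-N population value
N <- length(population)
population_sd <- sd(population) * sqrt((N - 1) / N)
```

Treating all 60 yearly values as the whole population is what lets us check the interval against a known mean later on.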

Sampling the Population

The code below draws a random sample of 20 values from the population data without replacement. We will use the statistics of this sample to make inferences about the parameters of the population.

sampleSet <- sample(population, 20, replace = FALSE)

The next slide plots the sample’s density against the population’s density. Notice that the shapes of the two distributions are similar, and note how each mean relates to the shape of its density curve.

Plotting Population Density vs. Sample Density
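The plotting code is not shown on the slide. One way such a figure could be drawn in base R is sketched below. It assumes `population` comes from the built-in `nhtemp` dataset and uses a hypothetical seed so the sketch is reproducible; the original sample, and therefore the original plot, may differ.

```r
# Assumption: `population` is taken from the built-in `nhtemp` dataset
population <- as.numeric(nhtemp)

set.seed(1)  # hypothetical seed, not from the original slides
sampleSet <- sample(population, 20, replace = FALSE)

# Kernel density estimates for the population and the sample
pop_density <- density(population)
sample_density <- density(sampleSet)

# Draw both curves on shared axes so their shapes can be compared
plot(pop_density,
     xlim = range(pop_density$x, sample_density$x),
     ylim = range(0, pop_density$y, sample_density$y),
     main = "Population Density vs. Sample Density",
     xlab = "Mean annual temperature (degrees Fahrenheit)")
lines(sample_density, lty = 2)

# Vertical lines mark the population mean (solid) and sample mean (dashed)
abline(v = mean(population))
abline(v = mean(sampleSet), lty = 2)
legend("topright", legend = c("Population", "Sample"), lty = c(1, 2))
```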

Choosing a Method and Defining the Needed Values

There are two ways to calculate a confidence interval for the mean, depending on whether the population standard deviation is known. If it is known, we can use the formula:

\[ \bar{x} \pm z^* \left( \frac{\sigma}{\sqrt{n}} \right) \]

- \(\bar{x}\): Sample mean
- \(z^*\): Critical z-value corresponding to confidence level
- \(\sigma\): Population standard deviation
- \(n\): Sample size
- \(\pm\): Plus-minus symbol indicating interval range
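As a rough illustration of the known-\(\sigma\) formula, the sketch below computes a 95% z-interval. It assumes the population is the built-in `nhtemp` dataset and uses a hypothetical seed; `x_bar`, `sigma`, `z_star`, `margin_z`, and `z_interval` are illustrative names that do not appear in the original slides.

```r
# Assumption: `population` is taken from the built-in `nhtemp` dataset
population <- as.numeric(nhtemp)

set.seed(1)  # hypothetical seed, not from the original slides
sampleSet <- sample(population, 20, replace = FALSE)

n <- length(sampleSet)
x_bar <- mean(sampleSet)

# Population standard deviation (divide by N, not N - 1)
sigma <- sqrt(mean((population - mean(population))^2))

# Critical z-value for 95% confidence: 2.5% in each tail
z_star <- qnorm(0.975)

# Margin of error and interval bounds from the formula
margin_z <- z_star * sigma / sqrt(n)
z_interval <- c(lower = x_bar - margin_z, upper = x_bar + margin_z)
z_interval
```

This route is only available here because the whole population is known; the t-based formula is the one the rest of the presentation uses.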

Choosing a Method and Defining the Needed Values

In practice, the population parameters are usually unknown, which is why we rely on confidence intervals in the first place. Because an unknown population standard deviation is the more common case, we use this formula instead:

\[ \bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right) \]

- \(\bar{x}\): Sample mean
- \(t^*\): Critical t-value corresponding to confidence level
- \(s\): Sample standard deviation
- \(n\): Sample size
- \(\pm\): Plus-minus symbol indicating interval range
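To see what changes when \(z^*\) is replaced by \(t^*\), the short sketch below compares the two critical values at 95% confidence. The sample sizes in `sizes` are arbitrary illustrative choices, and all variable names are hypothetical.

```r
# Critical values for a 95% confidence level (2.5% in each tail)
z_star <- qnorm(0.975)          # standard normal
t_star <- qt(0.975, df = 19)    # t distribution, n = 20 so df = 19

# t* is larger than z*, which widens the interval to account for
# estimating the standard deviation from the sample
c(z_star = z_star, t_star = t_star)

# As the sample size grows, t* shrinks toward z*
sizes <- c(5, 20, 100, 1000)
t_by_n <- qt(0.975, df = sizes - 1)
data.frame(n = sizes, t_star = t_by_n)
```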

Finding Formula Values

## Calculate the sample mean 
sample_mean <- mean(sampleSet)

## Calculate the critical t-value for 95% confidence
## (0.975 leaves 2.5% in each tail; degrees of freedom = n - 1 = 19)
critical_t <- qt(0.975, df = length(sampleSet) - 1)

## Calculate the sample standard deviation
sample_stDev <- sd(sampleSet)

## Calculate the sqrt of the sample size
sampleSize_sqrt <- sqrt(length(sampleSet))

With these values, the lower and upper bounds of the confidence interval can be calculated.

Calculating Bounds

We now have all the values we need to make an inference about the population mean.

Lower-bound

## [1] 50.83961

Upper-bound

## [1] 51.90039
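The code that produced these bounds is hidden on the slide. A minimal sketch of how they could be computed from the values defined earlier is shown below, assuming the population is the built-in `nhtemp` dataset. The seed is hypothetical, so the printed bounds will generally not match the ones above; `margin_of_error`, `lower_bound`, `upper_bound`, and `builtin_ci` are illustrative names.

```r
# Assumption: `population` is taken from the built-in `nhtemp` dataset
population <- as.numeric(nhtemp)

set.seed(1)  # hypothetical seed, not from the original slides
sampleSet <- sample(population, 20, replace = FALSE)

# Values from the "Finding Formula Values" slide
sample_mean <- mean(sampleSet)
critical_t <- qt(0.975, df = length(sampleSet) - 1)
sample_stDev <- sd(sampleSet)
sampleSize_sqrt <- sqrt(length(sampleSet))

# Margin of error: t* times the standard error of the mean
margin_of_error <- critical_t * sample_stDev / sampleSize_sqrt

lower_bound <- sample_mean - margin_of_error
upper_bound <- sample_mean + margin_of_error
c(lower_bound, upper_bound)

# Cross-check against R's built-in one-sample t interval
builtin_ci <- t.test(sampleSet, conf.level = 0.95)$conf.int
builtin_ci
```

The cross-check with `t.test()` is a convenient way to confirm that the manual arithmetic follows the same formula.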

Interpreting Results

We are 95% confident that the population mean lies between 50.84 and 51.90 degrees Fahrenheit. The 95% describes the procedure rather than this single interval: if we repeatedly drew samples of 20 and built an interval from each one, about 95% of those intervals would contain the true population mean. In this case the known population mean, 51.16, does fall inside the interval. The sample’s density function also closely resembles the population’s, so it is reasonable that an interval based on the sample gives a good estimate of where the true parameter lies, and a larger sample would tend to narrow that interval.
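The repeated-sampling meaning of the confidence level can be checked with a small simulation. The sketch below assumes the population is the built-in `nhtemp` dataset; the seed, the number of repetitions `n_reps`, and the other names are hypothetical choices made for illustration.

```r
# Assumption: `population` is taken from the built-in `nhtemp` dataset
population <- as.numeric(nhtemp)
true_mean <- mean(population)

set.seed(1)     # hypothetical seed, only for reproducibility
n_reps <- 2000  # hypothetical number of repeated samples

# For each repetition: draw a sample of 20, build a 95% t-interval,
# and record whether it contains the true population mean
covered <- replicate(n_reps, {
  s <- sample(population, 20, replace = FALSE)
  ci <- t.test(s, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that captured the true mean
coverage <- mean(covered)
coverage
```

Because each sample is drawn without replacement from only 60 values, the observed coverage may come out somewhat above the nominal 95%.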