Confidence Intervals and T-Distribution

2024-03-21

Confidence Intervals

What are Confidence Intervals and what role do they play in statistics?

A confidence interval is a range of values, described by a lower and an upper bound, that is likely to contain an unknown population parameter. It is often reported as a point estimate plus or minus a margin of error, such as ‘+/-3%’. Confidence intervals let us estimate unknown population parameters, like the population mean, from sample data.
Each interval has a confidence level: a 95% confidence level means that if the sampling process were repeated many times, about 95% of the resulting intervals would contain the true parameter value.
Confidence intervals are especially valuable for smaller samples, where there is less information and therefore more uncertainty around a single point estimate.
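The repeated-sampling interpretation of the confidence level can be checked with a small simulation. The sketch below is illustrative only: the population mean, standard deviation, sample size, and number of repetitions are assumed values chosen for the example, not figures from this post.

```r
# Simulate repeated sampling from a known population and count how often
# a 95% interval captures the true mean. All values below are assumptions.
set.seed(42)            # fixed seed so the run is reproducible
true_mean <- 10         # assumed population mean
true_sd   <- 2          # assumed population standard deviation
n         <- 50         # assumed sample size
n_reps    <- 2000       # number of repeated samples

covered <- replicate(n_reps, {
  x  <- rnorm(n, mean = true_mean, sd = true_sd)
  se <- sd(x) / sqrt(n)                 # standard error of the mean
  lower <- mean(x) - 1.96 * se
  upper <- mean(x) + 1.96 * se
  lower <= true_mean && true_mean <= upper
})

coverage <- mean(covered)   # share of intervals containing the true mean
print(coverage)
```

The printed proportion should land close to 0.95, though it will not match it exactly because the simulation is itself a random sample of intervals.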

How do you calculate Confidence Intervals?

\[ CI = \bar{x} \pm z \left( \frac{s}{\sqrt{n}} \right) \]
Where:
\(CI\) = Confidence Interval
\(\bar{x}\) = Sample Mean
\(z\) = Critical Value (z-score) for the chosen confidence level, e.g. 1.96 for 95%
\(s\) = Sample Standard Deviation
\(n\) = Sample Size
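As a rough worked example, the formula can be applied by hand to one group of the built-in “PlantGrowth” dataset. Choosing the control group and a 95% level are assumptions made for illustration; `qnorm()` is used here to look up the value of z rather than hard-coding it.

```r
# Apply CI = x_bar +/- z * (s / sqrt(n)) to the control group of PlantGrowth.
data(PlantGrowth)
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]

x_bar <- mean(ctrl)        # sample mean
s     <- sd(ctrl)          # sample standard deviation
n     <- length(ctrl)      # sample size
z     <- qnorm(0.975)      # two-sided 95% level (assumed for this sketch)

margin <- z * s / sqrt(n)  # margin of error
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(ci)
```

The interval is symmetric around the sample mean, and its half-width is the margin of error.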

Confidence Interval ggplot with “PlantGrowth” dataset

Previous Example R Code

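# Load plotting and data-wrangling packages, then the built-in dataset
library(ggplot2)
library(dplyr)
data(PlantGrowth)

# Per-group mean and standard error of the mean (sd / sqrt(n))
summary_data <- PlantGrowth %>%
  group_by(group) %>%
  summarise(mean = mean(weight),
            se = sd(weight) / sqrt(n()))

# Plot group means with 95% intervals; 1.96 is the z critical value for 95%
ggplot(summary_data, aes(x = group, y = mean, group = 1)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean - 1.96 * se, ymax = mean + 1.96 * se), width = 0.1) +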
  labs(x = "Treatment Type", y = "Weight") +
  theme_minimal() + 
  theme( 
    axis.title = element_text(size = 16, face = "bold", margin =  margin(t = 20)), 
    axis.text = element_text(size = 14, face = "bold"), 
    plot.title = element_text(size = 20, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 16), 
    legend.text = element_text(size = 14) 
  )

Plotly graph with “PlantGrowth” dataset

T-Distribution

What is the T-Distribution and its role in statistics and confidence intervals?

The T-Distribution is a probability distribution that, like the normal distribution, is bell-shaped and symmetric, but it has heavier tails to account for the extra variability that comes from estimating the standard deviation from a sample. It is characterized by its degrees of freedom, which depend on the sample size (for a one-sample mean, n - 1). As the degrees of freedom increase, it approaches the normal distribution.
The T-Distribution is particularly useful for smaller sample sizes (commonly n < 30) and when the population standard deviation (sigma) is unknown.
It is used to obtain critical values for confidence intervals built from small samples and to define rejection regions in hypothesis testing.
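The effect of the heavier tails shows up in the critical values. The sketch below compares the two-sided 95% critical value from the normal distribution with those from the t-distribution; the particular degrees of freedom are assumed for illustration.

```r
# Compare 95% two-sided critical values: normal vs. t at several
# degrees of freedom (the df values are illustrative assumptions).
z_crit <- qnorm(0.975)
dfs    <- c(4, 9, 29, 99)
t_crit <- qt(0.975, df = dfs)

print(data.frame(df = dfs,
                 t_crit = round(t_crit, 3),
                 z_crit = round(z_crit, 3)))
```

The t critical values are larger than the z value at every df shown and shrink toward it as df grows, which is why intervals from small samples are wider when the t-distribution is used.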

How do you calculate a t-value?

\[ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} \]
Where:
\(t\) = t-value
\(\bar{x}\) = Sample Mean
\(\mu\) = Population Mean
\(s\) = Sample Standard Deviation
\(n\) = Sample Size
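As a minimal sketch, the formula can be computed by hand and compared against R's built-in `t.test()`. The choice of the “trt1” group from “PlantGrowth” and the population mean of 5 are hypothetical values picked only for the example.

```r
# Compute t = (x_bar - mu) / (s / sqrt(n)) by hand and compare with t.test().
data(PlantGrowth)
trt1 <- PlantGrowth$weight[PlantGrowth$group == "trt1"]
mu0  <- 5   # hypothetical population mean, assumed for illustration

t_manual  <- (mean(trt1) - mu0) / (sd(trt1) / sqrt(length(trt1)))
t_builtin <- unname(t.test(trt1, mu = mu0)$statistic)

print(c(manual = t_manual, builtin = t_builtin))
```

The two values should agree, since the one-sample `t.test()` uses the same statistic with n - 1 degrees of freedom.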