Description of the Variable
In this assignment, I study the variable
AnimalProducts, which represents the percentage
of protein intake obtained from animal-based food sources for
each country in the dataset.
The goal is to estimate the population mean of this
variable and construct confidence intervals using:
- A classical t-based confidence interval learned in prior statistics
courses, and
- A bootstrap confidence interval constructed using resampling methods
introduced this week.
Data Import and Exploration
data_path <- "C:\\Users\\rg03\\Downloads\\w02-Protein_Supply_Quantity_Data.csv"
data <- read.csv(data_path)
x <- data$AnimalProducts
x <- na.omit(x)
length(x)
## [1] 170
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.456 14.461 21.853 21.232 28.299 35.786
Interpretation (numeric).
After removing missing values, the sample contains n = 170
countries. The sample mean of AnimalProducts is
21.232%, and the median is 21.853%.
The two are close, which is consistent with a roughly symmetric
distribution. The variability across countries is large: values range from
4.456% to 35.786%, and the middle 50% of
countries fall between 14.461% (1st quartile) and
28.299% (3rd quartile). Because countries differ this much, a point
estimate alone says little about its precision, which
motivates constructing confidence intervals to quantify uncertainty in
the population mean.
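The spread figures quoted above follow directly from the summary output. As a quick arithmetic check, the sketch below recomputes the range, the interquartile range, and the gap between mean and median. The values are hard-coded from the printed summary rather than read from the CSV, and the variable names are illustrative only.

```r
# Values copied from the summary(x) output above
q <- c(min = 4.456, q1 = 14.461, median = 21.853,
       mean = 21.232, q3 = 28.299, max = 35.786)

range_width <- unname(q["max"] - q["min"])      # full range
iqr_width   <- unname(q["q3"] - q["q1"])        # interquartile range
mean_gap    <- unname(q["mean"] - q["median"])  # mean minus median

round(c(range = range_width, IQR = iqr_width, mean_minus_median = mean_gap), 3)
```

The range is about 31.3 percentage points and the interquartile range about 13.8, while the mean sits only about 0.6 points below the median.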
Classical Confidence Interval for the Mean
Rationale
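To estimate the population mean \(\mu\) of AnimalProducts, we
construct a 95% confidence interval (\(\alpha = 0.05\)) using a t-based
interval:
\[
\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}
\]
where \(\bar{x}\) is the sample mean, \(s\) is the sample standard
deviation, \(n\) is the sample size, and \(t_{\alpha/2,\,n-1}\) is the
upper \(\alpha/2\) critical value of the t distribution with \(n-1\)
degrees of freedom.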
n <- length(x)
xbar <- mean(x)
s <- sd(x)
alpha <- 0.05
t_crit <- qt(1 - alpha/2, df = n - 1)
ci_lower <- xbar - t_crit * s / sqrt(n)
ci_upper <- xbar + t_crit * s / sqrt(n)
c(n = n, xbar = xbar, s = s)
## n xbar s
## 170.000000 21.232152 7.921757
c(Lower = ci_lower, Upper = ci_upper)
## Lower Upper
## 20.03275 22.43156
Interpretation
The sample mean is \(\bar{x} =
21.232\%\). The 95% classical t-based confidence
interval for the population mean is:
\[
(20.033\%,\; 22.432\%).
\]
This means the true population mean percentage of protein intake from
animal products is plausibly between about 20.0% and
22.4%. The interval width is about 2.40 percentage
points, which is fairly narrow due to the large sample size
(n = 170). Under repeated sampling, about 95% of
intervals constructed using this method would contain the true mean.
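As a cross-check on the manual formula, R's built-in `t.test()` returns the same one-sample t interval. The sketch below illustrates this on simulated placeholder data, since the CSV is not available here; `x_demo` and the other `_demo` names are assumptions for illustration and are not the assignment's values.

```r
# Placeholder data: NOT the assignment's dataset, just 170 simulated values
set.seed(1)
x_demo <- runif(170, min = 4, max = 36)

n_demo      <- length(x_demo)
se_demo     <- sd(x_demo) / sqrt(n_demo)    # standard error of the mean
t_crit_demo <- qt(0.975, df = n_demo - 1)   # upper 2.5% critical value

# Manual interval from the formula
manual_ci <- mean(x_demo) + c(-1, 1) * t_crit_demo * se_demo

# Built-in one-sample t interval (default conf.level = 0.95)
builtin_ci <- as.numeric(t.test(x_demo)$conf.int)

all.equal(manual_ci, builtin_ci)
```

On the real data, the analogous call would be `t.test(x)$conf.int`.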
Bootstrap Confidence Interval for the Mean
Rationale
The bootstrap method estimates the sampling distribution of the
sample mean by repeatedly resampling the observed data with
replacement. Each bootstrap resample has the same
sample size n, and a sample mean is computed for each resample.
This provides an empirical approximation to the sampling distribution of
\(\bar{x}\) without relying strongly on
normality assumptions.
B <- 10000
boot_means <- numeric(B)
for (b in 1:B) {
  boot_sample <- sample(x, size = n, replace = TRUE)
  boot_means[b] <- mean(boot_sample)
}
Interpretation
We generated B = 10,000 bootstrap resamples, each
consisting of n = 170 observations drawn with
replacement from the original sample. For each resample we computed the
sample mean. The resulting set of 10,000 bootstrap means forms an
empirical sampling distribution for the mean.
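The same resampling procedure can also be written with `replicate()`, and fixing a seed makes the bootstrap means reproducible from run to run (the loop above does not set one). The sketch below uses simulated placeholder data and a smaller number of resamples; the `_demo` names and the seed are assumptions for illustration. It also compares the bootstrap standard error with the formula-based \(s/\sqrt{n}\), which should be close for the sample mean.

```r
# Placeholder data standing in for the AnimalProducts vector
set.seed(2024)
x_demo <- runif(170, min = 4, max = 36)
n_demo <- length(x_demo)

B_demo <- 5000
# replicate() evaluates the expression B_demo times and collects the results
boot_means_demo <- replicate(
  B_demo,
  mean(sample(x_demo, size = n_demo, replace = TRUE))
)

# Bootstrap standard error vs. the formula-based standard error
boot_se    <- sd(boot_means_demo)
formula_se <- sd(x_demo) / sqrt(n_demo)
c(bootstrap = boot_se, formula = formula_se)
```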
Bootstrap Sampling Distribution of the Sample Mean
hist(
  boot_means,
  breaks = 40,
  main = "Bootstrap Sampling Distribution of the Sample Mean",
  xlab = "Bootstrap Sample Means",
  col = "lightblue",
  border = "white"
)
abline(v = xbar, col = "red", lwd = 2)

Interpretation
The bootstrap sampling distribution is centered near the observed sample
mean (21.232%), shown by the red vertical line. The
distribution is approximately symmetric and bell-shaped, suggesting the
sampling distribution of the mean is close to normal. This indicates that
the classical t-based confidence interval is reasonable here, and the
bootstrap reaches a similar picture without relying on parametric
assumptions.
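The visual impression of symmetry can be backed up numerically. The sketch below, again on simulated placeholder data rather than the real sample, computes the sample skewness of the bootstrap means and compares a few bootstrap quantiles with those of a normal distribution having the same mean and standard deviation. All names and the seed are illustrative assumptions.

```r
set.seed(7)
x_demo <- runif(170, min = 4, max = 36)  # placeholder data, not the real sample
boot_means_demo <- replicate(5000, mean(sample(x_demo, replace = TRUE)))

# Sample skewness: third central moment divided by the cubed standard deviation
z <- boot_means_demo - mean(boot_means_demo)
skewness <- mean(z^3) / mean(z^2)^1.5

# Compare bootstrap quantiles with those of a matching normal distribution
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
rbind(
  bootstrap = quantile(boot_means_demo, probs),
  normal    = qnorm(probs, mean(boot_means_demo), sd(boot_means_demo))
)
skewness
```

A skewness near zero and closely matching rows would be consistent with an approximately normal sampling distribution.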
Bootstrap Percentile Confidence Interval
We construct a 95% bootstrap percentile confidence interval using the
2.5th and 97.5th percentiles of the bootstrap means.
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
## 2.5% 97.5%
## 20.05465 22.39994
Interpretation
The 95% bootstrap percentile confidence interval
is:
\[
(20.055\%,\; 22.400\%).
\]
This interval contains the middle 95% of the bootstrap sampling
distribution of the mean. Its width is approximately 2.35
percentage points, about 0.05 points narrower than the classical
interval (about 2.40 points).
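The percentile method generalizes to any confidence level by cutting \(\alpha/2\) from each tail of the bootstrap distribution. A minimal sketch follows, using a hypothetical helper named `percentile_ci` (not part of the original analysis) and a toy input whose cut points are easy to verify by hand.

```r
# Hypothetical helper: percentile interval for any confidence level
percentile_ci <- function(boot_stats, conf_level = 0.95) {
  stopifnot(conf_level > 0, conf_level < 1)
  alpha <- 1 - conf_level
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Toy input so the cut points are easy to verify by hand
toy <- 1:1000
percentile_ci(toy)         # 95%: 25.975 and 975.025
percentile_ci(toy, 0.90)   # 90%: 50.95 and 950.05
```

With the bootstrap means from this report, `percentile_ci(boot_means)` would correspond to the `quantile()` call above.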
Comparison of the Two Confidence Intervals
ci_table <- rbind(
  Classical_CI = c(ci_lower, ci_upper),
  Bootstrap_CI = as.numeric(boot_ci)
)
colnames(ci_table) <- c("Lower", "Upper")
ci_table
## Lower Upper
## Classical_CI 20.03275 22.43156
## Bootstrap_CI 20.05465 22.39994
Interpretation
The two confidence intervals are nearly identical:
- Classical 95% CI: \((20.033\%, 22.432\%)\)
- Bootstrap 95% CI: \((20.055\%, 22.400\%)\)
Both intervals are centered near the sample mean (about
21.23%), and their widths differ by only about 0.05 percentage points.
This close agreement suggests that the t-based method performs well for
this dataset: the bootstrap, which does not assume normality, yields
essentially the same estimate of the mean and its uncertainty.
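The comparison can be made concrete by computing each interval's width and midpoint. The sketch below hard-codes the bounds from the table output above, so it runs without the original data; the object names are illustrative.

```r
# Bounds copied from the ci_table output above
ci <- rbind(
  Classical_CI = c(Lower = 20.03275, Upper = 22.43156),
  Bootstrap_CI = c(Lower = 20.05465, Upper = 22.39994)
)

width    <- ci[, "Upper"] - ci[, "Lower"]
midpoint <- (ci[, "Upper"] + ci[, "Lower"]) / 2

round(cbind(ci, Width = width, Midpoint = midpoint), 3)
```

The widths come out to about 2.399 and 2.345 percentage points, and the midpoints (about 21.232 and 21.227) differ by roughly 0.005 points.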
Conclusion
Using AnimalProducts from n = 170
countries, the estimated mean percentage of protein intake from
animal-based food sources is 21.232%.
- The classical 95% t-based confidence interval is
\((20.033\%, 22.432\%)\).
- The bootstrap 95% percentile confidence interval is
\((20.055\%, 22.400\%)\).
Because these two intervals are extremely similar, they lead to the
same substantive conclusion: the population mean percentage of protein
intake from animal products across countries is plausibly between about
20.0% and 22.4%, and this conclusion holds under both
classical and bootstrap inference methods.
