1 Description of the Variable

In this assignment, I study the variable AnimalProducts, which represents the percentage of protein intake obtained from animal-based food sources for each country in the dataset.

The goal is to estimate the population mean of this variable and construct confidence intervals using:

  1. A classical t-based confidence interval learned in prior statistics courses, and
  2. A bootstrap confidence interval constructed using resampling methods introduced this week.

2 Data Import and Exploration

data_path <- "C:\\Users\\rg03\\Downloads\\w02-Protein_Supply_Quantity_Data.csv"
data <- read.csv(data_path)

x <- data$AnimalProducts
x <- na.omit(x)

length(x)
## [1] 170
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.456  14.461  21.853  21.232  28.299  35.786

Interpretation (numeric).
After removing missing values, the sample contains n = 170 countries. The sample mean of AnimalProducts is 21.232% and the median is 21.853%; the two are close, which suggests the distribution is roughly symmetric. Variability across countries is large: values range from 4.456% to 35.786%, and the middle 50% of countries fall between 14.461% (1st quartile) and 28.299% (3rd quartile). With this much spread, a point estimate alone is not enough, so confidence intervals are used to quantify the uncertainty in the estimate of the population mean.
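As a rough numeric check on the symmetry claim, the quartiles reported above can be turned into a quartile (Bowley) skewness coefficient. This is only a sketch: the numbers are copied by hand from the summary(x) output, and the coefficient is an added diagnostic, not part of the assignment's required analysis. Values near 0 are consistent with a roughly symmetric distribution.

```r
# Summary values copied from the summary(x) output above
q1   <- 14.461
med  <- 21.853
q3   <- 28.299
xbar <- 21.232

iqr <- q3 - q1                           # interquartile range
bowley <- (q3 + q1 - 2 * med) / iqr      # quartile skewness, between -1 and 1
mean_minus_median <- xbar - med          # negative: mean sits below the median

round(c(IQR = iqr, Bowley = bowley, MeanMinusMedian = mean_minus_median), 3)
```

A small negative coefficient would indicate a slight left skew, which matches the mean being a little below the median.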

3 Classical Confidence Interval for the Mean

3.1 Rationale

To estimate the population mean \(\mu\) of AnimalProducts, we construct a 95% confidence interval using a t-based interval:

\[ \bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \]

n <- length(x)
xbar <- mean(x)
s <- sd(x)

alpha <- 0.05
t_crit <- qt(1 - alpha/2, df = n - 1)

ci_lower <- xbar - t_crit * s / sqrt(n)
ci_upper <- xbar + t_crit * s / sqrt(n)

c(n = n, xbar = xbar, s = s)
##          n       xbar          s 
## 170.000000  21.232152   7.921757
c(Lower = ci_lower, Upper = ci_upper)
##    Lower    Upper 
## 20.03275 22.43156

Interpretation
The sample mean is \(\bar{x} = 21.232\%\). The 95% classical t-based confidence interval for the population mean is:

\[ (20.033\%,\; 22.432\%). \]

We are 95% confident that the population mean percentage of protein intake from animal products lies between about 20.0% and 22.4%. The interval width is about 2.40 percentage points, which is narrow relative to the spread of the data (s = 7.92) because the sample size is large (n = 170). The 95% level describes the method: under repeated sampling, about 95% of intervals constructed this way would contain the true mean.
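The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population whose mean and standard deviation are set near the sample values (mu = 21.2, sigma = 7.9); these are illustrative choices, not estimates endorsed by the analysis. It draws many samples of size 170, builds the t-based interval for each, and records how often the interval covers mu.

```r
set.seed(123)  # arbitrary seed, used only so the simulation is reproducible

# Hypothetical population parameters (assumptions for illustration)
mu    <- 21.2
sigma <- 7.9
n     <- 170
reps  <- 2000

covers <- replicate(reps, {
  smp  <- rnorm(n, mean = mu, sd = sigma)
  half <- qt(0.975, df = n - 1) * sd(smp) / sqrt(n)  # t-based margin of error
  abs(mean(smp) - mu) <= half                        # TRUE if the interval covers mu
})

coverage <- mean(covers)  # proportion of intervals that contain mu
coverage
```

The resulting proportion should be close to 0.95, up to simulation error.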

4 Bootstrap Confidence Interval for the Mean

4.1 Rationale

The bootstrap method estimates the sampling distribution of the sample mean by repeatedly resampling the observed data with replacement. Each bootstrap resample has the same sample size n, and a sample mean is computed for each resample. This provides an empirical approximation to the sampling distribution of \(\bar{x}\) without relying strongly on normality assumptions.

B <- 10000
boot_means <- numeric(B)

for (b in 1:B) {
  boot_sample <- sample(x, size = n, replace = TRUE)
  boot_means[b] <- mean(boot_sample)
}

Interpretation
We generated B = 10,000 bootstrap resamples, each consisting of n = 170 observations drawn with replacement from the original sample, and computed the sample mean of each resample. The resulting set of 10,000 bootstrap means approximates the sampling distribution of the sample mean.
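Because the resampling is random, the bootstrap means change from run to run unless a seed is fixed. The following is a minimal sketch of a reproducible, loop-free version of the same procedure. The helper name boot_mean and the stand-in data x_demo are hypothetical; x_demo is simulated and is not the assignment data.

```r
# Hypothetical helper: bootstrap means of a numeric vector x.
boot_mean <- function(x, B = 10000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # fix the seed when one is supplied
  n <- length(x)
  replicate(B, mean(sample(x, size = n, replace = TRUE)))
}

# Stand-in data (NOT the assignment data): 170 simulated values
set.seed(1)
x_demo <- runif(170, min = 4, max = 36)

m1 <- boot_mean(x_demo, B = 2000, seed = 42)
m2 <- boot_mean(x_demo, B = 2000, seed = 42)
identical(m1, m2)  # same seed, same resamples
```

With the real data, the call would be along the lines of boot_mean(x, B = 10000, seed = 42), where the seed value is arbitrary.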

4.2 Bootstrap Sampling Distribution of the Sample Mean

hist(
  boot_means,
  breaks = 40,
  main = "Bootstrap Sampling Distribution of the Sample Mean",
  xlab = "Bootstrap Sample Means",
  col = "lightblue",
  border = "white"
)

abline(v = xbar, col = "red", lwd = 2)

Interpretation
The bootstrap sampling distribution is centered near the observed sample mean (21.232%), shown by the red vertical line. The distribution is approximately symmetric and bell-shaped, suggesting that the sampling distribution of the mean is close to normal. This supports using the classical t-based confidence interval here, and the bootstrap serves as a check that does not rely on a normality assumption.
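The visual impression from the histogram can be backed by two numbers: the standard deviation of the bootstrap means, which should be close to the classical standard error s / sqrt(n), and their sample skewness, which should be near 0 for a symmetric distribution. The sketch below uses simulated stand-in data (x_demo is hypothetical, not the assignment data), so its output illustrates the check rather than reproducing the results above.

```r
set.seed(2024)  # arbitrary seed, for reproducibility only

# Stand-in data (NOT the assignment data)
x_demo <- runif(170, min = 4, max = 36)
n <- length(x_demo)

boot_means <- replicate(5000, mean(sample(x_demo, size = n, replace = TRUE)))

se_boot <- sd(boot_means)         # bootstrap estimate of the standard error
se_plug <- sd(x_demo) / sqrt(n)   # classical standard error s / sqrt(n)

# Sample skewness of the bootstrap means; values near 0 indicate symmetry
z    <- (boot_means - mean(boot_means)) / sd(boot_means)
skew <- mean(z^3)

c(se_boot = se_boot, se_plug = se_plug, skew = skew)
```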

4.3 Bootstrap Percentile Confidence Interval

We construct a 95% bootstrap percentile confidence interval using the 2.5th and 97.5th percentiles of the bootstrap means.

boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
##     2.5%    97.5% 
## 20.05465 22.39994

Interpretation
The 95% bootstrap percentile confidence interval is:

\[ (20.055\%,\; 22.400\%). \]

This interval contains the middle 95% of the bootstrap means. Its width is approximately 2.35 percentage points, which is very close to the classical interval width of about 2.40 percentage points.

5 Comparison of the Two Confidence Intervals

ci_table <- rbind(
  Classical_CI = c(ci_lower, ci_upper),
  Bootstrap_CI = as.numeric(boot_ci)
)
colnames(ci_table) <- c("Lower", "Upper")
ci_table
##                 Lower    Upper
## Classical_CI 20.03275 22.43156
## Bootstrap_CI 20.05465 22.39994

Interpretation
The two confidence intervals are nearly identical:

  • Classical 95% CI: \((20.033\%, 22.432\%)\)
  • Bootstrap 95% CI: \((20.055\%, 22.400\%)\)

Both intervals are centered around the same mean (about 21.23%) and have almost the same width. This close agreement suggests that the t-based method performs well for this dataset and that the bootstrap procedure confirms the stability of the mean estimate and its uncertainty.
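The agreement can be quantified directly from the reported endpoints. The sketch below copies the four endpoints by hand from the ci_table output above and computes each interval's width and midpoint.

```r
# Endpoints copied from the ci_table output above
ci <- rbind(
  Classical_CI = c(Lower = 20.03275, Upper = 22.43156),
  Bootstrap_CI = c(Lower = 20.05465, Upper = 22.39994)
)

width    <- ci[, "Upper"] - ci[, "Lower"]        # interval widths
midpoint <- (ci[, "Upper"] + ci[, "Lower"]) / 2  # interval centers

round(cbind(ci, Width = width, Midpoint = midpoint), 3)
```

The widths differ by only about 0.05 percentage points, and the midpoints differ by less than 0.01.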

6 Conclusion

Using AnimalProducts from n = 170 countries, the estimated mean percentage of protein intake from animal-based food sources is 21.232%.

  • The classical 95% t-based confidence interval is \((20.033\%, 22.432\%)\).
  • The bootstrap 95% percentile confidence interval is \((20.055\%, 22.400\%)\).

Because these two intervals are nearly identical, they lead to the same substantive conclusion: the population mean percentage of protein intake from animal products across countries is likely between about 20.0% and 22.4%, and the estimate is stable under both classical and bootstrap inference methods.

