Introduction

The goal of this project is to explore the average worldwide consumption of vegetable oils and to assess the uncertainty around that mean. I have a strong personal interest in this topic because I am a firm believer in cooking with non-seed fats, such as butter and olive oil.

Materials

The dataset used is the Protein Supply Quantity data from the class page. The variable used is VegetableOils.

Methodology

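The methodology consists of data preparation, a classical t-based confidence interval (CI) for the mean, a bootstrap percentile CI, visualization of the bootstrap sample means with a histogram, and a comparison of the two intervals.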

Results

The classical 95% CI for the mean of VegetableOils was approximately (0.0172, 0.0244), and the bootstrap 95% percentile CI was very similar at approximately (0.0175, 0.0244). This close agreement suggests that the normal approximation for the sampling distribution of the mean, which the classical interval relies on, is reasonable for this sample.

General Discussion

The analysis shows that the mean of VegetableOils is small but precisely estimated, since both 95% intervals are narrow. In future work, I would like to compare it with the fats I prefer, such as butter or olive oil.

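library(knitr)

# Read the Protein Supply Quantity data; check.names = FALSE keeps the original column names
url <- "https://pengdsci.github.io/STA321/ww02/w02-Protein_Supply_Quantity_Data.csv"
protein <- read.csv(url, header = TRUE, check.names = FALSE)

# Extract the variable of interest
x <- protein[["VegetableOils"]]

# If the column was read as text, strip commas and convert it to numeric
if (!is.numeric(x)) {
  x <- as.numeric(gsub(",", "", as.character(x)))
}

# Drop missing values
x <- x[!is.na(x)]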


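# Classical t-based 95% CI: xbar +/- tcrit * s / sqrt(n)
n     <- length(x)
xbar  <- mean(x)
s     <- sd(x)
se    <- s / sqrt(n)             # standard error of the mean
tcrit <- qt(0.975, df = n - 1)   # two-sided 95% critical value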

classic_ci <- c(
  lower = xbar - tcrit * se,
  upper = xbar + tcrit * se
)
classic_ci
##      lower      upper 
## 0.01723488 0.02437453
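
As a sanity check on the formula used above, the same kind of interval can be obtained from R's built-in t.test(). The sketch below uses simulated values so that it runs on its own; toy_x, toy_n, manual_ci, and builtin_ci are hypothetical names and the numbers are not the VegetableOils data.

```r
# Simulated stand-in data (an assumption for illustration only)
set.seed(42)
toy_x <- rnorm(30, mean = 0.02, sd = 0.01)
toy_n <- length(toy_x)

# Manual t-based 95% CI: mean +/- tcrit * s / sqrt(n)
manual_ci <- mean(toy_x) +
  c(-1, 1) * qt(0.975, df = toy_n - 1) * sd(toy_x) / sqrt(toy_n)

# Built-in one-sample t interval at the same confidence level
builtin_ci <- as.numeric(t.test(toy_x, conf.level = 0.95)$conf.int)

# The two intervals should agree up to floating-point error
all.equal(manual_ci, builtin_ci)
```

Applied to the real data, t.test(x)$conf.int would be expected to reproduce classic_ci in the same way.
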
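# Bootstrap: resample x with replacement B times and record each resample mean
set.seed(321)   # fixed seed for reproducibility
B <- 1000
bt.sample.mean.vec <- numeric(B)

for (i in seq_len(B)) {
  ith.bt.sample <- sample(x, size = n, replace = TRUE)
  bt.sample.mean.vec[i] <- mean(ith.bt.sample)
}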
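
If the same resampling is needed for other variables, the loop above could be wrapped in a small helper. This is only a sketch under stated assumptions: boot_mean_ci is a hypothetical function name, not part of any package, and the example call uses simulated values (toy_sample) instead of the VegetableOils data.

```r
# Hypothetical helper: percentile bootstrap CI for the mean of a numeric vector
boot_mean_ci <- function(values, B = 1000, level = 0.95, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)      # optional seed for reproducibility
  values <- values[!is.na(values)]        # drop missing values
  n_obs <- length(values)
  # Resample with replacement B times and keep each resample mean
  boot_means <- replicate(B, mean(sample(values, size = n_obs, replace = TRUE)))
  alpha <- 1 - level
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
}

# Illustration on simulated, right-skewed data (an assumption, not the real sample)
set.seed(1)
toy_sample <- rexp(50, rate = 10)
toy_ci <- boot_mean_ci(toy_sample, B = 2000, seed = 321)
toy_ci
```

Calling boot_mean_ci(x, B = 1000, seed = 321) on the real data should follow the same resampling scheme as the loop, although the helper is not the code that produced the results reported here.
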

hist(
  bt.sample.mean.vec,
  breaks = 30,
  xlab = "Bootstrap sample means",
  main = "Bootstrap Sampling Distribution of VegetableOils Means"
)

# Percentile bootstrap 95% CI: the 2.5th and 97.5th percentiles of the bootstrap means
boot_ci <- quantile(bt.sample.mean.vec, c(0.025, 0.975))
boot_ci
##       2.5%      97.5% 
## 0.01749513 0.02443100
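
To make the comparison described in the Results concrete, one option is to tabulate both intervals together with their widths. This sketch hard-codes the endpoints printed above instead of recomputing them; the names classic_reported, boot_reported, and ci_compare are illustrative only.

```r
# Interval endpoints copied from the printed output above
classic_reported <- c(lower = 0.01723488, upper = 0.02437453)
boot_reported    <- c(lower = 0.01749513, upper = 0.02443100)

# Stack the two intervals and add a width column
ci_compare <- rbind(classic = classic_reported, bootstrap = boot_reported)
ci_compare <- cbind(
  ci_compare,
  width = ci_compare[, "upper"] - ci_compare[, "lower"]
)
ci_compare
```

With these endpoints, the bootstrap interval is slightly narrower (about 0.0069 versus 0.0071) and sits slightly higher than the classical one, which is consistent with the two methods giving very similar answers.
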