The goal of this project is to explore the average worldwide consumption of vegetable oils and to assess the uncertainty around that mean. I have a strong personal interest in this topic because I am a firm believer in cooking with non-seed oils and fats, such as butter and olive oil.
The dataset used is the Protein Supply Quantity data from the class page. The variable used is VegetableOils.
The methodology consists of data preparation, a classical t-based confidence interval (CI), a bootstrap percentile CI, a histogram of the bootstrap sample means, and a comparison of the two intervals.
The classical 95% CI for the mean of VegetableOils was approximately 0.017 to 0.024, and the bootstrap 95% CI was very similar (also about 0.017 to 0.024). This agreement does not prove that the data are normal, but it suggests that the normal approximation behind the classical interval is reasonable for this sample.
The analysis shows that the mean of VegetableOils is small but precisely estimated, since both intervals are narrow. As a next step, I would like to compare it with the fats I prefer, such as butter or olive oil.
library(knitr)
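# Read the class dataset; check.names = FALSE keeps the column names as written
url <- "https://pengdsci.github.io/STA321/ww02/w02-Protein_Supply_Quantity_Data.csv"
protein <- read.csv(url, header = TRUE, check.names = FALSE)

# Extract the variable of interest
x <- protein[["VegetableOils"]]

# If the column was read as text, strip thousands separators and convert to numeric
if (!is.numeric(x)) {
  x <- as.numeric(gsub(",", "", as.character(x)))
}

# Drop missing values before computing summaries
x <- x[!is.na(x)]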
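# Classical 95% CI for the mean: xbar +/- t(0.975, n - 1) * s / sqrt(n)
n <- length(x)
xbar <- mean(x)
s <- sd(x)
se <- s / sqrt(n)                 # standard error of the mean
tcrit <- qt(0.975, df = n - 1)    # two-sided 95% critical value
classic_ci <- c(
  lower = xbar - tcrit * se,
  upper = xbar + tcrit * se
)
classic_ci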
## lower upper
## 0.01723488 0.02437453
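One way to sanity-check the manual t-interval is to compare it with base R's t.test(), which reports the same kind of interval. The sketch below is only an illustration: the helper name t_ci and the simulated vector demo_x are assumptions introduced here and are not part of the original analysis. In the report itself, the VegetableOils vector x would be passed instead.

```r
# Hypothetical helper (not in the original report): t-based CI for a mean.
t_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(lower = mean(x) - tcrit * se, upper = mean(x) + tcrit * se)
}

# Illustrative stand-in data (an assumption); replace with x from the report.
set.seed(1)
demo_x <- rexp(150, rate = 50)

manual_ci <- t_ci(demo_x)
builtin_ci <- t.test(demo_x, conf.level = 0.95)$conf.int

# The two intervals should agree up to floating-point error.
all.equal(unname(manual_ci), as.numeric(builtin_ci))
```

If the two results match, the hand-coded formula is doing what t.test() does, which makes the later comparison with the bootstrap interval easier to trust.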
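# Bootstrap: resample x with replacement B times and record each sample mean
set.seed(321)                     # fixed seed so the results are reproducible
B <- 1000
bt.sample.mean.vec <- numeric(B)
for (i in 1:B) {
  ith.bt.sample <- sample(x, size = n, replace = TRUE)
  bt.sample.mean.vec[i] <- mean(ith.bt.sample)
}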
hist(
  bt.sample.mean.vec,
  breaks = 30,
  xlab = "Bootstrap sample means",
  main = "Bootstrap Sampling Distribution of VegetableOils Means"
)
# Percentile bootstrap 95% CI: the 2.5th and 97.5th percentiles of the bootstrap means
boot_ci <- quantile(bt.sample.mean.vec, c(0.025, 0.975))
boot_ci
## 2.5% 97.5%
## 0.01749513 0.02443100
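The methodology lists a comparison of the two intervals, so a small side-by-side table and a normal Q-Q plot of the bootstrap means may help make that comparison concrete. The following is a minimal sketch, not the analysis actually run above: the helper boot_mean_ci, the simulated vector demo_x, and the table layout are all assumptions made for illustration. With the real data, x would replace demo_x.

```r
# Hypothetical helper (not in the original report): percentile bootstrap CI for the mean.
boot_mean_ci <- function(x, B = 1000, level = 0.95) {
  alpha <- 1 - level
  boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))
  ci <- quantile(boot_means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(ci = c(lower = ci[1], upper = ci[2]), means = boot_means)
}

# Illustrative stand-in data (an assumption); replace with x from the report.
set.seed(321)
demo_x <- rexp(150, rate = 50)

# Classical t-based interval on the same data
n_demo <- length(demo_x)
half_width <- qt(0.975, df = n_demo - 1) * sd(demo_x) / sqrt(n_demo)
classic <- c(lower = mean(demo_x) - half_width, upper = mean(demo_x) + half_width)

boot <- boot_mean_ci(demo_x)

# Side-by-side table of both intervals and their widths
comparison <- data.frame(
  method = c("classical t", "bootstrap percentile"),
  lower  = c(classic[["lower"]], boot$ci[["lower"]]),
  upper  = c(classic[["upper"]], boot$ci[["upper"]])
)
comparison$width <- comparison$upper - comparison$lower
print(comparison)

# Points close to the reference line suggest the sampling distribution
# of the mean is approximately normal.
qqnorm(boot$means, main = "Normal Q-Q Plot of Bootstrap Means")
qqline(boot$means)
```

Similar limits and widths in the table, together with a roughly straight Q-Q plot, would support using the classical interval. A visible gap between the two rows would be a reason to prefer the bootstrap result.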