1 Introduction

I will compute confidence intervals for the mean with and without bootstrapping to decide which method is better for inference.

2 Loading Data

protein = read.csv("https://raw.githubusercontent.com/pengdsci/sta321/main/ww02/w02-Protein_Supply_Quantity_Data.csv", header = TRUE)

## 
#head(protein)
#dim(protein)

3 Data Analysis

The data set shows the percentage of protein intake that comes from different foods around the world. I will analyze the Treenuts variable, which gives the percentage of protein intake from tree nuts.

3.1 Confidence Interval of the Mean

treenuts.sample = sample(protein$Treenuts,        #draw values from the Treenuts column
                         170,                     #sample size
                         replace = FALSE)         #without replacement
CI = quantile(treenuts.sample, c(0.025, 0.975))   #2.5th and 97.5th percentiles of the sampled values
CI                                                #print the interval
##      2.5%     97.5% 
## 0.0000000 0.9948625
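
For comparison, a conventional interval for the mean is built from the standard error of the mean rather than from quantiles of the raw observations. The following is a minimal sketch, assuming the sampling distribution of the mean is approximately normal. The helper name `ci.mean.t` and the simulated vector `x` are illustrative stand-ins and are not part of the original analysis.

```r
# Hypothetical helper: t-based confidence interval for the mean of x.
ci.mean.t = function(x, level = 0.95) {
  x = x[!is.na(x)]                              # drop missing values
  n = length(x)                                 # number of observations
  se = sd(x) / sqrt(n)                          # standard error of the mean
  t.crit = qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  c(lower = mean(x) - t.crit * se,
    upper = mean(x) + t.crit * se)
}

set.seed(1)                                     # reproducible stand-in data
x = rexp(170, rate = 4)                         # simulated, NOT the Treenuts values
ci.mean.t(x)                                    # interval from the helper
as.numeric(t.test(x)$conf.int)                  # built-in interval for checking
```

With the real data, the call would be `ci.mean.t(protein$Treenuts)`, and its result could be set beside the bootstrap interval from the next section.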

3.2 Bootstrapping

sample.mean.vec = numeric(1000)          #vector for storing the 1000 bootstrap means
for(i in 1:1000){                        #1000 bootstrap samples, each of size 170
  ith.sample = sample(protein$Treenuts,  #resample the Treenuts values
                      170,               #sample size
                      replace = TRUE)    #WITH replacement, as the bootstrap requires
  sample.mean.vec[i] = mean(ith.sample)  #store the mean of the ith bootstrap sample
}
b.CI = quantile(sample.mean.vec, c(0.025, 0.975)) #confidence interval of the mean from bootstrapping
b.CI  #printing bootstrap CI
##      2.5%     97.5% 
## 0.2038416 0.2905259
hist(sample.mean.vec,                    #histogram of bootstrap data           
     breaks = 20,                         #suggested number of bins
     xlab = "Bootstrap sample means",     #x axis label                   
     main="Bootstrap Sampling Distribution \n of Sample Means")  #title
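
The loop above can also be wrapped in a small reusable function with an optional seed, so that the interval is the same every time the code is run. This is only a sketch. The helper name `bootstrap.ci.mean`, its arguments, and the simulated vector `x` are assumptions made for illustration and do not come from the original analysis.

```r
# Hypothetical helper: percentile bootstrap interval for the mean of x.
bootstrap.ci.mean = function(x, B = 1000, level = 0.95, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)     # fix the random draws if requested
  x = x[!is.na(x)]                       # drop missing values
  n = length(x)
  # B resamples of size n, drawn with replacement; keep the mean of each
  boot.means = replicate(B, mean(sample(x, n, replace = TRUE)))
  alpha = 1 - level
  quantile(boot.means, c(alpha / 2, 1 - alpha / 2))
}

set.seed(1)                              # reproducible stand-in data
x = rexp(170, rate = 4)                  # simulated, NOT the Treenuts values
bootstrap.ci.mean(x, seed = 321)         # same seed gives the same interval
```

With the real data, the call would be `bootstrap.ci.mean(protein$Treenuts, seed = 321)`, where the seed value is arbitrary.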

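The interval from Section 3.1 is (0.0000000, 0.9948625). Because quantile() was applied to the raw observations, this interval is the central 95% range of the data rather than an interval for the mean, which is why it is so wide. The bootstrap method gives a 95% confidence interval for the mean of (0.2038416, 0.2905259); the endpoints vary slightly from run to run because no random seed is set. Because the bootstrap interval is much narrower and targets the mean directly, it is the better method here for estimating the mean.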