1 Introduction

We will briefly analyze the protein dataset available at https://raw.githubusercontent.com/pengdsci/sta321/main/ww02/w02-Protein_Supply_Quantity_Data.csv. For countries around the world, this dataset records the percentage of protein intake that comes from different categories of food, along with obesity and COVID-19 data. In this week’s assignment, we will focus on one variable in this dataset, construct a 95% confidence interval for its population mean using two different methods, and then compare the two intervals. For my assignment, I chose the variable eggs.

2 Confidence Interval #1: Normal Approximation

This confidence interval uses the standard normal critical value \(Z_{\frac{\alpha}{2}}\) with \(\alpha = 0.05\) (so \(\frac{\alpha}{2} = 0.025\) and \(Z_{\frac{\alpha}{2}} \approx 1.96\)) to construct the interval \([\bar{x}-Z_{\frac{\alpha}{2}}\cdot\frac{s}{\sqrt{n}}, \bar{x}+Z_{\frac{\alpha}{2}}\cdot\frac{s}{\sqrt{n}}]\), where \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, and \(n\) is the number of observations of the variable eggs. We will also make a histogram of the values of eggs.
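A minimal Python sketch of this normal-approximation interval follows. It assumes the eggs values have already been loaded into a list of floats; the function name `z_confidence_interval` and the five sample values are hypothetical and used only for illustration, so the printed limits will not match the interval reported here.

```python
import math
import statistics
from statistics import NormalDist


def z_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for the population mean: x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when confidence = 0.95
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical values for illustration only; not the real eggs column.
sample = [0.5, 0.9, 1.1, 1.3, 2.4]
lcl, ucl = z_confidence_interval(sample)
print(f"LCL.95 = {lcl:.6f}, UCL.95 = {ucl:.6f}")
```

The interval is symmetric around the sample mean by construction, which is one visible difference from the bootstrap interval.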

LCL.95 UCL.95
1.043119 1.279816

3 Confidence Interval #2: Bootstrapping

The second confidence interval will be constructed with the bootstrap method. We repeatedly draw resamples of size \(n\) with replacement from the original eggs sample, compute the mean of each resample, and take the 2.5th and 97.5th percentiles of these bootstrap means as the lower and upper limits of the 95% confidence interval. We will also construct a histogram of the bootstrap sample means.
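The percentile bootstrap can be sketched in Python as follows. This is an illustrative version, not the code that produced the output below; the function name, the choice of 10,000 resamples, the seed, and the sample values are all assumptions. `statistics.quantiles(..., n=40, method="inclusive")` returns cut points in 2.5% steps, so its first and last entries are the 2.5th and 97.5th percentiles, interpolated the same way as R's default `quantile()`.

```python
import random
import statistics


def bootstrap_percentile_ci(values, n_boot=10_000, seed=None):
    """95% percentile bootstrap CI for the mean; also returns the bootstrap means."""
    rng = random.Random(seed)  # a seeded generator makes the interval reproducible
    n = len(values)
    boot_means = [
        statistics.fmean(rng.choices(values, k=n))  # resample n values with replacement
        for _ in range(n_boot)
    ]
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; keep the first and last.
    cuts = statistics.quantiles(boot_means, n=40, method="inclusive")
    return (cuts[0], cuts[-1]), boot_means


# Hypothetical values for illustration only; not the real eggs column.
sample = [0.5, 0.9, 1.1, 1.3, 2.4]
(lo, hi), means = bootstrap_percentile_ci(sample, seed=321)
print(f"2.5% = {lo:.6f}, 97.5% = {hi:.6f}")
```

The returned `means` list is what a histogram of bootstrap means would be drawn from. Rerunning with a different seed, or with no seed, gives a slightly different interval.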

##     2.5%    97.5% 
## 1.050542 1.278106

4 Discussion

The two confidence intervals are very similar: the normal-approximation interval is (1.043, 1.280) and the bootstrap interval is (1.051, 1.278), so their corresponding limits differ by less than 0.01. Each 95% confidence interval means that we are 95% confident that it contains the true mean percentage of protein intake from eggs across the world's countries. Because bootstrap resampling is random, the bootstrap interval will change slightly each time the procedure is rerun unless a random seed is fixed.
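The comparison can be made concrete with a little arithmetic on the reported endpoints. This small Python sketch (the helper name `describe` is hypothetical) computes each interval's midpoint and width and checks whether the bootstrap interval lies inside the normal-approximation interval.

```python
# Endpoints copied from the two outputs reported above.
z_ci = (1.043119, 1.279816)
boot_ci = (1.050542, 1.278106)


def describe(ci):
    """Return the midpoint and width of a (lower, upper) interval."""
    lower, upper = ci
    return (lower + upper) / 2, upper - lower


for name, ci in [("normal approximation", z_ci), ("bootstrap", boot_ci)]:
    midpoint, width = describe(ci)
    print(f"{name}: midpoint = {midpoint:.4f}, width = {width:.4f}")

# True when the bootstrap interval is contained in the normal-approximation interval.
nested = z_ci[0] <= boot_ci[0] and boot_ci[1] <= z_ci[1]
print("bootstrap interval inside normal interval:", nested)
```

With these endpoints the widths are about 0.2367 and 0.2276, and the bootstrap interval falls entirely within the normal-approximation interval.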

The histogram of the original eggs values from the protein dataset is unimodal and right-skewed. In contrast, the histogram of the bootstrap sample means is unimodal and roughly symmetric, which is consistent with the central limit theorem: the sampling distribution of the mean is approximately normal even when the underlying data are skewed.
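One way to back up these visual impressions is a numeric skewness measure. The Python sketch below (the function name and test values are hypothetical) computes the moment-based sample skewness \(g_1 = m_3 / m_2^{3/2}\); a clearly positive value indicates right skew, and a value near zero indicates rough symmetry. Applied to the eggs values and to the bootstrap means, we would expect a positive value for the former and a value close to zero for the latter, although that has not been computed here.

```python
import statistics


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (positive means right skew)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric data: prints 0.0
print(sample_skewness([1, 1, 1, 2, 10]))  # long right tail: positive value
```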