library(kableExtra)
protein = read.csv("https://raw.githubusercontent.com/pengdsci/sta321/main/ww02/w02-Protein_Supply_Quantity_Data.csv",
header = TRUE)
var.name = names(protein)
kable(data.frame(var.name))
| var.name |
|---|
| Country |
| AlcoholicBeverages |
| AnimalProducts |
| Animalfats |
| CerealsExcludingBeer |
| Eggs |
| FishSeafood |
| FruitsExcludingWine |
| Meat |
| MilkExcludingButter |
| Offals |
| Oilcrops |
| Pulses |
| Spices |
| StarchyRoots |
| Stimulants |
| Treenuts |
| VegetalProducts |
| VegetableOils |
| Vegetables |
| Miscellaneous |
| Obesity |
| Confirmed |
| Deaths |
| Recovered |
| Active |
| Population |
I chose the variable Eggs, which is described as the percentage of each country's protein intake that comes from eggs. The data set contains 170 observations.
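Formula for constructing a confidence interval for the mean: \[ \bar{x} \pm z \times \frac{s}{\sqrt{n}} \] where \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size, and \(z\) is the critical value for the chosen confidence level.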
meggs <- mean(protein$Eggs) # sample mean of Eggs (1.161468)
sdeggs <- sd(protein$Eggs) # sample standard deviation of Eggs (0.7872996)
n <- length(protein$Eggs) # sample size (170)
meggs + 1.96*(sdeggs/sqrt(n)) # upper limit; we use 1.96 as our z-score because it is a 95% CI.
## [1] 1.279819
meggs - 1.96*(sdeggs/sqrt(n)) # lower limit, using the same z-score
## [1] 1.043117
# Our 95% confidence interval is [1.043117, 1.279819].
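The same calculation can be wrapped in a small function so that the mean, standard deviation, and sample size are never typed in by hand. This is only a sketch: `z_ci` is a hypothetical helper name, not part of the original analysis, and it assumes a z-based interval is appropriate for the data.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# z-based confidence interval for a mean, x-bar +/- z * (s / sqrt(n)).
z_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                    # drop missing values, if any
  n <- length(x)                       # sample size
  z <- qnorm(1 - (1 - level) / 2)      # critical value, about 1.96 for 95%
  margin <- z * sd(x) / sqrt(n)        # margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Usage, assuming `protein` has been read in as above:
# z_ci(protein$Eggs)
```

Because `qnorm(0.975)` is about 1.959964 rather than exactly 1.96, the result should agree with the hand calculation only up to the last few decimal places.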
original.sample = sample( protein$Eggs, # the 170 observed Eggs values
170, # sample size = 170, i.e., every value in the data
replace = FALSE) # sample without replacement (this only reorders the data)
### Bootstrap sampling begins
bt.sample.mean.vec = NULL # define an empty vector to hold sample means of repeated samples.
for(i in 1:1000){ # starting for-loop to take 1000 bootstrap samples with n = 170
ith.bt.sample = sample( original.sample, # original sample of 170 Eggs values
170, # sample size = 170 MUST be equal to the sample size!!
replace = TRUE # MUST use WITH REPLACEMENT!!
) # this is the i-th Bootstrap sample
bt.sample.mean.vec[i] = mean(ith.bt.sample)} # calculate the mean of i-th bootstrap sample and
# save it in the vector: bt.sample.mean.vec
CI = quantile(bt.sample.mean.vec, c(0.025, 0.975))
CI
## 2.5% 97.5%
## 1.050379 1.273215
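Since the loop above does not set a random seed, the percentile limits change a little every time it is run. The sketch below shows one way to make the bootstrap interval reproducible. The function name `boot_mean_ci` and the seed value 123 are assumptions chosen for illustration, and `replicate()` stands in for the explicit for-loop.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# percentile bootstrap confidence interval for the mean.
boot_mean_ci <- function(x, B = 1000, level = 0.95, seed = 123) {
  set.seed(seed)                       # arbitrary seed, only for reproducibility
  n <- length(x)                       # each resample has the original size
  boot_means <- replicate(B, mean(sample(x, n, replace = TRUE)))
  alpha <- 1 - level
  quantile(boot_means, c(alpha / 2, 1 - alpha / 2))
}

# Usage, assuming `protein` has been read in as above:
# boot_mean_ci(protein$Eggs)
```

Calling the function twice with the same seed returns identical limits, which makes it easier to quote the interval consistently in the write-up.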
hist(bt.sample.mean.vec, # data used for histogram
breaks = 18, # suggested number of bins (R may adjust it)
xlab = "Bootstrap sample means", # change the label of x-axis
main="Bootstrap Sampling Distribution \n of Sample Means")
The two intervals are nearly the same. Using the bootstrap method, my 95% CI is [1.050379, 1.273215]; because no random seed is set, these limits change slightly each time the code is run. Using the formula from my prior classes, my 95% CI is [1.043117, 1.279819]. The corresponding endpoints differ by less than 0.01. These confidence intervals tell us that we are 95% confident that the population mean percentage of protein obtained from eggs is between about 1.04 and 1.28.
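To check how close the two intervals are, the endpoints can be compared directly. The sketch below uses the z-based limits computed earlier and the bootstrap percentiles printed in the output above; the object names are hypothetical, and a different bootstrap run would give slightly different numbers.

```r
# Endpoints copied from the output shown earlier in this report.
z_interval    <- c(lower = 1.043117, upper = 1.279819)  # z-based 95% CI
boot_interval <- c(lower = 1.050379, upper = 1.273215)  # bootstrap 95% CI

# Difference between corresponding endpoints (bootstrap minus z-based).
endpoint_diff <- boot_interval - z_interval

# Width of each interval (upper limit minus lower limit).
width <- c(z = unname(diff(z_interval)),
           bootstrap = unname(diff(boot_interval)))

print(round(endpoint_diff, 6))
print(round(width, 6))
```

Both endpoint differences are smaller than 0.01 in absolute value, so the two methods agree to about two decimal places for this run.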