Assignment #1

library(kableExtra)

protein = read.csv("https://raw.githubusercontent.com/pengdsci/sta321/main/ww02/w02-Protein_Supply_Quantity_Data.csv", 
                   header = TRUE)
var.name = names(protein)
kable(data.frame(var.name))

var.name
Country
AlcoholicBeverages
AnimalProducts
Animalfats
CerealsExcludingBeer
Eggs
FishSeafood
FruitsExcludingWine
Meat
MilkExcludingButter
Offals
Oilcrops
Pulses
Spices
StarchyRoots
Stimulants
Treenuts
VegetalProducts
VegetableOils
Vegetables
Miscellaneous
Obesity
Confirmed
Deaths
Recovered
Active
Population

I chose the variable Eggs. This variable is described as the percentage of each country's protein intake that comes from eggs. There are 170 observations.

Using knowledge from prior classes to construct a confidence interval

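Formula for constructing a confidence interval for a mean: \[ \bar{x} \pm z\times(s/\sqrt{n}) \] where \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size, and \(z\) is the critical value for the chosen confidence level.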

meggs <- mean(protein$Eggs)   # sample mean of the Eggs variable in the protein data set
sdeggs <- sd(protein$Eggs)    # sample standard deviation of the Eggs variable

1.161468 + 1.96*(0.7872996/sqrt(170)) # Upper limit: 1.161468 is the mean, 0.7872996 is the standard deviation, and 1.96 is the z-score for a 95% CI.

## [1] 1.279819

1.161468 - 1.96*(0.7872996/sqrt(170)) # Lower limit: same mean, standard deviation, and z-score as for the upper limit.

## [1] 1.043117

# Our 95% confidence interval is [1.043117, 1.279819].
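The same calculation can be written without retyping the arithmetic for each limit. The following is a minimal sketch, not part of the original assignment: `z.ci` is a hypothetical helper name, and it uses `qnorm()` for the critical value (about 1.959964) rather than the rounded 1.96, so its result may differ from the interval above in the last digits.

```r
# Hypothetical helper (not from the assignment): z-based confidence interval
# for a mean, computed from the sample mean, standard deviation, and size.
z.ci = function(xbar, s, n, level = 0.95) {
  z = qnorm(1 - (1 - level) / 2)   # critical value; about 1.96 when level = 0.95
  margin = z * s / sqrt(n)         # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Summary statistics used in the calculation above for protein$Eggs
eggs.ci = z.ci(xbar = 1.161468, s = 0.7872996, n = 170)
print(eggs.ci)
```

With the data loaded, a call such as `z.ci(mean(protein$Eggs), sd(protein$Eggs), length(protein$Eggs))` would avoid the hard-coded values altogether.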

Using the bootstrap method

original.sample = sample( protein$Eggs,    # the 170 observed Eggs values
                       170,                      # sample size = 170 values in the sample
                       replace = FALSE)          # without replacement: this only reorders the 170 values

### Bootstrap sampling begins 
bt.sample.mean.vec = NULL             # define an empty vector to hold the bootstrap sample means
for(i in 1:1000){                             # for-loop to take 1000 bootstrap samples with n = 170
  ith.bt.sample = sample( original.sample,    # original sample of 170 Eggs values
                       170,                    # bootstrap sample size MUST equal the original sample size
                       replace = TRUE         # MUST sample WITH REPLACEMENT
                 )                            # this is the i-th bootstrap sample
  bt.sample.mean.vec[i] = mean(ith.bt.sample) # calculate the mean of the i-th bootstrap sample and
                                              # save it in the vector bt.sample.mean.vec
}
  
CI = quantile(bt.sample.mean.vec, c(0.025, 0.975))
CI

##     2.5%    97.5% 
## 1.050379 1.273215
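Because the code above sets no random seed, the percentile limits change slightly every time it is run. The sketch below shows one way to make the result reproducible. It is an illustration under stated assumptions, not the assignment's own code: `boot.mean.ci` is a hypothetical helper, the seed value 123 is arbitrary, and `x.demo` is simulated stand-in data, not the Eggs variable. It also resamples directly from the data, since drawing all 170 values without replacement first only reorders them.

```r
# Hypothetical helper: percentile bootstrap CI for the mean of x.
# Note: set.seed() inside the function resets the global random number stream.
boot.mean.ci = function(x, B = 1000, level = 0.95, seed = 123) {
  set.seed(seed)                       # fixed seed so repeated calls agree
  n = length(x)
  # B bootstrap samples of size n, drawn with replacement; keep each mean
  boot.means = replicate(B, mean(sample(x, size = n, replace = TRUE)))
  alpha = 1 - level
  quantile(boot.means, c(alpha / 2, 1 - alpha / 2))
}

# Simulated stand-in data (NOT the Eggs values), used only to run the sketch
set.seed(1)
x.demo = rexp(170, rate = 1)
ci.demo = boot.mean.ci(x.demo)
print(ci.demo)
```

With the data loaded, `boot.mean.ci(protein$Eggs)` would return the same limits on every run for a given seed.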

Plotting

hist(bt.sample.mean.vec,                                         # data used for histogram
     breaks = 18,                                                # suggested number of bins (R may adjust it)
     xlab = "Bootstrap sample means",                            # change the label of x-axis
     main="Bootstrap Sampling Distribution \n of Sample Means")
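To connect the histogram to the percentile interval, the 2.5% and 97.5% quantiles can be drawn on the plot as vertical lines. This is a sketch only: `bt.demo` is a simulated stand-in for the vector of bootstrap means, with arbitrarily chosen mean and standard deviation, and `ci.lines` is a name introduced here for illustration.

```r
# Simulated stand-in for the bootstrap means (NOT the assignment's results)
set.seed(2)
bt.demo = rnorm(1000, mean = 1.16, sd = 0.06)

# Percentile limits of the (stand-in) bootstrap distribution
ci.lines = quantile(bt.demo, c(0.025, 0.975))

hist(bt.demo,
     breaks = 18,
     xlab = "Bootstrap sample means",
     main = "Bootstrap Sampling Distribution \n of Sample Means")
abline(v = ci.lines, lty = 2, lwd = 2)   # dashed lines at the 2.5% and 97.5% quantiles
```

In the assignment, `bt.sample.mean.vec` would take the place of `bt.demo`.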

Comparing the two intervals

The two intervals are nearly the same. Using the bootstrap method, my 95% CI is [1.050379, 1.273215]; the exact endpoints change slightly from run to run because the resampling is random. Using the formula from my prior classes, my 95% CI is [1.043117, 1.279819]. The corresponding endpoints differ by less than 0.01. Both intervals tell us that we are 95% confident that the mean percentage of protein intake that comes from eggs, in the population of countries that these 170 observations represent, is between about 1.04 and 1.28.
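The size of the difference can be checked directly. The sketch below uses the endpoints printed by the two code runs above (the formula-based limits and the `quantile()` output); the variable names are introduced here for illustration, and a different bootstrap run would give slightly different numbers.

```r
# Endpoints as printed in the output above
z.ci.printed    = c(lower = 1.043117, upper = 1.279819)  # formula-based interval
boot.ci.printed = c(lower = 1.050379, upper = 1.273215)  # bootstrap percentile interval

# Difference between corresponding endpoints (bootstrap minus formula)
endpoint.diff = boot.ci.printed - z.ci.printed

# Width of each interval (upper limit minus lower limit)
widths = c(z = unname(diff(z.ci.printed)),
           bootstrap = unname(diff(boot.ci.printed)))

print(endpoint.diff)
print(widths)
```

Both endpoint differences are below 0.01 in absolute value, and in this run the bootstrap interval is slightly narrower than the formula-based one.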