1 Introduction and Background

I am analyzing a dataset compiled in 2020, during the height of COVID-19. The dataset focuses on the relationship between dietary decisions, population counts, obesity rates and COVID cases, and it covers 170 countries.

library(tidyverse)
Protein_Data = read_csv(file = "Protein_And_Quantity_Data.csv")

2 Variable Selection

For my analysis I will be working with the variable “Meat”. Even a quick glance at the dataset shows noticeable differences in the amount of meat that different countries consume. These differences may be due to countries’ climates, geographic conditions, cultures and religions.

3 Traditional Confidence Interval Creation

Since the population standard deviation of meat consumption is not known, I used a one-sample t-test to construct a 95% confidence interval for the mean. Based on that interval, we can be 95% confident that the average meat consumption across the 170 countries is somewhere between 9.19 and 10.61.

Meat = Protein_Data$Meat
invisible(glimpse(Meat))
 num [1:170] 3.13 7.66 3.51 7.62 16.07 ...
Traditional_CI = t.test(Meat, conf.level = 0.95)
Traditional_CI

    One Sample t-test

data:  Meat
t = 27.536, df = 169, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  9.193009 10.612918
sample estimates:
mean of x 
 9.902964 
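
To show where the bounds reported by t.test() come from, here is a minimal sketch that rebuilds the interval by hand as mean ± t critical value × standard error. The vector meat_demo is a made-up stand-in (an assumption, so the code runs without the CSV); it is not the real Meat column, so its numbers will differ from the output above.

```r
# Hypothetical stand-in data; NOT the real Protein_Data$Meat values.
set.seed(1)
meat_demo <- rgamma(170, shape = 4, rate = 0.4)

n       <- length(meat_demo)
x_bar   <- mean(meat_demo)            # sample mean
std_err <- sd(meat_demo) / sqrt(n)    # standard error of the mean
t_crit  <- qt(0.975, df = n - 1)      # critical value for a 95% two-sided interval

# Interval built by hand
manual_ci <- c(x_bar - t_crit * std_err, x_bar + t_crit * std_err)

# Interval reported by t.test(), for comparison
builtin_ci <- as.numeric(t.test(meat_demo, conf.level = 0.95)$conf.int)

print(manual_ci)
print(builtin_ci)
```

The two printed intervals should agree, which confirms that the width of the traditional interval is driven by the standard error of the mean, not by the spread of individual countries.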

4 Bootstrap Sampling Confidence Interval Creation

In addition to the traditional confidence interval, I created one using the bootstrap sampling method. I first drew a random sample of 55 of the countries' meat values without replacement. I then ran a loop that resampled those 55 values with replacement 1,000 times, keeping the sample size at 55 and recording the mean of each resample. Finally, I constructed a confidence interval from the percentiles of those 1,000 sample means. Per the bootstrap sampling method, we can infer that the average meat consumption across the 170 countries is somewhere between 8.29 and 8.69. Because no random seed is set, these bounds change each time the code is run; the run recorded below printed 9.26 and 9.59.

invisible(sum(is.na(Meat)))  #Counts missing values (NA); we have none
invisible(sum(is.nan(Meat))) #Counts invalid values (NaN = not a number); we have none

#Sampling process taken from class notes:
Original.Sample = sample(Meat,
                         55,              #Sample size = 55 values in the sample
                         replace = FALSE  #Sample without replacement
                         )

Bootstrap_Sample_Vector = numeric(1000) #Preallocated vector will hold the 1,000 bootstrap sample means

for(i in 1:1000){
  ith.Bootstrap.Sample = sample(x = Original.Sample,
                                size = 55,      #Resample size must match the original sample size
                                replace = TRUE  #This time, sample with replacement
                                )
  Bootstrap_Sample_Vector[i] = mean(ith.Bootstrap.Sample)
}

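#A 95% percentile interval uses the 2.5th and 97.5th percentiles of the bootstrap means.
#The upper probability is 0.975; the output recorded below came from a run that used 0.0975.
Boot_CI = quantile(Bootstrap_Sample_Vector, c(.025, .975))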
Boot_CI
    2.5%    9.75% 
9.260504 9.589189 
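
For comparison, here is a minimal sketch of a percentile bootstrap that fixes a random seed, resamples the full vector instead of a 55-value subsample, and takes the 2.5th and 97.5th percentiles. This is a variation on the class-notes procedure, not a replacement for it. The seed value and the stand-in vector meat_demo are assumptions made so the example runs on its own; with the real data, Meat would take the place of meat_demo.

```r
# Fixed seed so the bounds are the same on every run (the value is arbitrary).
set.seed(2020)

# Hypothetical stand-in data; NOT the real Protein_Data$Meat values.
meat_demo <- rgamma(170, shape = 4, rate = 0.4)

n_boot <- 1000

# Each replicate resamples all 170 values with replacement and stores the mean.
boot_means <- replicate(
  n_boot,
  mean(sample(meat_demo, size = length(meat_demo), replace = TRUE))
)

# 95% percentile interval: 2.5th and 97.5th percentiles of the bootstrap means.
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
```

Setting the seed before any call to sample() is what makes the written bounds and the printed bounds agree from one knit to the next.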

5 Plot of the Bootstrap Sampling Distribution

Below is a histogram of the bootstrap sample means of meat consumed among the countries in the dataset. The distribution of these sample means appears roughly normal, which is what we would expect of a sampling distribution of means. It does not by itself tell us whether meat consumption across individual countries is normally distributed. The bulk of the histogram is consistent with the range found by the bootstrap confidence interval calculation.

rng <- range(Bootstrap_Sample_Vector, na.rm = TRUE)
Command_Breaks <- seq(rng[1], rng[2], length.out = 15)
##These lines of code ensure the same number of breaks in the histogram every time the file is run or knitted.

hist(Bootstrap_Sample_Vector,
     breaks = Command_Breaks,
     xlab = "Sample Means of Meat Consumed",
     main = "Sampling Distribution of \n Bootstrap Sample Means \n of Global Meat Consumed")
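
Judging normality from a histogram alone is subjective, so a sketch of two visual aids may help: dashed lines marking the percentile bounds on the histogram, and a normal Q-Q plot of the bootstrap means. The stand-in vector meat_demo and the seed are assumptions made so the example runs on its own; with the real data, Bootstrap_Sample_Vector would take the place of boot_means.

```r
set.seed(2020)  # arbitrary seed, for reproducibility only

# Hypothetical stand-in data; NOT the real Protein_Data$Meat values.
meat_demo  <- rgamma(170, shape = 4, rate = 0.4)
boot_means <- replicate(1000, mean(sample(meat_demo, replace = TRUE)))

# 2.5th and 97.5th percentiles of the bootstrap means
bounds <- quantile(boot_means, c(0.025, 0.975))

# Histogram with 14 equal-width bins, as in the plot above
demo_breaks <- seq(min(boot_means), max(boot_means), length.out = 15)
h <- hist(boot_means,
          breaks = demo_breaks,
          xlab = "Sample Means of Meat Consumed",
          main = "Bootstrap Sample Means with 95% Percentile Bounds")
abline(v = bounds, lty = 2)  # dashed vertical lines at the interval bounds

# Points close to the reference line suggest the means are roughly normal
qqnorm(boot_means)
qqline(boot_means)
```

If the points in the Q-Q plot bend away from the line in the tails, the sampling distribution is skewed, and the percentile interval will not be symmetric around the mean.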

6 Traditional Method vs Bootstrap Method

Traditional = [9.19, 10.61]
Bootstrap = [8.29, 8.69]

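Looking at our two confidence intervals, the first thing we notice is that the bootstrap interval is far tighter: its bounds differ by only 0.40, as opposed to 1.42 for the t-test interval. This gap should not be read as greater precision. Both intervals describe uncertainty in a mean, so the fact that sample means vary less than single data points does not explain it. Two features of the bootstrap calculation do. First, the recorded output is labeled 2.5% and 9.75%, so the bounds span only a small slice of the lower tail of the bootstrap means, not the central 95%. Second, the resamples were drawn from a subsample of 55 countries, not from all 170. A sample of 55 has a larger standard error than a sample of 170, so a percentile interval taken at the 2.5th and 97.5th percentiles would be expected to come out wider than the t-test interval, not narrower. The bootstrap interval also sits lower than the traditional interval, which most likely reflects the particular 55 countries drawn in that run and the lower-tail percentiles used.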
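
As a rough check on the two widths, the arithmetic below uses only the t-test output from Section 3. Because that test compares the mean to 0, the t statistic equals the mean divided by its standard error, so the standard error and the sample standard deviation can be recovered from the printed values. The sketch then estimates how wide a 95% interval based on 55 values would be, assuming (and this is only an assumption) that a 55-country subsample has about the same standard deviation as the full 170.

```r
# Values copied from the t.test() output above
x_bar  <- 9.902964   # sample mean
t_stat <- 27.536     # t statistic for H0: mean = 0
n_full <- 170        # countries in the dataset
n_sub  <- 55         # size of the subsample used for bootstrapping

se_full <- x_bar / t_stat          # t = x_bar / SE, so SE = x_bar / t
s       <- se_full * sqrt(n_full)  # implied sample standard deviation

# Width of the 95% t interval on all 170 values (should match 10.61 - 9.19)
width_full <- 2 * qt(0.975, df = n_full - 1) * se_full

# Approximate width of a 95% interval based on 55 values,
# ASSUMING the subsample's standard deviation is close to s
se_sub    <- s / sqrt(n_sub)
width_sub <- 2 * qt(0.975, df = n_sub - 1) * se_sub

print(c(width_full = width_full, width_sub = width_sub))
```

Under that assumption, the interval based on 55 values comes out wider than the 1.42 from the t-test, which suggests a bootstrap width of 0.40 reflects how the interval was computed and is not a gain in precision.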

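Given that we cannot confirm normality for the distribution of meat consumption across these countries, the bootstrap method is appealing because it does not rely on a normality assumption. For it to be taken as the more reliable measure of the true average, however, the interval would need to be computed from the 2.5th and 97.5th percentiles of the bootstrap means and, ideally, from resamples of all 170 countries. With 170 observations, the t-test interval is also fairly robust to non-normality, so of the two intervals reported here, the traditional one [9.19, 10.61] is the safer estimate.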

