Introduction and Background
I am analyzing a dataset compiled in 2020, during the height of the
COVID-19 pandemic. The dataset examines the relationship between dietary
choices, population counts, obesity rates, and COVID-19 cases across 170
countries.
library(tidyverse) # provides read_csv() and glimpse()
Protein_Data = read_csv(file = "Protein_And_Quantity_Data.csv")
Variable Selection
For my analysis I will be working with the variable “Meat”. A quick look
at the dataset shows noticeable differences in how much meat different
countries consume. These differences may stem from countries’ climates,
geographic conditions, cultures, and religions.
Traditional Confidence Interval Creation
Since the population standard deviation of meat consumption is unknown,
I used a one-sample t-test, which estimates it with the sample standard
deviation, to construct a 95% confidence interval. The resulting interval
places the true mean meat consumption across countries between 9.19 and
10.61.
Meat = Protein_Data$Meat
invisible(glimpse(Meat))
num [1:170] 3.13 7.66 3.51 7.62 16.07 ...
Traditional_CI = t.test(Meat, conf.level = 0.95)
Traditional_CI
One Sample t-test
data: Meat
t = 27.536, df = 169, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
9.193009 10.612918
sample estimates:
mean of x
9.902964
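To make the t-interval concrete, the short R sketch below rebuilds it from the numbers printed by t.test(). The interval is mean ± qt(0.975, n - 1) × s / sqrt(n), and because t = mean / (s / sqrt(n)) when testing against 0, the standard error can be recovered as the mean divided by the t statistic. The helper name t_interval_from_output is hypothetical and used only for illustration; the only library call is base R's qt().

```r
# Rebuild the 95% t-interval from the values printed by t.test() above.
# t_interval_from_output() is a hypothetical helper written for illustration.
t_interval_from_output <- function(x_bar, t_stat, n, conf = 0.95) {
  se <- x_bar / t_stat                           # t = x_bar / (s / sqrt(n)), so s / sqrt(n) = x_bar / t
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)   # two-sided critical value with n - 1 degrees of freedom
  c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
}

# Values copied from the t.test() output: mean 9.902964, t = 27.536, 170 countries
rebuilt_ci <- t_interval_from_output(x_bar = 9.902964, t_stat = 27.536, n = 170)
print(round(rebuilt_ci, 3))  # matches 9.193009 and 10.612918 to three decimals

# Sample standard deviation implied by the same output: s = (s / sqrt(n)) * sqrt(n)
implied_sd <- (9.902964 / 27.536) * sqrt(170)
print(round(implied_sd, 2))
```

The implied standard deviation (about 4.69) is useful later when judging how wide an interval built from fewer values should be.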
Bootstrap Sampling Confidence Interval Creation
On top of the traditional t-based confidence interval, I also created one
using the bootstrap sampling method. I first drew a single random sample
of 55 of the 170 countries’ meat values, without replacement. I then ran
a loop that took 1,000 bootstrap resamples of size 55 from that sample,
drawn with replacement, and recorded the mean of each resample. The 2.5th
and 97.5th percentiles of these 1,000 sample means form the 95% bootstrap
confidence interval for the average meat consumption across countries.
sum(is.na(Meat))  # number of missing values (is.na() also counts NaN); should be 0
sum(is.nan(Meat)) # number of invalid values (NaN = not a number); should be 0
#Sampling process taken from class notes:
set.seed(2020) # arbitrary seed so the random draws (and the interval) are reproducible
Original.Sample = sample(Meat,
                         55,              # Sample size = 55 values in the sample
                         replace = FALSE  # Sample without replacement
                         )
Bootstrap_Sample_Vector = numeric(1000) # Preallocated vector for the 1,000 bootstrap sample means
for(i in 1:1000){
  ith.Bootstrap.Sample = sample(x = Original.Sample,
                                size = 55,      # Each resample keeps the original sample size
                                replace = TRUE  # Resample WITH replacement
                                )
  Bootstrap_Sample_Vector[i] = mean(ith.Bootstrap.Sample)
}
Boot_CI = quantile(Bootstrap_Sample_Vector, c(0.025, 0.975)) # 2.5th and 97.5th percentiles bound a 95% interval
Boot_CI
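The bootstrap loop can also be packaged as a small function, which makes the choice of percentile cutoffs explicit. The sketch below is a hypothetical helper, boot_percentile_ci(), run on simulated data (toy_meat, drawn from a gamma distribution purely for illustration, not the real Meat column). It also checks how much of the bootstrap distribution each pair of cutoffs captures: the 0.025 and 0.975 quantiles bracket about 95% of the bootstrap means, while a mistyped upper cutoff of 0.0975 brackets only about 7%, all of it in the lower tail.

```r
# Minimal percentile-bootstrap sketch. boot_percentile_ci() is a hypothetical
# helper, and toy_meat is SIMULATED data standing in for the real Meat column.
boot_percentile_ci <- function(x, B = 1000, conf = 0.95) {
  alpha <- 1 - conf
  # replicate() repeats the resample-and-average step B times
  boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))
  probs <- c(alpha / 2, 1 - alpha / 2)  # 0.025 and 0.975 for a 95% interval
  list(ci = quantile(boot_means, probs), boot_means = boot_means)
}

set.seed(1)                                       # arbitrary seed for reproducibility
toy_meat <- rgamma(55, shape = 4.5, scale = 2.2)  # simulated, right-skewed stand-in values
result <- boot_percentile_ci(toy_meat)
print(result$ci)

# Share of bootstrap means that fall inside each candidate interval
inside <- function(v, lo, hi) mean(v >= lo & v <= hi)
correct  <- quantile(result$boot_means, c(0.025, 0.975))
mistyped <- quantile(result$boot_means, c(0.025, 0.0975))
print(inside(result$boot_means, correct[1], correct[2]))    # about 0.95
print(inside(result$boot_means, mistyped[1], mistyped[2]))  # about 0.07
```

Using replicate() instead of a for loop with an index is a stylistic choice; both produce the same kind of vector of resampled means.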
Plot of the Bootstrap Sampling Distribution
Below is a histogram of the 1,000 bootstrap sample means of meat
consumption. These means appear roughly normally distributed, which is
expected for a distribution of sample means even when the underlying
values are skewed. Note that the histogram describes the sample means,
not the meat consumption of individual countries; the bootstrap
confidence interval is read off this distribution as its 2.5th and 97.5th
percentiles.
rng <- range(Bootstrap_Sample_Vector, na.rm = TRUE)
Command_Breaks <- seq(rng[1], rng[2], length.out = 15)
## These lines fix the histogram breaks so the number of bins stays the same every time the file is run or knitted.
hist(Bootstrap_Sample_Vector,
breaks = Command_Breaks,
xlab = "Sample Means of Meat Consumed",
main="Sampling Distribution of \n Bootstrap Sample Means \n of Global Meat Consumed")
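A histogram gives only a visual impression of normality. The sketch below adds two standard base R checks, a normal Q-Q plot (qqnorm() and qqline()) and the Shapiro-Wilk test (shapiro.test()), and contrasts the raw values with their bootstrap means. It runs on simulated data (toy_meat, a hypothetical right-skewed stand-in for the Meat column), so its p-values say nothing about the real dataset; with 1,000 bootstrap means, the test can also flag very mild skew that does not matter in practice.

```r
# Sketch: compare the shape of raw values with the shape of their bootstrap means.
# toy_meat is SIMULATED, hypothetical data standing in for the Meat column.
set.seed(7)                                        # arbitrary seed
toy_meat <- rgamma(55, shape = 4.5, scale = 2.2)   # simulated, right-skewed raw values
boot_means <- replicate(1000, mean(sample(toy_meat, replace = TRUE)))

# Normal Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(boot_means, main = "Normal Q-Q Plot of Bootstrap Sample Means")
qqline(boot_means)

# Shapiro-Wilk test (base R accepts 3 to 5000 values); a small p-value signals non-normality
shapiro_raw   <- shapiro.test(toy_meat)
shapiro_means <- shapiro.test(boot_means)
print(c(raw_values = shapiro_raw$p.value, bootstrap_means = shapiro_means$p.value))
```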

Traditional Method vs Bootstrap Method
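Traditional = [9.19, 10.61]
Bootstrap = [8.29, 8.69]
Looking at our two confidence intervals, the first thing we notice is
that the reported bootstrap interval is far narrower, with a width of
0.40 between its bounds compared with 1.42 for the t-test interval, and
that it sits entirely below the sample mean of 9.90. The narrow width is
not what we should expect. Both intervals describe the variability of a
sample mean rather than of single data points, but the t-interval draws
on all 170 values while each bootstrap resample contains only 55. Because
the standard error of a mean shrinks in proportion to 1/√n, a 95%
bootstrap interval built from samples of 55 should be roughly
√(170/55) ≈ 1.76 times as wide as the t-interval, or about 2.5 units. An
interval of width 0.4 lying below the mean is consistent with taking the
upper bound at the 0.0975 quantile (the 9.75th percentile) instead of the
0.975 quantile, in which case both bounds come from the lower tail of the
bootstrap distribution; a correct 95% percentile interval uses
quantile(Bootstrap_Sample_Vector, c(0.025, 0.975)).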
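To see how interval width depends on the number of values used, the sketch below builds both intervals on the same simulated data, following the design described above: a t-interval from all 170 values, and a percentile bootstrap interval from a 55-value subsample resampled 1,000 times. The data (toy_meat, gamma-distributed) are hypothetical and exist only for illustration; on the real data the exact ratio will differ, but the bootstrap interval from 55 values should generally come out wider, with a width ratio near √(170/55).

```r
# Sketch comparing interval widths on SIMULATED data (toy_meat is hypothetical).
set.seed(42)                                       # arbitrary seed
toy_meat <- rgamma(170, shape = 4.5, scale = 2.2)  # 170 simulated values

# Traditional t-interval using all 170 values
t_ci <- t.test(toy_meat, conf.level = 0.95)$conf.int

# Bootstrap interval: one 55-value subsample, then 1,000 resamples with replacement
sub <- sample(toy_meat, 55, replace = FALSE)
boot_means <- replicate(1000, mean(sample(sub, 55, replace = TRUE)))
boot_ci <- quantile(boot_means, c(0.025, 0.975))

widths <- c(traditional = diff(t_ci), bootstrap = diff(unname(boot_ci)))
print(round(widths, 3))
print(round(widths[["bootstrap"]] / widths[["traditional"]], 2))  # compare with sqrt(170 / 55)
print(round(sqrt(170 / 55), 2))
```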
Because we cannot confirm that meat consumption is normally distributed
across these countries, the bootstrap method, which does not rely on that
assumption, should be taken as the more reliable measure of the true
average meat consumption.
