---
title: "Analysis of Estimated Meat Intake By Country in 2020"
author: "Chris Bahm"
date: "2025-09-06"
output:  
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: true
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---
```{css, echo = FALSE}
div#TOC li {     /* table of contents items */
    list-style: upper-roman;
    background-image: none;
    background-repeat: no-repeat;
    background-position: 0;
}

h1.title {    /* document title */
    font-size: 24px;
    font-weight: bold;
    color: DarkRed;
    text-align: center;
}

h4.author {   /* author line under the title */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: DarkRed;
    text-align: center;
}

h4.date {     /* date line under the title */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: DarkBlue;
    text-align: center;
}

h1 {          /* level 1 headers */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 {          /* level 2 headers */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 {          /* level 3 headers */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 {          /* level 4 headers */
    font-size: 14px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
    content: ".";
}
```

```{r setup, include=FALSE}
# This chunk sets global options that control whether the R code, warnings,
# and output are included in the output files.

if (!require("knitr")) {                      # use a conditional statement to detect
   install.packages("knitr")                  # whether the package is installed on
   library(knitr)                             # your machine. If not, install it and
}                                             # load it into the R session.
#
knitr::opts_chunk$set(echo = TRUE,            # include code chunks in the output file
                      warning = FALSE,        # sometimes your code may produce warning messages;
                                              # FALSE leaves those messages out of
                                              # the output file.
                      results = "markup",     # include text output in the output file,
                                              # using the default markup format.
                      message = FALSE,        # suppress messages
                      comment = NA            # remove the default leading hash tags in the output
                      )
```

## Introduction and Background
I am analyzing a dataset compiled in 2020, during the height of COVID-19. The dataset covers 170 countries and relates dietary decisions to population counts, obesity rates, and COVID-19 cases.

```{r}
library(tidyverse)
Protein_Data = read_csv(file = "Protein_And_Quantity_Data.csv")
```

## Variable Selection
For my analysis I will work with the variable "Meat". Even a quick glance at the dataset shows noticeable differences in how much meat different countries consume. These differences may stem from the countries' climates, geographic conditions, cultures, and religions.

## Traditional Confidence Interval Creation
Because the population standard deviation of meat consumption is not known, I used a one-sample t procedure to construct a 95% confidence interval for the mean. Based on that interval, the average meat consumption across the 170 countries is estimated to lie between 9.19 and 10.61.
```{r}
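Meat = Protein_Data$Meat   # Numeric vector of meat values, one per country
invisible(glimpse(Meat))   # Quick look at the type, length, and first few values

# One-sample t procedure; the output includes the 95% confidence interval for the mean
Traditional_CI = t.test(Meat, conf.level = 0.95)
Traditional_CI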
```
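
The interval that `t.test()` reports can be reproduced by hand as the sample mean plus or minus a t critical value times the estimated standard error. The sketch below does this on simulated values and checks the result against `t.test()`. The vector `x_demo` and its gamma parameters are illustrative assumptions, not the survey data.

```{r}
# Illustrative data only: 170 simulated values standing in for the Meat column
set.seed(1)
x_demo <- rgamma(170, shape = 4, rate = 0.4)

n_demo     <- length(x_demo)
x_bar_demo <- mean(x_demo)
se_demo    <- sd(x_demo) / sqrt(n_demo)        # estimated standard error of the mean
t_crit     <- qt(0.975, df = n_demo - 1)       # critical value for a 95% interval

manual_ci <- c(lower = x_bar_demo - t_crit * se_demo,
               upper = x_bar_demo + t_crit * se_demo)
manual_ci

# t.test() should give the same bounds
all.equal(unname(manual_ci),
          as.numeric(t.test(x_demo, conf.level = 0.95)$conf.int))
```

Writing the interval out this way makes it clear that its width depends on the sample standard deviation and the sample size, not on the spread of individual observations alone.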

## Bootstrap Sampling Confidence Interval Creation
In addition to the traditional interval, I built one with the bootstrap method. I first drew a random sample of 55 countries, without replacement, from the 170 in the dataset. I then ran a loop to draw 1,000 bootstrap samples of size 55 from that sample, this time with replacement, and recorded the mean of each one. The 2.5th and 97.5th percentiles of those 1,000 sample means give a 95% confidence interval.
In the run reported here, the bootstrap method puts the average meat consumption across the 170 countries somewhere between 8.29 and 8.69; because the sampling is random, these bounds change each time the file is knitted.
```{r}
invisible(sum(is.na(Meat)))  # Count missing values (none found); invisible() suppresses the output
invisible(sum(is.nan(Meat))) # Count invalid values (NaN = not a number; none found)

# Sampling process taken from class notes:
Original.Sample = sample(Meat,
                         55,              # Sample size = 55 values in the sample
                         replace = FALSE  # Sample without replacement
                         )

Bootstrap_Sample_Vector = numeric(1000) # Will hold the 1,000 bootstrap sample means

for(i in 1:1000){
  ith.Bootstrap.Sample = sample(x = Original.Sample,
                                size = 55,      # Sample size must remain the same for every sample
                                replace = TRUE  # This time, sample with replacement
                                )
  Bootstrap_Sample_Vector[i] = mean(ith.Bootstrap.Sample)
}

# The 2.5th and 97.5th percentiles of the bootstrap means bound the 95% interval
Boot_CI = quantile(Bootstrap_Sample_Vector, c(.025, .975))
Boot_CI
```
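
As a minimal sketch of the percentile bootstrap described above, the block below wraps the resampling in a small function and fixes a random seed so the bounds are the same on every knit. The helper `boot_mean_ci()`, the seed, and the simulated vector `x_demo` are hypothetical names and values chosen for illustration; they are not part of the class notes or the survey data.

```{r}
# Illustrative data only: simulated values, not the survey's Meat column
set.seed(2020)
x_demo <- rgamma(170, shape = 4, rate = 0.4)

# Hypothetical helper: percentile bootstrap interval for the mean
boot_mean_ci <- function(x, n_boot = 1000, conf = 0.95) {
  # Each bootstrap sample has the same size as x and is drawn with replacement
  boot_means <- replicate(n_boot, mean(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - conf
  # Leave alpha / 2 in each tail: probabilities 0.025 and 0.975 for a 95% interval
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
}

boot_ci_demo <- boot_mean_ci(x_demo)
boot_ci_demo
```

Deriving both probabilities from `conf` keeps the two tails symmetric, and calling `set.seed()` once before the resampling is what makes the reported bounds reproducible.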

## Plot of the Bootstrap Sampling Distribution
Below is a histogram of the bootstrap sample means of meat consumed across the countries in the dataset. Note that it shows the distribution of sample means, not of the individual countries' meat consumption. The sample means appear roughly normally distributed, and their spread is what the bootstrap confidence interval summarizes numerically.

```{r}
rng <- range(Bootstrap_Sample_Vector, na.rm = TRUE)
Command_Breaks <- seq(rng[1], rng[2], length.out = 15)
## Fixing 15 evenly spaced break points (14 bins) keeps the number of bins the same every time the file is run or knitted.

hist(Bootstrap_Sample_Vector,
     breaks = Command_Breaks,
     xlab = "Sample Means of Meat Consumed",
     main = "Sampling Distribution of \n Bootstrap Sample Means \n of Global Meat Consumed")
```
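
One way to tie a histogram like this to the interval is to draw the percentile bounds on top of it. The sketch below does so with simulated data; `x_demo`, `boot_means_demo`, and `ci_demo` are illustrative names, and the values are not the survey's.

```{r}
# Illustrative data only: a simulated sample of 55 values
set.seed(7)
x_demo <- rgamma(55, shape = 4, rate = 0.4)

# 1,000 bootstrap sample means, each from a resample of size 55 with replacement
boot_means_demo <- replicate(1000, mean(sample(x_demo, replace = TRUE)))
ci_demo <- quantile(boot_means_demo, c(0.025, 0.975))

hist(boot_means_demo,
     breaks = 14,
     xlab = "Bootstrap Sample Means (simulated)",
     main = "Bootstrap Sample Means \n with 95% Percentile Bounds")
abline(v = ci_demo, lty = 2, lwd = 2)   # dashed lines at the 2.5th and 97.5th percentiles
```

With the bounds drawn this way, roughly 95% of the bars' area should sit between the two dashed lines, which gives a quick visual check on the quantile call.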

## Traditional Method vs Bootstrap Method
Traditional = [9.19, 10.61]

Bootstrap = [8.29, 8.69]

Looking at the two confidence intervals, the first thing we notice is that the bootstrap interval is far tighter: its bounds are only 0.40 apart, compared with 1.42 for the t-test interval. Both intervals estimate the same quantity, the mean, so this gap is not explained by sample means varying less than single data points. The bootstrap interval was also built from a random subsample of 55 countries rather than all 170, so its bounds shift from run to run and the two intervals are not directly comparable. In this run, the bootstrap interval also gives a lower estimate of the true mean meat consumption across the surveyed countries than the traditional interval, and the two intervals do not overlap.

Because we cannot confirm that meat consumption is normally distributed across these countries, and the bootstrap percentile interval does not rely on that assumption, I take the bootstrap method as the more reliable measure of the true average meat consumption.
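
To compare the two methods on equal footing, both intervals can be computed from the same data at the same sample size. The sketch below does this on simulated, right-skewed values and tabulates the bounds and widths. All names ending in `_demo` and the gamma parameters are assumptions made for illustration, not results from the survey.

```{r}
# Illustrative data only: 170 simulated right-skewed values
set.seed(42)
x_demo <- rgamma(170, shape = 4, rate = 0.4)

# Traditional t interval from the full sample
t_ci_demo <- as.numeric(t.test(x_demo, conf.level = 0.95)$conf.int)

# Percentile bootstrap interval from the same full sample
boot_means_demo <- replicate(2000, mean(sample(x_demo, replace = TRUE)))
boot_ci_demo <- as.numeric(quantile(boot_means_demo, c(0.025, 0.975)))

comparison_demo <- data.frame(
  method = c("t interval", "bootstrap percentile"),
  lower  = c(t_ci_demo[1], boot_ci_demo[1]),
  upper  = c(t_ci_demo[2], boot_ci_demo[2])
)
comparison_demo$width <- comparison_demo$upper - comparison_demo$lower
comparison_demo
```

When both intervals use the same observations, their widths are usually close, so a large difference in width is more likely to come from the sample size or the quantile probabilities than from the choice of method.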