setwd("C:/Users/avaan/OneDrive/Desktop")
protein <- read.csv(file = "protein.csv")
str(protein)
## 'data.frame': 170 obs. of 32 variables:
## $ Country : chr "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ Alcoholic.Beverages : num 0 0.184 0.0323 0.6285 0.1535 ...
## $ Animal.Products : num 9.75 27.75 13.84 15.23 33.19 ...
## $ Animal.fats : num 0.0277 0.0711 0.0054 0.0277 0.1289 ...
## $ Aquatic.Products..Other : num 0 0 0 0 0 0 0 0.0046 0 0 ...
## $ Cereals...Excluding.Beer : num 36 14.2 26.6 20.4 10.5 ...
## $ Eggs : num 0.407 1.807 1.292 0.176 0.485 ...
## $ Fish..Seafood : num 0.0647 0.6274 0.635 5.4436 8.2146 ...
## $ Fruits...Excluding.Wine : num 0.582 1.276 1.162 1.275 1.259 ...
## $ Meat : num 3.13 7.66 3.51 7.62 16.07 ...
## $ Milk...Excluding.Butter : num 5.53 16.48 8.06 1.15 7.43 ...
## $ Offals : num 0.592 1.108 0.328 0.813 0.853 ...
## $ Oilcrops : num 0.203 0.372 0.183 2.153 0.767 ...
## $ Pulses : num 1.248 1.456 2.551 4.085 0.884 ...
## $ Spices : num 0.166 0 0.178 0 0.344 ...
## $ Starchy.Roots : num 0.194 0.887 1.464 5.194 0.467 ...
## $ Stimulants : num 0.555 0.264 0.463 0.102 0.411 ...
## $ Sugar.Crops : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Sugar...Sweeteners : num 0 0.0042 0 0.0092 0 0.0049 0.0051 0 0.0139 0 ...
## $ Treenuts : num 0.1387 0.2677 0.2745 0.0092 0.0737 ...
## $ Vegetal.Products : num 40.2 22.3 36.2 34.8 16.8 ...
## $ Vegetable.Oils : num 0 0.0084 0.0269 0.0092 0.043 0 0.0205 0.0463 0.0647 0.0217 ...
## $ Vegetables : num 1.137 3.246 3.127 0.813 1.602 ...
## $ Miscellaneous : num 0.0462 0.0544 0.1399 0.0924 0.2947 ...
## $ Obesity : num 4.5 22.3 26.6 6.8 19.1 28.5 20.9 30.4 21.9 19.9 ...
## $ Undernourished : chr "29.8" "6.2" "3.9" "25" ...
## $ Confirmed : num 0.1421 2.9673 0.2449 0.0617 0.2939 ...
## $ Deaths : num 0.00619 0.05095 0.00656 0.00146 0.00714 ...
## $ Recovered : num 0.1234 1.7926 0.1676 0.0568 0.1908 ...
## $ Active : num 0.01257 1.12371 0.07077 0.00342 0.09592 ...
## $ Population : num 38928000 2838000 44357000 32522000 98000 ...
## $ Unit..all.except.Population.: chr "%" "%" "%" "%" ...
library(MASS)
The variable I use in this analysis is 'Animal.Products', the percentage of each country's protein supply that comes from animal products.
result <- t.test(protein$Animal.Products)
print(result$conf.int)
## [1] 20.03275 22.43156
## attr(,"conf.level")
## [1] 0.95
I used the one-sample t-test, the method from previous statistics courses, to construct a confidence interval for the mean of Animal.Products. It gave a 95% CI of [20.03, 22.43].
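The t-interval reported by t.test() is the sample mean plus or minus a t critical value times the estimated standard error, s / sqrt(n). The sketch below computes it by hand so the formula is visible. The helper name t_interval() is hypothetical and not part of base R.

```r
# Sketch: the one-sample t-interval that t.test() reports, computed by hand.
# t_interval() is a hypothetical helper name, not a base R function.
t_interval <- function(x, conf.level = 0.95) {
  x <- x[!is.na(x)]                                   # drop missing values, as t.test() does
  n <- length(x)
  se <- sd(x) / sqrt(n)                               # estimated standard error of the mean
  t.crit <- qt(1 - (1 - conf.level) / 2, df = n - 1)  # t critical value with n - 1 df
  mean(x) + c(-1, 1) * t.crit * se                    # lower and upper endpoints
}
```

Applied to protein$Animal.Products, this uses the same formula as t.test(), so it should return the same endpoints as result$conf.int.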
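n <- nrow(protein)                          # number of countries (170)
original.sample <- protein$Animal.Products  # the observed sample
B <- 1000                                   # number of bootstrap resamples
bt.sample.mean.vec <- numeric(B)            # preallocate storage for the means
for (i in 1:B) {
  # draw n values with replacement and record their mean
  ith.bt.sample <- sample(x = original.sample, size = n, replace = TRUE)
  bt.sample.mean.vec[i] <- mean(ith.bt.sample)
}
# percentile bootstrap 95% CI: the middle 95% of the bootstrap means
CI <- quantile(bt.sample.mean.vec, c(0.025, 0.975))
CI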
## 2.5% 97.5%
## 20.13919 22.45177
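The resampling loop above can also be written more compactly with replicate(). The sketch below wraps the same percentile bootstrap in a function; the name boot_mean_ci() is hypothetical, and B = 1000 simply matches the loop. It indexes with sample.int() rather than calling sample(x, ...) directly, which avoids sample()'s surprising behavior when x has length 1.

```r
# Sketch: percentile bootstrap CI for a mean, equivalent in spirit to the loop above.
# boot_mean_ci() is a hypothetical helper name, not part of base R.
boot_mean_ci <- function(x, B = 1000, conf.level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  # B bootstrap means, each computed from n draws made with replacement
  boot.means <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
  alpha <- 1 - conf.level
  list(means = boot.means,
       ci = quantile(boot.means, c(alpha / 2, 1 - alpha / 2)))
}
```

Calling set.seed() with any fixed value before boot_mean_ci(protein$Animal.Products) makes the endpoints reproducible; without a seed they shift slightly between runs.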
hist(bt.sample.mean.vec,
breaks = 20, # specify number of vertical bars
xlab = "Bootstrap sample means", # change the label of x-axis
# add a title to the histogram
main="Bootstrap Sampling Distribution \n of Sample Means")
Figure: Bootstrap sampling distribution of sample means
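The confidence intervals from the t-test, [20.03, 22.43], and from the bootstrap, [20.14, 22.45], are very similar: the lower endpoints differ by about 0.11 and the upper endpoints by about 0.02. This is expected, because both methods estimate the sampling distribution of the same statistic, the mean of the same 170 observations. They account for uncertainty differently: the t-interval assumes the sample mean is approximately normally distributed and uses the t distribution with n - 1 = 169 degrees of freedom, while the bootstrap approximates the sampling distribution directly by resampling the data with replacement. With a sample this large, the Central Limit Theorem makes the normal approximation reasonable, so the two approaches give nearly the same interval. The bootstrap endpoints will also shift slightly from run to run, because no random seed was set.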
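One way to check this reasoning is to compare the two ingredients of each interval: the standard error s / sqrt(n) used by the t-interval against the standard deviation of the bootstrap means, and the t multiplier against the normal value 1.96. If both pairs are close, the intervals must nearly coincide. The helper compare_se() below is a hypothetical sketch, and B = 2000 is an arbitrary choice.

```r
# Sketch: compare the pieces that determine the t-interval and the bootstrap interval.
# compare_se() is a hypothetical helper name; B = 2000 is an arbitrary choice.
compare_se <- function(x, B = 2000) {
  x <- x[!is.na(x)]
  n <- length(x)
  boot.means <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
  c(t.se    = sd(x) / sqrt(n),      # standard error used by the t-interval
    boot.se = sd(boot.means),       # spread of the bootstrap sampling distribution
    t.crit  = qt(0.975, df = n - 1),# multiplier used by the 95% t-interval
    z.crit  = qnorm(0.975))         # normal multiplier, for comparison
}
```

For n = 170 the t multiplier is about 1.97, very close to 1.96, so applying compare_se(protein$Animal.Products) mainly tests whether the two standard errors agree.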