##Do Covid-19 deaths differ among different vaccination rates in different countries?
I want to do my project on a dataset on the COVID pandemic. The variables are 1. country (categorical), 2. date (year-month-day), 3. total_vaccinations (int), 4. people_vaccinated (int (float)), 5. people_fully_vaccinated (int (float)), 6. New_deaths (int), 7. population (int), 8. ratio (float).
I found this data a while ago on kaggle. It had around 900 observations.
The question I want to possibly answer is if there is a relationship between the number of vaccinations and the outcome of COVID deaths in countries. I also want to see if I can compare the vaccinations with fully vaccinated, and maybe if there’s a better relationship among the two. Let’s do summary statistics and graphical/numerical summaries. I would explore the types of statistical models to see if we can predict the amount of deaths with vaccinations in a country.
#Load necessary libraries
library(ggplot2)
library(dplyr)
# Load the data
data <- read.csv("~/MATH_248_Modeling/covid-vaccination-vs-death_ratio.csv")
#printing out the heading of data
head(data)
##Summary statistics for all groups.
_We can see below that the means are large and that there might need to be some normalization of the New_deaths column. I can’t really tell what else these numbers might tell, but it seems that most of the distributions are pretty normal. Both of these summaries are the same, just different format/way to look at it. We also need to consider multi collinearity, since we have total_vaccinations, people_vaccinated, people_fully_vaccination that might be inclusive of each other.
#Summary of dataframe for interquartile range.
summary(data)
## X country iso_code date
## Min. : 0 Length:32911 Length:32911 Length:32911
## 1st Qu.: 8228 Class :character Class :character Class :character
## Median :16455 Mode :character Mode :character Mode :character
## Mean :16455
## 3rd Qu.:24682
## Max. :32910
## total_vaccinations people_vaccinated people_fully_vaccinated
## Min. :1.000e+00 Min. :1.000e+00 Min. :1.000e+00
## 1st Qu.:7.289e+05 1st Qu.:4.571e+05 1st Qu.:2.314e+05
## Median :4.552e+06 Median :2.725e+06 Median :1.694e+06
## Mean :3.689e+07 Mean :2.036e+07 Mean :1.471e+07
## 3rd Qu.:2.065e+07 3rd Qu.:1.158e+07 3rd Qu.:7.907e+06
## Max. :3.244e+09 Max. :1.276e+09 Max. :1.241e+09
## New_deaths population ratio
## Min. :-2440 Min. :1.373e+03 Min. : 0.00001
## 1st Qu.: 1 1st Qu.:2.962e+06 1st Qu.: 13.69757
## Median : 9 Median :1.034e+07 Median : 41.53047
## Mean : 100 Mean :5.164e+07 Mean : 41.97129
## 3rd Qu.: 49 3rd Qu.:3.826e+07 3rd Qu.: 68.14733
## Max. :11447 Max. :1.447e+09 Max. :124.73737
#Summary Statistics for Numeric Variables
summary_stats <- data %>%
summarise(
Total_Vaccinations = summary(total_vaccinations),
People_Vaccinated = summary(people_vaccinated),
People_Fully_Vaccinated = summary(people_fully_vaccinated),
New_Deaths = summary(New_deaths),
Population = summary(population),
Ratio = summary(ratio)
)
print(summary_stats)
## Total_Vaccinations People_Vaccinated People_Fully_Vaccinated New_Deaths
## 1 1 1 1 -2440.0000
## 2 728918 457109 231350 1.0000
## 3 4552479 2724528 1693674 9.0000
## 4 36894889 20358462 14710135 100.0431
## 5 20652014 11582032 7907290 49.0000
## 6 3243599000 1275541000 1240777000 11447.0000
## Population Ratio
## 1 1373 1.142522e-05
## 2 2962425 1.369757e+01
## 3 10340571 4.153047e+01
## 4 51640598 4.197129e+01
## 5 38261228 6.814733e+01
## 6 1447065329 1.247374e+02
##Plots
We can see that the total vaccinations might need to be normalized, but the amount seems to be around 30,000. and skewed to the left
df <- data
# Load necessary libraries
library(ggplot2)
# Assuming your dataset is already loaded in a variable called df
# Convert 'date' to Date type
df$date <- as.Date(df$date)
# Scatterplot: New Deaths vs Total Vaccinations
deaths_vs_vac <- ggplot(df, aes(x = people_vaccinated, y = New_deaths)) +
geom_point(color = 'blue') +
labs(title = "New Deaths vs People Vaccinated", x = "People Vaccinated", y = "New Deaths") +
theme_minimal()
# Scatterplot: New Deaths vs Ratio
# See if New Deaths impact the death to population ratio at all
deaths_vs_deathsratio <- ggplot(df, aes(x = ratio, y = New_deaths)) +
geom_point(color = 'green') +
labs(title = "New Deaths vs Deaths-to-Population Ratio", x = "Deaths-to-Population Ratio", y = "New Deaths") +
theme_minimal()
# Scatterplot: People Fully Vaccinated vs Ratio
fullyvac_vs_ratio <- ggplot(df, aes(x = people_fully_vaccinated, y = ratio)) +
geom_point(color = 'purple') +
labs(title = "People Fully Vaccinated vs Deaths-to-Population Ratio", x = "People Fully Vaccinated", y = "Deaths-to-Population Ratio") +
theme_minimal()
In the graphs below, we’re exploring the relationship among the several variables that include ratios instead of numbers. The blue graph shows that the people vaccinated is a significant predictor for a decrease in new deaths. New Deaths is also not a huge factor in the death ratio, and the full vaccinated numbers seem to not be related to the deaths-to-population ratio.
deaths_vs_vac
deaths_vs_deathsratio
fullyvac_vs_ratio
#This saves the files as png for the slides.
ggsave(deaths_vs_vac,
filename = "deaths_vs_vac.png",
device = "png",
height = 6, width = 5, units = "in")
ggsave(deaths_vs_deathsratio,
filename = "deaths_vs_deathsratio.png",
device = "png",
height = 6, width = 5, units = "in")
ggsave(fullyvac_vs_ratio,
filename = "fullyvac_vs_ratio.png",
device = "png",
height = 6, width = 5, units = "in")
The histogram below doesn’t tell us a lot about the frequency of total vaccinations.
#Histogram for Total Vaccinations
ggplot(data, aes(x = total_vaccinations)) +
geom_histogram(bins = 10, fill = "green", alpha = 0.8) +
labs(title = "Distribution of Total Vaccinations", x = "Total Vaccinations", y = "Frequency")
This variable is also similar to above
#Histogram for Total Vaccinations
ggplot(data, aes(x = people_fully_vaccinated)) +
geom_histogram(bins = 10, fill = "pink", alpha = 0.8) +
labs(title = "Distribution of people fully vaccinationed", x = "fully Vaccinations", y = "Frequency")
The New_deaths column also seems to be similar to the above plot for total vaccinations.
# Histogram for New Deaths
ggplot(data, aes(x = New_deaths)) +
geom_histogram(bins = 10, fill = "grey", alpha = 0.8) +
labs(title = "Distribution of New Deaths", x = "New Deaths", y = "Frequency")
We can see here that there seems to be a negative moderate correlation between the total number of deaths and total vaccinations. There is a weird shape in the beginning, and I wonder why that is. But this tells me that there is likely information about when there is an increase in vaccinations, there might be a decrease in deaths overall.
# Scatter Plot of Total Vaccinations vs. New Deaths
ggplot(data, aes(x = total_vaccinations, y = New_deaths)) +
geom_point(alpha = 0.5) +
labs(title = "Relationship between Total Vaccinations and New Deaths", x = "Total Vaccinations", y = "New Deaths")
We can also see below that the graphs are very similar, but this is actually forming definitive spikes it seems.
# Scatter Plot of Total Vaccinations vs. New Deaths
ggplot(data, aes(x = people_fully_vaccinated, y = New_deaths)) +
geom_point(alpha = 0.5) +
labs(title = "Relationship between fully Vaccinations and New Deaths", x = "fully Vaccinations", y = "New Deaths")
#Methodology
Goals is to look at: different starting points in different countries for the mixed models or random effects model.
Compare the theoretical models, representing the starting point of when the vaccination rate is 0, and how each country’s response value for death rate is if there’s no difference in slope.
Theoretical model for the difference in intercept. Y is the response variable, \(beta_0\) is the intercept, \(b_i\) is if the random effect is added to the intercept.
\[Y=\beta_0+b_i+\beta_1X+\epsilon_{ij}\] \[ \begin{aligned} 1. & \ \beta_0 : \text{Global Intercept} \\ 2. & \ \beta_1 : \text{Fixed effect of vaccinations (global effect)} \\ 3. & \ b_i : \text{Random intercept for each country (accounts for country-specific differences)} \\ 4. & \ \epsilon_{ij} : \text{Error term} \end{aligned} \] This is the more complex theoretical model, where b is the random effects. We later find this model.
\[ Y_{ij} = \beta_0 + b_i + \beta_1 (X) + \beta_2 (X)^2 + \epsilon_{ij} \]
\[ \begin{aligned} 1. & \ \beta_0 : \text{Global Intercept (baseline level of new deaths)} \\ 2. & \ \beta_1 : \text{Linear fixed effect of vaccinations (first polynomial term)} \\ 3. & \ \beta_2 : \text{Quadratic fixed effect of vaccinations (second polynomial term)} \\ 4. & \ b_i : \text{Random intercept for each country (accounts for country-specific differences)} \\ 5. & \ \epsilon_{ij} : \text{Residual error term (captures unexplained variance)} \end{aligned} \]
# Install and load the lme4 package for mixed-effects models
#install.packages("lme4")
library(lme4)
library(ggplot2)
library(dplyr)
Here’s the original model I made. The t values are high, but we want a model to explain the most variance for mixed effects.
# Fit the mixed model with country as a random effect
model <- lmer(New_deaths ~ total_vaccinations + (1 | country), data = df)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: New_deaths ~ total_vaccinations + (1 | country)
## Data: df
##
## REML criterion at convergence: 453265.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -15.667 -0.069 -0.008 0.019 48.764
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 74046 272.1
## Residual 54447 233.3
## Number of obs: 32911, groups: country, 197
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7.757e+01 1.959e+01 3.96
## total_vaccinations -8.982e-07 1.633e-08 -55.02
##
## Correlation of Fixed Effects:
## (Intr)
## ttl_vccntns -0.027
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
ranef(model)
## $country
## (Intercept)
## Afghanistan -51.711760
## Albania -72.343434
## Algeria -57.282718
## Andorra -75.608377
## Angola -66.241259
## Anguilla -76.012762
## Antigua and Barbuda -76.618116
## Argentina 135.665235
## Armenia -62.610439
## Aruba -76.868818
## Australia -42.728512
## Austria -44.277649
## Azerbaijan -54.925948
## Bahamas -73.748027
## Bahrain -72.603251
## Bangladesh 40.308876
## Barbados -76.300270
## Belarus -59.471281
## Belgium -41.439215
## Belize -75.913716
## Benin -74.012928
## Bermuda -75.881133
## Bhutan -75.222994
## Bolivia (Plurinational State of) -46.330377
## Bosnia and Herzegovina -53.691151
## Botswana -73.990036
## Brazil 1152.472085
## Brunei Darussalam -76.601000
## Bulgaria -26.900645
## Burkina Faso -71.449194
## Burundi -75.159635
## Cabo Verde -75.497073
## Cambodia -53.309941
## Cameroon -73.668573
## Canada 9.192486
## Cayman Islands -76.881132
## Central African Republic -73.854866
## Chad -75.844214
## Chile 36.136190
## China 2461.684661
## Colombia 198.306085
## Comoros -75.076830
## Cook Islands -74.657175
## Costa Rica -60.705991
## Croatia -49.702110
## Cuba -32.530071
## Curaçao -76.435866
## Cyprus -74.507648
## Denmark -61.757094
## Djibouti -74.170544
## Dominica -76.489450
## Dominican Republic -64.775608
## Ecuador -14.021312
## Egypt -20.254222
## El Salvador -65.511568
## Equatorial Guinea -76.345116
## Estonia -71.592664
## Ethiopia -45.067740
## Falkland Islands (Malvinas) -56.712775
## Fiji -72.989245
## Finland -61.727571
## France 150.885932
## French Polynesia -74.382753
## Gabon -75.449418
## Gambia -75.141313
## Georgia -39.257180
## Germany 192.780661
## Ghana -68.941284
## Gibraltar -77.131923
## Greece -17.787935
## Greenland -77.065876
## Grenada -76.031638
## Guatemala -42.774994
## Guinea -74.736361
## Guinea-Bissau -75.627571
## Guyana -73.860389
## Haiti -75.020319
## Honduras -48.621745
## Hungary 66.962563
## Iceland -76.735358
## India 1527.795127
## Indonesia 358.839097
## Iran (Islamic Republic of) 189.219996
## Iraq -47.587140
## Ireland -62.330662
## Israel -50.388116
## Italy 167.889416
## Jamaica -70.114601
## Japan 67.388809
## Jordan -54.074692
## Kazakhstan -25.631336
## Kenya -62.320059
## Kiribati -73.349323
## Kuwait -68.232868
## Kyrgyzstan -72.548903
## Lao People's Democratic Republic -73.411971
## Latvia -65.254735
## Lebanon -61.875831
## Lesotho -71.735019
## Liberia -72.862327
## Libya -65.829182
## Liechtenstein -77.328615
## Lithuania -60.449098
## Luxembourg -74.270345
## Madagascar -70.228494
## Malawi -71.707655
## Malaysia 36.959469
## Maldives -76.038157
## Mali -74.471552
## Malta -75.886083
## Mauritania -74.023284
## Mauritius -71.558666
## Mexico 383.173324
## Monaco -75.189768
## Mongolia -67.574636
## Montenegro -73.028706
## Montserrat -75.874230
## Morocco -34.481350
## Mozambique -64.350335
## Myanmar -35.105402
## Namibia -56.704276
## Nauru -70.182836
## Nepal -37.019282
## Netherlands -38.154198
## New Caledonia -75.655608
## New Zealand -72.739130
## Nicaragua -70.607872
## Niger -74.257434
## Nigeria -63.377850
## Niue -65.520626
## North Macedonia -64.899874
## Norway -67.652619
## occupied Palestinian territory, including east Jerusalem -61.765382
## Oman -67.562188
## Pakistan 56.780089
## Panama -66.145330
## Papua New Guinea -72.719741
## Paraguay -15.036270
## Peru 200.033919
## Philippines 44.102525
## Poland 131.602421
## Portugal -35.681296
## Qatar -73.044580
## Republic of Korea 6.677866
## Republic of Moldova -68.274826
## Romania 3.643146
## Russian Federation 701.080959
## Rwanda -67.112731
## Saint Kitts and Nevis -76.424135
## Saint Lucia -76.233086
## Samoa -75.428316
## San Marino -76.313250
## Sao Tome and Principe -75.304497
## Saudi Arabia -33.919507
## Senegal -70.385721
## Serbia -43.944774
## Seychelles -76.283338
## Sierra Leone -75.093503
## Singapore -68.394975
## Slovakia -40.606123
## Slovenia -66.225166
## Solomon Islands -74.387059
## Somalia -72.955364
## South Africa 47.618801
## South Sudan -75.738957
## Spain 73.318383
## Sri Lanka -11.202783
## Sudan -67.953997
## Suriname -73.053907
## Sweden -49.054956
## Switzerland -56.341844
## Syrian Arab Republic -68.295153
## Tajikistan -73.100529
## Thailand 40.912852
## The United Kingdom 188.790469
## Togo -73.672310
## Tokelau -44.698453
## Tonga -74.732615
## Trinidad and Tobago -65.680129
## Tunisia -21.279664
## Turkmenistan -58.639803
## Turks and Caicos Islands -74.724526
## Tuvalu -65.514633
## Uganda -66.471379
## Ukraine 156.040678
## United Arab Emirates -58.003635
## United Republic of Tanzania -69.318950
## United States of America 1639.929123
## Uruguay -55.465882
## Uzbekistan -54.756270
## Vanuatu -74.722878
## Venezuela (Bolivarian Republic of) -46.074740
## Viet Nam 99.188835
## Wallis and Futuna -75.859167
## Yemen -70.110040
## Zambia -53.614485
## Zimbabwe -63.575387
##
## with conditional variances for "country"
# Plot the fixed effect of vaccinations on deaths
ggplot(df, aes(x = total_vaccinations, y = New_deaths)) +
geom_point() +
geom_smooth(method = "lm", color = "red") +
labs(title = "Relationship between Vaccinations and New Deaths", x = "Total Vaccinations", y = "New Deaths") +
theme_minimal()
We can see above that there is a slight positive correlation between
total vax and deaths… however, it’s very weak and we can see some
outliers.
Let’s make 8 different models to compare.
##Model Assumptions:
Normality: Random effects are normally distributed. Independence: Clusters such as countries are independent. Linearity: Fixed effects (vaccinations) have a linear relationship with the outcome. Homoscedasticity: Residual variance is constant across predictors. Random Effects Independence: Random effects are uncorrelated with residual errors.
##Why do we use mixed models?
Limitations of Linear Regression: Assumes all observations are independent. Ignores clustering within countries. Fails to account for variability between groups. Advantages of Mixed Models: Models nested data (time points within countries). Captures variability across clusters (country-specific factors that aren’t considered). Provides unbiased estimates and corrects standard errors for group-level differences.
# Model 1: Total Vaccinations as the only predictor
model1 <- lmer(New_deaths ~ total_vaccinations + (1 | country), data = df)
# Model 2: Include People Vaccinated and Fully Vaccinated
model2 <- lmer(New_deaths ~ total_vaccinations + people_vaccinated + people_fully_vaccinated + (1 | country), data = df)
# Model 3: Include Population Size
model3 <- lmer(New_deaths ~ total_vaccinations + population + (1 | country), data = df)
# Model 4: Deaths-to-Population Ratio
model4 <- lmer(ratio ~ total_vaccinations + (1 | country), data = df)
# Model 5: Include Time (Date) as a predictor
model5 <- lmer(New_deaths ~ total_vaccinations + as.numeric(as.Date(date)) + (1 | country), data = df)
# Model 6: Interaction between Vaccinations and Population
model6 <- lmer(New_deaths ~ total_vaccinations * population + (1 | country), data = df)
# Model 7: Deaths per 100,000 People
df$deaths_per_100k <- (df$New_deaths / df$population) * 100000
model7 <- lmer(deaths_per_100k ~ total_vaccinations + (1 | country), data = df)
# Model 9: Non-linear effect of total vaccinations (quadratic term)
model9 <- lmer(New_deaths ~ poly(total_vaccinations, 2) + (1 | country), data = df)
Next, we use the models we made above to compare the BIC and AICs.
# Compare models using AIC and BIC
AIC(model1, model2, model3, model4, model5, model6, model7, model9)
BIC(model1, model2, model3, model4, model5, model6, model7, model9)
# Print the AIC and BIC values for each model
AIC(model1)
## [1] 453273.3
AIC(model2)
## [1] 452696.3
AIC(model3)
## [1] 453038.8
AIC(model4) #this one
## [1] 296527.5
AIC(model5)
## [1] 453273.9
AIC(model6)
## [1] 452531.6
AIC(model7) #this one
## [1] 61457.53
AIC(model9)
## [1] 452274.2
BIC(model1)
## [1] 453306.9
BIC(model2)
## [1] 452746.8
BIC(model3)
## [1] 453080.8
BIC(model4)
## [1] 296561.1
BIC(model5)
## [1] 453315.9
BIC(model6)
## [1] 452582
BIC(model7)
## [1] 61491.14
BIC(model9)
## [1] 452316.2
We can see that model 9 has a moderate BIC/AIC but it’s not overfitted like some other models that have a super low BIC/AIC (not to scale).
The t values we have here are very significant, showing that each coefficient we calculated is statistically significant.
# Get the summary of the model
summary(model9)
## Linear mixed model fit by REML ['lmerMod']
## Formula: New_deaths ~ poly(total_vaccinations, 2) + (1 | country)
## Data: df
##
## REML criterion at convergence: 452264.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -15.417 -0.073 -0.008 0.028 49.506
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 53243 230.7
## Residual 53010 230.2
## Number of obs: 32911, groups: country, 197
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 24.99 16.67 1.50
## poly(total_vaccinations, 2)1 -22370.41 410.52 -54.49
## poly(total_vaccinations, 2)2 13224.68 426.40 31.02
##
## Correlation of Fixed Effects:
## (Intr) p(_,2)1
## ply(tt_,2)1 0.004
## ply(tt_,2)2 -0.038 0.027
#Results
Here are the residual plots to access model 9. Below, we can see that the cooks distance is mostly even except for outliers, the qqplot shows low normality and the residuals vs. fitted plot lacks consistency. Our model9 is not looking too good for explaining the variance!
# Residuals vs Fitted values plot
plot(fitted(model9), resid(model9), main="Residuals vs Fitted", xlab="Fitted values", ylab="Residuals")
abline(h = 0, col = "red") # add a horizontal line at 0
# Q-Q plot for normality of residuals
qqnorm(resid(model9))
qqline(resid(model9), col = "red") # add a Q-Q line
# Cook's distance plot to identify influential points
plot(cooks.distance(model9), main="Cook's Distance")
#to calc the adj r squared
library(performance)
r2_value <- r2(model9)
print(r2_value)
## # R2 for Mixed Models
##
## Conditional R2: 0.582
## Marginal R2: 0.162
The adjusted R^2 above shows that a small amount of the variance is explained by the fixed and random effects. The only thing about the values is that the condition R^2 is the only significant value, but it only explains 58% of the variance. Here’s a breakdown on each value:
Marginal R² (0.162): Represents the proportion of variance explained by the fixed effects alone (in this case, the quadratic effect of total vaccinations). A value of 0.162 means that the fixed effects explain about 16.2% of the variation in new deaths.
Conditional R² (0.582): Represents the proportion of variance explained by both the fixed effects and the random effects (country-level variation). A value of 0.582 means that the model including country-level variation, explains about 58.2% of the variation in new deaths.
summary(model9)$coefficients
## Estimate Std. Error t value
## (Intercept) 24.99098 16.66604 1.499515
## poly(total_vaccinations, 2)1 -22370.41240 410.52007 -54.492859
## poly(total_vaccinations, 2)2 13224.68204 426.39946 31.014772
summary(model9)
## Linear mixed model fit by REML ['lmerMod']
## Formula: New_deaths ~ poly(total_vaccinations, 2) + (1 | country)
## Data: df
##
## REML criterion at convergence: 452264.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -15.417 -0.073 -0.008 0.028 49.506
##
## Random effects:
## Groups Name Variance Std.Dev.
## country (Intercept) 53243 230.7
## Residual 53010 230.2
## Number of obs: 32911, groups: country, 197
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 24.99 16.67 1.50
## poly(total_vaccinations, 2)1 -22370.41 410.52 -54.49
## poly(total_vaccinations, 2)2 13224.68 426.40 31.02
##
## Correlation of Fixed Effects:
## (Intr) p(_,2)1
## ply(tt_,2)1 0.004
## ply(tt_,2)2 -0.038 0.027
The summary above outputs these coefficients:
\[ \text{New Deaths} = 24.99 + b_i - 22370.41 \cdot (\text{Vaccinations}) + 13224.68 \cdot (\text{Vaccinations})^2 + \epsilon_{ij} \]
\[ \begin{aligned} 1. & \ 24.99 : \text{Global intercept (baseline level of new deaths)} \\ 2. & \ -22370.41 : \text{Linear fixed effect of vaccinations (first polynomial term)} \\ 3. & \ 13224.68 : \text{Quadratic fixed effect of vaccinations (second polynomial term)} \\ 4. & \ b_i : \text{Random intercept for each country (accounts for country-specific differences)} \\ 5. & \ \epsilon_{ij} : \text{Residual error term (captures unexplained variance)} \end{aligned} \] #Conclusion.
I found out that the best model to explain COVID-19 mortality among different countries using vaccine data is a mixed affects models with polynomial regression, and the random effect being the intercept. We have many countries to consider in this analysis, so this model helps us generalize a little bit of how to look at vaccine data relating to mortality.
My model is not perfect. Compared to the 7 other models I made, it was good considering the AIC/BIC. However, I learned that the AIC and BIC doesn’t punished unscaled models, which could have put me into a danger zone for choosing an overfitted model (ratio vs. full data numbers). I decided to use the new_deaths and total vaccinations, and disregarded the ratios for clarity. The adjusted R^2 was pretty bad for both fixed and random effects. Fixed effects was the worst, but because we wanted a mixed effects model, the random effects explained 58% of the variability within the data, which is below average! I guess it’s okay considering the amount of countries I was using.
Next time, I would like to control the types of countries I include and just do a few.
So do vaccination rates influence COVID-19 deaths… maybe, maybe not, our model with multiple countries as our random effect shows that we do have to control for countries. We should consider each countries’ policies and economic factors, and look at vaccination rates on a case by case basis. Perhaps it would be better to use different predictors.