Project Housing Analysis

##Do Covid-19 deaths differ among different vaccination rates in different countries?

I want to do my project on a dataset on the COVID pandemic. The variables are 1. country (categorical), 2. date (year-month-day), 3. total_vaccinations (int), 4. people_vaccinated (int (float)), 5. people_fully_vaccinated (int (float)), 6. New_deaths (int), 7. population (int), 8. ratio (float).

I found this data a while ago on kaggle. It had around 900 observations.

The question I want to possibly answer is if there is a relationship between the number of vaccinations and the outcome of COVID deaths in countries. I also want to see if I can compare the vaccinations with fully vaccinated, and maybe if there’s a better relationship among the two. Let’s do summary statistics and graphical/numerical summaries. I would explore the types of statistical models to see if we can predict the amount of deaths with vaccinations in a country.

country (categorical),
date (year-month-day),
total_vaccinations (int),
people_vaccinated (int (float)),
people_fully_vaccinated (int (float)),
New_deaths (int),
population (int),
ratio (float)

#Load necessary libraries
library(ggplot2)
library(dplyr)

# Load the data
data <- read.csv("~/MATH_248_Modeling/covid-vaccination-vs-death_ratio.csv")

#printing out the heading of data
head(data)

##Summary statistics for all groups.

_We can see below that the means are large and that there might need to be some normalization of the New_deaths column. I can’t really tell what else these numbers might tell, but it seems that most of the distributions are pretty normal. Both of these summaries are the same, just different format/way to look at it. We also need to consider multi collinearity, since we have total_vaccinations, people_vaccinated, people_fully_vaccination that might be inclusive of each other.

#Summary of dataframe for interquartile range. 
summary(data)

##        X           country            iso_code             date          
##  Min.   :    0   Length:32911       Length:32911       Length:32911      
##  1st Qu.: 8228   Class :character   Class :character   Class :character  
##  Median :16455   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :16455                                                           
##  3rd Qu.:24682                                                           
##  Max.   :32910                                                           
##  total_vaccinations  people_vaccinated   people_fully_vaccinated
##  Min.   :1.000e+00   Min.   :1.000e+00   Min.   :1.000e+00      
##  1st Qu.:7.289e+05   1st Qu.:4.571e+05   1st Qu.:2.314e+05      
##  Median :4.552e+06   Median :2.725e+06   Median :1.694e+06      
##  Mean   :3.689e+07   Mean   :2.036e+07   Mean   :1.471e+07      
##  3rd Qu.:2.065e+07   3rd Qu.:1.158e+07   3rd Qu.:7.907e+06      
##  Max.   :3.244e+09   Max.   :1.276e+09   Max.   :1.241e+09      
##    New_deaths      population            ratio          
##  Min.   :-2440   Min.   :1.373e+03   Min.   :  0.00001  
##  1st Qu.:    1   1st Qu.:2.962e+06   1st Qu.: 13.69757  
##  Median :    9   Median :1.034e+07   Median : 41.53047  
##  Mean   :  100   Mean   :5.164e+07   Mean   : 41.97129  
##  3rd Qu.:   49   3rd Qu.:3.826e+07   3rd Qu.: 68.14733  
##  Max.   :11447   Max.   :1.447e+09   Max.   :124.73737

#Summary Statistics for Numeric Variables
summary_stats <- data %>%
  summarise(
    Total_Vaccinations = summary(total_vaccinations),
    People_Vaccinated = summary(people_vaccinated),
    People_Fully_Vaccinated = summary(people_fully_vaccinated),
    New_Deaths = summary(New_deaths),
    Population = summary(population),
    Ratio = summary(ratio)
  )
print(summary_stats)

##   Total_Vaccinations People_Vaccinated People_Fully_Vaccinated New_Deaths
## 1                  1                 1                       1 -2440.0000
## 2             728918            457109                  231350     1.0000
## 3            4552479           2724528                 1693674     9.0000
## 4           36894889          20358462                14710135   100.0431
## 5           20652014          11582032                 7907290    49.0000
## 6         3243599000        1275541000              1240777000 11447.0000
##   Population        Ratio
## 1       1373 1.142522e-05
## 2    2962425 1.369757e+01
## 3   10340571 4.153047e+01
## 4   51640598 4.197129e+01
## 5   38261228 6.814733e+01
## 6 1447065329 1.247374e+02

##Plots

We can see that the total vaccinations might need to be normalized, but the amount seems to be around 30,000. and skewed to the left

df <- data

# Load necessary libraries
library(ggplot2)

# Assuming your dataset is already loaded in a variable called df
# Convert 'date' to Date type
df$date <- as.Date(df$date)

# Scatterplot: New Deaths vs Total Vaccinations
deaths_vs_vac <- ggplot(df, aes(x = people_vaccinated, y = New_deaths)) +
  geom_point(color = 'blue') +
  labs(title = "New Deaths vs People Vaccinated", x = "People Vaccinated", y = "New Deaths") +
  theme_minimal()

# Scatterplot: New Deaths vs Ratio
# See if New Deaths impact the death to population ratio at all
deaths_vs_deathsratio <- ggplot(df, aes(x = ratio, y = New_deaths)) +
  geom_point(color = 'green') +
  labs(title = "New Deaths vs Deaths-to-Population Ratio", x = "Deaths-to-Population Ratio", y = "New Deaths") +
  theme_minimal()

# Scatterplot: People Fully Vaccinated vs Ratio
fullyvac_vs_ratio <- ggplot(df, aes(x = people_fully_vaccinated, y = ratio)) +
  geom_point(color = 'purple') +
  labs(title = "People Fully Vaccinated vs Deaths-to-Population Ratio", x = "People Fully Vaccinated", y = "Deaths-to-Population Ratio") +
  theme_minimal()

In the graphs below, we’re exploring the relationship among the several variables that include ratios instead of numbers. The blue graph shows that the people vaccinated is a significant predictor for a decrease in new deaths. New Deaths is also not a huge factor in the death ratio, and the full vaccinated numbers seem to not be related to the deaths-to-population ratio.

deaths_vs_vac

deaths_vs_deathsratio

fullyvac_vs_ratio

#This saves the files as png for the slides. 

ggsave(deaths_vs_vac, 
       filename = "deaths_vs_vac.png",
       device = "png",
       height = 6, width = 5, units = "in")

ggsave(deaths_vs_deathsratio, 
       filename = "deaths_vs_deathsratio.png",
       device = "png",
       height = 6, width = 5, units = "in")

ggsave(fullyvac_vs_ratio, 
       filename = "fullyvac_vs_ratio.png",
       device = "png",
       height = 6, width = 5, units = "in")

The histogram below doesn’t tell us a lot about the frequency of total vaccinations.

#Histogram for Total Vaccinations
ggplot(data, aes(x = total_vaccinations)) +
  geom_histogram(bins = 10, fill = "green", alpha = 0.8) +
  labs(title = "Distribution of Total Vaccinations", x = "Total Vaccinations", y = "Frequency")

This variable is also similar to above

#Histogram for Total Vaccinations
ggplot(data, aes(x = people_fully_vaccinated)) +
  geom_histogram(bins = 10, fill = "pink", alpha = 0.8) +
  labs(title = "Distribution of people fully vaccinationed", x = "fully Vaccinations", y = "Frequency")

The New_deaths column also seems to be similar to the above plot for total vaccinations.

# Histogram for New Deaths
ggplot(data, aes(x = New_deaths)) +
  geom_histogram(bins = 10, fill = "grey", alpha = 0.8) +
  labs(title = "Distribution of New Deaths", x = "New Deaths", y = "Frequency")

We can see here that there seems to be a negative moderate correlation between the total number of deaths and total vaccinations. There is a weird shape in the beginning, and I wonder why that is. But this tells me that there is likely information about when there is an increase in vaccinations, there might be a decrease in deaths overall.

# Scatter Plot of Total Vaccinations vs. New Deaths
ggplot(data, aes(x = total_vaccinations, y = New_deaths)) +
  geom_point(alpha = 0.5) +
  labs(title = "Relationship between Total Vaccinations and New Deaths", x = "Total Vaccinations", y = "New Deaths")

We can also see below that the graphs are very similar, but this is actually forming definitive spikes it seems.

# Scatter Plot of Total Vaccinations vs. New Deaths
ggplot(data, aes(x = people_fully_vaccinated, y = New_deaths)) +
  geom_point(alpha = 0.5) +
  labs(title = "Relationship between fully Vaccinations and New Deaths", x = "fully Vaccinations", y = "New Deaths")

#Methodology

Goals is to look at: different starting points in different countries for the mixed models or random effects model.

Compare the theoretical models, representing the starting point of when the vaccination rate is 0, and how each country’s response value for death rate is if there’s no difference in slope.

Theoretical model for the difference in intercept. Y is the response variable, \(beta_0\) is the intercept, \(b_i\) is if the random effect is added to the intercept.

\[Y=\beta_0+b_i+\beta_1X+\epsilon_{ij}\] \[ \begin{aligned} 1. & \ \beta_0 : \text{Global Intercept} \\ 2. & \ \beta_1 : \text{Fixed effect of vaccinations (global effect)} \\ 3. & \ b_i : \text{Random intercept for each country (accounts for country-specific differences)} \\ 4. & \ \epsilon_{ij} : \text{Error term} \end{aligned} \] This is the more complex theoretical model, where b is the random effects. We later find this model.

\[ Y_{ij} = \beta_0 + b_i + \beta_1 (X) + \beta_2 (X)^2 + \epsilon_{ij} \]

\[ \begin{aligned} 1. & \ \beta_0 : \text{Global Intercept (baseline level of new deaths)} \\ 2. & \ \beta_1 : \text{Linear fixed effect of vaccinations (first polynomial term)} \\ 3. & \ \beta_2 : \text{Quadratic fixed effect of vaccinations (second polynomial term)} \\ 4. & \ b_i : \text{Random intercept for each country (accounts for country-specific differences)} \\ 5. & \ \epsilon_{ij} : \text{Residual error term (captures unexplained variance)} \end{aligned} \]

# Install and load the lme4 package for mixed-effects models
#install.packages("lme4")
library(lme4)
library(ggplot2)
library(dplyr)

Here’s the original model I made. The t values are high, but we want a model to explain the most variance for mixed effects.

# Fit the mixed model with country as a random effect
model <- lmer(New_deaths ~ total_vaccinations + (1 | country), data = df)
summary(model)

## Linear mixed model fit by REML ['lmerMod']
## Formula: New_deaths ~ total_vaccinations + (1 | country)
##    Data: df
## 
## REML criterion at convergence: 453265.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -15.667  -0.069  -0.008   0.019  48.764 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 74046    272.1   
##  Residual             54447    233.3   
## Number of obs: 32911, groups:  country, 197
## 
## Fixed effects:
##                      Estimate Std. Error t value
## (Intercept)         7.757e+01  1.959e+01    3.96
## total_vaccinations -8.982e-07  1.633e-08  -55.02
## 
## Correlation of Fixed Effects:
##             (Intr)
## ttl_vccntns -0.027
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

ranef(model)

## $country
##                                                          (Intercept)
## Afghanistan                                               -51.711760
## Albania                                                   -72.343434
## Algeria                                                   -57.282718
## Andorra                                                   -75.608377
## Angola                                                    -66.241259
## Anguilla                                                  -76.012762
## Antigua and Barbuda                                       -76.618116
## Argentina                                                 135.665235
## Armenia                                                   -62.610439
## Aruba                                                     -76.868818
## Australia                                                 -42.728512
## Austria                                                   -44.277649
## Azerbaijan                                                -54.925948
## Bahamas                                                   -73.748027
## Bahrain                                                   -72.603251
## Bangladesh                                                 40.308876
## Barbados                                                  -76.300270
## Belarus                                                   -59.471281
## Belgium                                                   -41.439215
## Belize                                                    -75.913716
## Benin                                                     -74.012928
## Bermuda                                                   -75.881133
## Bhutan                                                    -75.222994
## Bolivia (Plurinational State of)                          -46.330377
## Bosnia and Herzegovina                                    -53.691151
## Botswana                                                  -73.990036
## Brazil                                                   1152.472085
## Brunei Darussalam                                         -76.601000
## Bulgaria                                                  -26.900645
## Burkina Faso                                              -71.449194
## Burundi                                                   -75.159635
## Cabo Verde                                                -75.497073
## Cambodia                                                  -53.309941
## Cameroon                                                  -73.668573
## Canada                                                      9.192486
## Cayman Islands                                            -76.881132
## Central African Republic                                  -73.854866
## Chad                                                      -75.844214
## Chile                                                      36.136190
## China                                                    2461.684661
## Colombia                                                  198.306085
## Comoros                                                   -75.076830
## Cook Islands                                              -74.657175
## Costa Rica                                                -60.705991
## Croatia                                                   -49.702110
## Cuba                                                      -32.530071
## Curaçao                                                   -76.435866
## Cyprus                                                    -74.507648
## Denmark                                                   -61.757094
## Djibouti                                                  -74.170544
## Dominica                                                  -76.489450
## Dominican Republic                                        -64.775608
## Ecuador                                                   -14.021312
## Egypt                                                     -20.254222
## El Salvador                                               -65.511568
## Equatorial Guinea                                         -76.345116
## Estonia                                                   -71.592664
## Ethiopia                                                  -45.067740
## Falkland Islands (Malvinas)                               -56.712775
## Fiji                                                      -72.989245
## Finland                                                   -61.727571
## France                                                    150.885932
## French Polynesia                                          -74.382753
## Gabon                                                     -75.449418
## Gambia                                                    -75.141313
## Georgia                                                   -39.257180
## Germany                                                   192.780661
## Ghana                                                     -68.941284
## Gibraltar                                                 -77.131923
## Greece                                                    -17.787935
## Greenland                                                 -77.065876
## Grenada                                                   -76.031638
## Guatemala                                                 -42.774994
## Guinea                                                    -74.736361
## Guinea-Bissau                                             -75.627571
## Guyana                                                    -73.860389
## Haiti                                                     -75.020319
## Honduras                                                  -48.621745
## Hungary                                                    66.962563
## Iceland                                                   -76.735358
## India                                                    1527.795127
## Indonesia                                                 358.839097
## Iran (Islamic Republic of)                                189.219996
## Iraq                                                      -47.587140
## Ireland                                                   -62.330662
## Israel                                                    -50.388116
## Italy                                                     167.889416
## Jamaica                                                   -70.114601
## Japan                                                      67.388809
## Jordan                                                    -54.074692
## Kazakhstan                                                -25.631336
## Kenya                                                     -62.320059
## Kiribati                                                  -73.349323
## Kuwait                                                    -68.232868
## Kyrgyzstan                                                -72.548903
## Lao People's Democratic Republic                          -73.411971
## Latvia                                                    -65.254735
## Lebanon                                                   -61.875831
## Lesotho                                                   -71.735019
## Liberia                                                   -72.862327
## Libya                                                     -65.829182
## Liechtenstein                                             -77.328615
## Lithuania                                                 -60.449098
## Luxembourg                                                -74.270345
## Madagascar                                                -70.228494
## Malawi                                                    -71.707655
## Malaysia                                                   36.959469
## Maldives                                                  -76.038157
## Mali                                                      -74.471552
## Malta                                                     -75.886083
## Mauritania                                                -74.023284
## Mauritius                                                 -71.558666
## Mexico                                                    383.173324
## Monaco                                                    -75.189768
## Mongolia                                                  -67.574636
## Montenegro                                                -73.028706
## Montserrat                                                -75.874230
## Morocco                                                   -34.481350
## Mozambique                                                -64.350335
## Myanmar                                                   -35.105402
## Namibia                                                   -56.704276
## Nauru                                                     -70.182836
## Nepal                                                     -37.019282
## Netherlands                                               -38.154198
## New Caledonia                                             -75.655608
## New Zealand                                               -72.739130
## Nicaragua                                                 -70.607872
## Niger                                                     -74.257434
## Nigeria                                                   -63.377850
## Niue                                                      -65.520626
## North Macedonia                                           -64.899874
## Norway                                                    -67.652619
## occupied Palestinian territory, including east Jerusalem  -61.765382
## Oman                                                      -67.562188
## Pakistan                                                   56.780089
## Panama                                                    -66.145330
## Papua New Guinea                                          -72.719741
## Paraguay                                                  -15.036270
## Peru                                                      200.033919
## Philippines                                                44.102525
## Poland                                                    131.602421
## Portugal                                                  -35.681296
## Qatar                                                     -73.044580
## Republic of Korea                                           6.677866
## Republic of Moldova                                       -68.274826
## Romania                                                     3.643146
## Russian Federation                                        701.080959
## Rwanda                                                    -67.112731
## Saint Kitts and Nevis                                     -76.424135
## Saint Lucia                                               -76.233086
## Samoa                                                     -75.428316
## San Marino                                                -76.313250
## Sao Tome and Principe                                     -75.304497
## Saudi Arabia                                              -33.919507
## Senegal                                                   -70.385721
## Serbia                                                    -43.944774
## Seychelles                                                -76.283338
## Sierra Leone                                              -75.093503
## Singapore                                                 -68.394975
## Slovakia                                                  -40.606123
## Slovenia                                                  -66.225166
## Solomon Islands                                           -74.387059
## Somalia                                                   -72.955364
## South Africa                                               47.618801
## South Sudan                                               -75.738957
## Spain                                                      73.318383
## Sri Lanka                                                 -11.202783
## Sudan                                                     -67.953997
## Suriname                                                  -73.053907
## Sweden                                                    -49.054956
## Switzerland                                               -56.341844
## Syrian Arab Republic                                      -68.295153
## Tajikistan                                                -73.100529
## Thailand                                                   40.912852
## The United Kingdom                                        188.790469
## Togo                                                      -73.672310
## Tokelau                                                   -44.698453
## Tonga                                                     -74.732615
## Trinidad and Tobago                                       -65.680129
## Tunisia                                                   -21.279664
## Turkmenistan                                              -58.639803
## Turks and Caicos Islands                                  -74.724526
## Tuvalu                                                    -65.514633
## Uganda                                                    -66.471379
## Ukraine                                                   156.040678
## United Arab Emirates                                      -58.003635
## United Republic of Tanzania                               -69.318950
## United States of America                                 1639.929123
## Uruguay                                                   -55.465882
## Uzbekistan                                                -54.756270
## Vanuatu                                                   -74.722878
## Venezuela (Bolivarian Republic of)                        -46.074740
## Viet Nam                                                   99.188835
## Wallis and Futuna                                         -75.859167
## Yemen                                                     -70.110040
## Zambia                                                    -53.614485
## Zimbabwe                                                  -63.575387
## 
## with conditional variances for "country"

# Plot the fixed effect of vaccinations on deaths
ggplot(df, aes(x = total_vaccinations, y = New_deaths)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Relationship between Vaccinations and New Deaths", x = "Total Vaccinations", y = "New Deaths") +
  theme_minimal()

We can see above that there is a slight positive correlation between total vax and deaths… however, it’s very weak and we can see some outliers.

Let’s make 8 different models to compare.

##Model Assumptions:

Normality: Random effects are normally distributed. Independence: Clusters such as countries are independent. Linearity: Fixed effects (vaccinations) have a linear relationship with the outcome. Homoscedasticity: Residual variance is constant across predictors. Random Effects Independence: Random effects are uncorrelated with residual errors.

##Why do we use mixed models?

Limitations of Linear Regression: Assumes all observations are independent. Ignores clustering within countries. Fails to account for variability between groups. Advantages of Mixed Models: Models nested data (time points within countries). Captures variability across clusters (country-specific factors that aren’t considered). Provides unbiased estimates and corrects standard errors for group-level differences.

# Model 1: Total Vaccinations as the only predictor
model1 <- lmer(New_deaths ~ total_vaccinations + (1 | country), data = df)

# Model 2: Include People Vaccinated and Fully Vaccinated
model2 <- lmer(New_deaths ~ total_vaccinations + people_vaccinated + people_fully_vaccinated + (1 | country), data = df)

# Model 3: Include Population Size
model3 <- lmer(New_deaths ~ total_vaccinations + population + (1 | country), data = df)

# Model 4: Deaths-to-Population Ratio
model4 <- lmer(ratio ~ total_vaccinations + (1 | country), data = df)

# Model 5: Include Time (Date) as a predictor
model5 <- lmer(New_deaths ~ total_vaccinations + as.numeric(as.Date(date)) + (1 | country), data = df)

# Model 6: Interaction between Vaccinations and Population
model6 <- lmer(New_deaths ~ total_vaccinations * population + (1 | country), data = df)

# Model 7: Deaths per 100,000 People
df$deaths_per_100k <- (df$New_deaths / df$population) * 100000
model7 <- lmer(deaths_per_100k ~ total_vaccinations + (1 | country), data = df)


# Model 9: Non-linear effect of total vaccinations (quadratic term)
model9 <- lmer(New_deaths ~ poly(total_vaccinations, 2) + (1 | country), data = df)

Next, we use the models we made above to compare the BIC and AICs.

# Compare models using AIC and BIC
AIC(model1, model2, model3, model4, model5, model6, model7, model9)

BIC(model1, model2, model3, model4, model5, model6, model7, model9)

# Print the AIC and BIC values for each model
AIC(model1)

## [1] 453273.3

AIC(model2)

## [1] 452696.3

AIC(model3)

## [1] 453038.8

AIC(model4) #this one

## [1] 296527.5

AIC(model5)

## [1] 453273.9

AIC(model6)

## [1] 452531.6

AIC(model7) #this one

## [1] 61457.53

AIC(model9)

## [1] 452274.2

BIC(model1)

## [1] 453306.9

BIC(model2)

## [1] 452746.8

BIC(model3)

## [1] 453080.8

BIC(model4)

## [1] 296561.1

BIC(model5)

## [1] 453315.9

BIC(model6)

## [1] 452582

BIC(model7)

## [1] 61491.14

BIC(model9)

## [1] 452316.2

We can see that model 9 has a moderate BIC/AIC but it’s not overfitted like some other models that have a super low BIC/AIC (not to scale).

The t values we have here are very significant, showing that each coefficient we calculated is statistically significant.

# Get the summary of the model
summary(model9)

## Linear mixed model fit by REML ['lmerMod']
## Formula: New_deaths ~ poly(total_vaccinations, 2) + (1 | country)
##    Data: df
## 
## REML criterion at convergence: 452264.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -15.417  -0.073  -0.008   0.028  49.506 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 53243    230.7   
##  Residual             53010    230.2   
## Number of obs: 32911, groups:  country, 197
## 
## Fixed effects:
##                               Estimate Std. Error t value
## (Intercept)                      24.99      16.67    1.50
## poly(total_vaccinations, 2)1 -22370.41     410.52  -54.49
## poly(total_vaccinations, 2)2  13224.68     426.40   31.02
## 
## Correlation of Fixed Effects:
##             (Intr) p(_,2)1
## ply(tt_,2)1  0.004        
## ply(tt_,2)2 -0.038  0.027

#Results

Here are the residual plots to access model 9. Below, we can see that the cooks distance is mostly even except for outliers, the qqplot shows low normality and the residuals vs. fitted plot lacks consistency. Our model9 is not looking too good for explaining the variance!

# Residuals vs Fitted values plot
plot(fitted(model9), resid(model9), main="Residuals vs Fitted", xlab="Fitted values", ylab="Residuals")
abline(h = 0, col = "red")  # add a horizontal line at 0

# Q-Q plot for normality of residuals
qqnorm(resid(model9))
qqline(resid(model9), col = "red")  # add a Q-Q line

# Cook's distance plot to identify influential points
plot(cooks.distance(model9), main="Cook's Distance")

#to calc the adj r squared 
library(performance)

r2_value <- r2(model9)
print(r2_value)

## # R2 for Mixed Models
## 
##   Conditional R2: 0.582
##      Marginal R2: 0.162

The adjusted R^2 above shows that a small amount of the variance is explained by the fixed and random effects. The only thing about the values is that the condition R^2 is the only significant value, but it only explains 58% of the variance. Here’s a breakdown on each value:

Marginal R² (0.162): Represents the proportion of variance explained by the fixed effects alone (in this case, the quadratic effect of total vaccinations). A value of 0.162 means that the fixed effects explain about 16.2% of the variation in new deaths.

Conditional R² (0.582): Represents the proportion of variance explained by both the fixed effects and the random effects (country-level variation). A value of 0.582 means that the model including country-level variation, explains about 58.2% of the variation in new deaths.

summary(model9)$coefficients

##                                  Estimate Std. Error    t value
## (Intercept)                      24.99098   16.66604   1.499515
## poly(total_vaccinations, 2)1 -22370.41240  410.52007 -54.492859
## poly(total_vaccinations, 2)2  13224.68204  426.39946  31.014772

summary(model9)

## Linear mixed model fit by REML ['lmerMod']
## Formula: New_deaths ~ poly(total_vaccinations, 2) + (1 | country)
##    Data: df
## 
## REML criterion at convergence: 452264.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -15.417  -0.073  -0.008   0.028  49.506 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  country  (Intercept) 53243    230.7   
##  Residual             53010    230.2   
## Number of obs: 32911, groups:  country, 197
## 
## Fixed effects:
##                               Estimate Std. Error t value
## (Intercept)                      24.99      16.67    1.50
## poly(total_vaccinations, 2)1 -22370.41     410.52  -54.49
## poly(total_vaccinations, 2)2  13224.68     426.40   31.02
## 
## Correlation of Fixed Effects:
##             (Intr) p(_,2)1
## ply(tt_,2)1  0.004        
## ply(tt_,2)2 -0.038  0.027

The summary above outputs these coefficients:

\[ \text{New Deaths} = 24.99 + b_i - 22370.41 \cdot (\text{Vaccinations}) + 13224.68 \cdot (\text{Vaccinations})^2 + \epsilon_{ij} \]

\[ \begin{aligned} 1. & \ 24.99 : \text{Global intercept (baseline level of new deaths)} \\ 2. & \ -22370.41 : \text{Linear fixed effect of vaccinations (first polynomial term)} \\ 3. & \ 13224.68 : \text{Quadratic fixed effect of vaccinations (second polynomial term)} \\ 4. & \ b_i : \text{Random intercept for each country (accounts for country-specific differences)} \\ 5. & \ \epsilon_{ij} : \text{Residual error term (captures unexplained variance)} \end{aligned} \] #Conclusion.

I found out that the best model to explain COVID-19 mortality among different countries using vaccine data is a mixed affects models with polynomial regression, and the random effect being the intercept. We have many countries to consider in this analysis, so this model helps us generalize a little bit of how to look at vaccine data relating to mortality.

My model is not perfect. Compared to the 7 other models I made, it was good considering the AIC/BIC. However, I learned that the AIC and BIC doesn’t punished unscaled models, which could have put me into a danger zone for choosing an overfitted model (ratio vs. full data numbers). I decided to use the new_deaths and total vaccinations, and disregarded the ratios for clarity. The adjusted R^2 was pretty bad for both fixed and random effects. Fixed effects was the worst, but because we wanted a mixed effects model, the random effects explained 58% of the variability within the data, which is below average! I guess it’s okay considering the amount of countries I was using.

Next time, I would like to control the types of countries I include and just do a few.

So do vaccination rates influence COVID-19 deaths… maybe, maybe not, our model with multiple countries as our random effect shows that we do have to control for countries. We should consider each countries’ policies and economic factors, and look at vaccination rates on a case by case basis. Perhaps it would be better to use different predictors.

Project Housing Analysis

Korin Wheaton

2024-12-08