Final Project

Introduction

Research Question: What factors predict the percentage of people fully vaccinated against COVID-19 across different countries?

This project uses the COVID-19 dataset published by Our World in Data (OWID), read from the snapshot file owid-covid-latest.csv. The dataset provides information on vaccination coverage along with demographic, economic, and policy-related indicators. The raw file has 247 rows and 67 variables, with one row per location; it mixes countries and territories with OWID aggregates such as Africa, which are removed before analysis so that each observation represents a single country or territory. The primary outcome variable is people_fully_vaccinated_per_hundred, which measures the percentage of a country’s population that has completed the full COVID-19 vaccination series.

The dataset contains numerous variables related to vaccination progress, economic development, population structure, and public health policy. I chose this topic because COVID-19 vaccination rates varied substantially across countries, and examining the factors associated with higher vaccination coverage can help explain global inequalities in health outcomes and resource access.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(car)

## Warning: package 'car' was built under R version 4.5.2

## Loading required package: carData

## Warning: package 'carData' was built under R version 4.5.2

## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some

# Load local CSV file
# setwd("~/Data 101/Final project")
vaccinations <- read.csv("owid-covid-latest.csv")

# Inspect the data
head(vaccinations)

##   iso_code continent       location last_updated_date total_cases new_cases
## 1      AFG      Asia    Afghanistan          8/4/2024      235214         0
## 2 OWID_AFR                   Africa          8/4/2024    13145380        36
## 3      ALB    Europe        Albania          8/4/2024      335047         0
## 4      DZA    Africa        Algeria          8/4/2024      272139        18
## 5      ASM   Oceania American Samoa          8/4/2024        8359         0
## 6      AND    Europe        Andorra          8/4/2024       48015         0
##   new_cases_smoothed total_deaths new_deaths new_deaths_smoothed
## 1              0.000         7998          0                   0
## 2              5.143       259117          0                   0
## 3              0.000         3605          0                   0
## 4              2.571         6881          0                   0
## 5              0.000           34          0                   0
## 6              0.000          159          0                   0
##   total_cases_per_million new_cases_per_million new_cases_smoothed_per_million
## 1                5796.468                 0.000                          0.000
## 2                9088.877                 0.025                          0.004
## 3              118491.020                 0.000                          0.000
## 4                5984.050                 0.396                          0.057
## 5              172831.600                 0.000                          0.000
## 6              602280.440                 0.000                          0.000
##   total_deaths_per_million new_deaths_per_million
## 1                  197.098                      0
## 2                  179.157                      0
## 3                 1274.926                      0
## 4                  151.306                      0
## 5                  702.988                      0
## 6                 1994.431                      0
##   new_deaths_smoothed_per_million reproduction_rate icu_patients
## 1                               0                NA           NA
## 2                               0                NA           NA
## 3                               0                NA           NA
## 4                               0                NA           NA
## 5                               0                NA           NA
## 6                               0                NA           NA
##   icu_patients_per_million hosp_patients hosp_patients_per_million
## 1                       NA            NA                        NA
## 2                       NA            NA                        NA
## 3                       NA            NA                        NA
## 4                       NA            NA                        NA
## 5                       NA            NA                        NA
## 6                       NA            NA                        NA
##   weekly_icu_admissions weekly_icu_admissions_per_million
## 1                    NA                                NA
## 2                    NA                                NA
## 3                    NA                                NA
## 4                    NA                                NA
## 5                    NA                                NA
## 6                    NA                                NA
##   weekly_hosp_admissions weekly_hosp_admissions_per_million total_tests
## 1                     NA                                 NA          NA
## 2                     NA                                 NA          NA
## 3                     NA                                 NA          NA
## 4                     NA                                 NA          NA
## 5                     NA                                 NA          NA
## 6                     NA                                 NA          NA
##   new_tests total_tests_per_thousand new_tests_per_thousand new_tests_smoothed
## 1        NA                       NA                     NA                 NA
## 2        NA                       NA                     NA                 NA
## 3        NA                       NA                     NA                 NA
## 4        NA                       NA                     NA                 NA
## 5        NA                       NA                     NA                 NA
## 6        NA                       NA                     NA                 NA
##   new_tests_smoothed_per_thousand positive_rate tests_per_case tests_units
## 1                              NA            NA             NA          NA
## 2                              NA            NA             NA          NA
## 3                              NA            NA             NA          NA
## 4                              NA            NA             NA          NA
## 5                              NA            NA             NA          NA
## 6                              NA            NA             NA          NA
##   total_vaccinations people_vaccinated people_fully_vaccinated total_boosters
## 1                 NA                NA                      NA             NA
## 2                 NA                NA                      NA             NA
## 3                 NA                NA                      NA             NA
## 4                 NA                NA                      NA             NA
## 5                 NA                NA                      NA             NA
## 6                 NA                NA                      NA             NA
##   new_vaccinations new_vaccinations_smoothed total_vaccinations_per_hundred
## 1               NA                        NA                             NA
## 2               NA                        NA                             NA
## 3               NA                        NA                             NA
## 4               NA                        NA                             NA
## 5               NA                        NA                             NA
## 6               NA                        NA                             NA
##   people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
## 1                            NA                                  NA
## 2                            NA                                  NA
## 3                            NA                                  NA
## 4                            NA                                  NA
## 5                            NA                                  NA
## 6                            NA                                  NA
##   total_boosters_per_hundred new_vaccinations_smoothed_per_million
## 1                         NA                                    NA
## 2                         NA                                    NA
## 3                         NA                                    NA
## 4                         NA                                    NA
## 5                         NA                                    NA
## 6                         NA                                    NA
##   new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred
## 1                             NA                                         NA
## 2                             NA                                         NA
## 3                             NA                                         NA
## 4                             NA                                         NA
## 5                             NA                                         NA
## 6                             NA                                         NA
##   stringency_index population_density median_age aged_65_older aged_70_older
## 1               NA             54.422       18.6         2.581         1.337
## 2               NA                 NA         NA            NA            NA
## 3               NA            104.871       38.0        13.188         8.643
## 4               NA             17.348       29.1         6.211         3.857
## 5               NA            278.205         NA            NA            NA
## 6               NA            163.755         NA            NA            NA
##   gdp_per_capita extreme_poverty cardiovasc_death_rate diabetes_prevalence
## 1       1803.987              NA               597.029                9.59
## 2             NA              NA                    NA                  NA
## 3      11803.431             1.1               304.195               10.08
## 4      13913.839             0.5               278.364                6.73
## 5             NA              NA               283.750                  NA
## 6             NA              NA               109.135                7.97
##   female_smokers male_smokers handwashing_facilities hospital_beds_per_thousand
## 1             NA           NA                 37.746                       0.50
## 2             NA           NA                     NA                         NA
## 3            7.1         51.2                     NA                       2.89
## 4            0.7         30.4                 83.741                       1.90
## 5             NA           NA                     NA                         NA
## 6           29.0         37.8                     NA                         NA
##   life_expectancy human_development_index population
## 1           64.83                   0.511   41128772
## 2              NA                      NA 1426736614
## 3           78.57                   0.795    2842318
## 4           76.88                   0.748   44903228
## 5           73.74                      NA      44295
## 6           83.73                   0.868      79843
##   excess_mortality_cumulative_absolute excess_mortality_cumulative
## 1                                   NA                          NA
## 2                                   NA                          NA
## 3                                   NA                          NA
## 4                                   NA                          NA
## 5                                   NA                          NA
## 6                                   NA                          NA
##   excess_mortality excess_mortality_cumulative_per_million
## 1               NA                                      NA
## 2               NA                                      NA
## 3               NA                                      NA
## 4               NA                                      NA
## 5               NA                                      NA
## 6               NA                                      NA

str(vaccinations)

## 'data.frame':    247 obs. of  67 variables:
##  $ iso_code                                  : chr  "AFG" "OWID_AFR" "ALB" "DZA" ...
##  $ continent                                 : chr  "Asia" "" "Europe" "Africa" ...
##  $ location                                  : chr  "Afghanistan" "Africa" "Albania" "Algeria" ...
##  $ last_updated_date                         : chr  "8/4/2024" "8/4/2024" "8/4/2024" "8/4/2024" ...
##  $ total_cases                               : int  235214 13145380 335047 272139 8359 48015 107481 3904 9106 10101218 ...
##  $ new_cases                                 : int  0 36 0 18 0 0 0 0 0 54 ...
##  $ new_cases_smoothed                        : num  0 5.14 0 2.57 0 ...
##  $ total_deaths                              : int  7998 259117 3605 6881 34 159 1937 12 146 130663 ...
##  $ new_deaths                                : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ new_deaths_smoothed                       : num  0 0 0 0 0 0 0 0 0 0.143 ...
##  $ total_cases_per_million                   : num  5796 9089 118491 5984 172832 ...
##  $ new_cases_per_million                     : num  0 0.025 0 0.396 0 ...
##  $ new_cases_smoothed_per_million            : num  0 0.004 0 0.057 0 0 0 0 0 0.17 ...
##  $ total_deaths_per_million                  : num  197 179 1275 151 703 ...
##  $ new_deaths_per_million                    : num  0 0 0 0 0 0 0 0 0 0.022 ...
##  $ new_deaths_smoothed_per_million           : num  0 0 0 0 0 0 0 0 0 0.003 ...
##  $ reproduction_rate                         : logi  NA NA NA NA NA NA ...
##  $ icu_patients                              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ icu_patients_per_million                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ hosp_patients                             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ hosp_patients_per_million                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_icu_admissions                     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_icu_admissions_per_million         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_hosp_admissions                    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weekly_hosp_admissions_per_million        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_tests                               : logi  NA NA NA NA NA NA ...
##  $ new_tests                                 : logi  NA NA NA NA NA NA ...
##  $ total_tests_per_thousand                  : logi  NA NA NA NA NA NA ...
##  $ new_tests_per_thousand                    : logi  NA NA NA NA NA NA ...
##  $ new_tests_smoothed                        : logi  NA NA NA NA NA NA ...
##  $ new_tests_smoothed_per_thousand           : logi  NA NA NA NA NA NA ...
##  $ positive_rate                             : logi  NA NA NA NA NA NA ...
##  $ tests_per_case                            : logi  NA NA NA NA NA NA ...
##  $ tests_units                               : logi  NA NA NA NA NA NA ...
##  $ total_vaccinations                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_vaccinated                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_fully_vaccinated                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_boosters                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_vaccinations                          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_vaccinations_smoothed                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_vaccinations_per_hundred            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_vaccinated_per_hundred             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ people_fully_vaccinated_per_hundred       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ total_boosters_per_hundred                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_vaccinations_smoothed_per_million     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_people_vaccinated_smoothed            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ new_people_vaccinated_smoothed_per_hundred: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ stringency_index                          : logi  NA NA NA NA NA NA ...
##  $ population_density                        : num  54.4 NA 104.9 17.3 278.2 ...
##  $ median_age                                : num  18.6 NA 38 29.1 NA NA 16.8 NA 32.1 31.9 ...
##  $ aged_65_older                             : num  2.58 NA 13.19 6.21 NA ...
##  $ aged_70_older                             : num  1.34 NA 8.64 3.86 NA ...
##  $ gdp_per_capita                            : num  1804 NA 11803 13914 NA ...
##  $ extreme_poverty                           : num  NA NA 1.1 0.5 NA NA NA NA NA 0.6 ...
##  $ cardiovasc_death_rate                     : num  597 NA 304 278 284 ...
##  $ diabetes_prevalence                       : num  9.59 NA 10.08 6.73 NA ...
##  $ female_smokers                            : num  NA NA 7.1 0.7 NA 29 NA NA NA 16.2 ...
##  $ male_smokers                              : num  NA NA 51.2 30.4 NA 37.8 NA NA NA 27.7 ...
##  $ handwashing_facilities                    : num  37.7 NA NA 83.7 NA ...
##  $ hospital_beds_per_thousand                : num  0.5 NA 2.89 1.9 NA NA NA NA 3.8 5 ...
##  $ life_expectancy                           : num  64.8 NA 78.6 76.9 73.7 ...
##  $ human_development_index                   : num  0.511 NA 0.795 0.748 NA 0.868 0.581 NA 0.778 0.845 ...
##  $ population                                : num  4.11e+07 1.43e+09 2.84e+06 4.49e+07 4.43e+04 ...
##  $ excess_mortality_cumulative_absolute      : logi  NA NA NA NA NA NA ...
##  $ excess_mortality_cumulative               : logi  NA NA NA NA NA NA ...
##  $ excess_mortality                          : logi  NA NA NA NA NA NA ...
##  $ excess_mortality_cumulative_per_million   : logi  NA NA NA NA NA NA ...
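
The str() output shows many columns read as logical with only NA values, which suggests they are empty in this snapshot. A minimal sketch of how the share of missing values per column could be tabulated is given here. It uses a small made-up data frame (toy) rather than the OWID file, and only base R, so the names and values are assumptions for illustration.

```r
# Toy stand-in for the OWID snapshot (values are made up for illustration)
toy <- data.frame(
  iso_code = c("AAA", "BBB", "CCC", "DDD"),
  gdp_per_capita = c(1800, NA, 11800, 13900),
  stringency_index = c(NA, NA, NA, NA),
  median_age = c(18.6, 38.0, NA, 29.1)
)

# is.na() on a data frame returns a logical matrix, so colMeans() gives
# the proportion of missing values in each column
missing_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)

print(missing_share)
```

Applied to vaccinations instead of toy, the same two lines would show which candidate predictors are usable before any rows are dropped.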

Data Analysis

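To prepare the data for analysis, I first restricted the dataset to country-level observations by removing OWID aggregate regions (such as continents, income groups, and world totals), identified by iso_code values that begin with "OWID". Next, I selected variables relevant to the research question: vaccination coverage, economic development (GDP per capita and the Human Development Index), demographics (median age and population density), and policy stringency. Because stringency_index is missing for every row of this snapshot, it is kept in the selection for reference but excluded from the complete-case filter and from the model. GDP per capita was log-transformed to reduce right skew and better satisfy linear regression assumptions. Finally, rows with missing values in the outcome or in any model predictor were removed. These steps make the data comparable across countries, but they leave only 6 complete cases, so the regression that follows rests on a very small sample.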

covid_countries <- vaccinations |>
  filter(!startsWith(iso_code, "OWID"))


head(covid_countries)

##   iso_code continent       location last_updated_date total_cases new_cases
## 1      AFG      Asia    Afghanistan          8/4/2024      235214         0
## 2      ALB    Europe        Albania          8/4/2024      335047         0
## 3      DZA    Africa        Algeria          8/4/2024      272139        18
## 4      ASM   Oceania American Samoa          8/4/2024        8359         0
## 5      AND    Europe        Andorra          8/4/2024       48015         0
## 6      AGO    Africa         Angola          8/4/2024      107481         0
##   new_cases_smoothed total_deaths new_deaths new_deaths_smoothed
## 1              0.000         7998          0                   0
## 2              0.000         3605          0                   0
## 3              2.571         6881          0                   0
## 4              0.000           34          0                   0
## 5              0.000          159          0                   0
## 6              0.000         1937          0                   0
##   total_cases_per_million new_cases_per_million new_cases_smoothed_per_million
## 1                5796.468                 0.000                          0.000
## 2              118491.020                 0.000                          0.000
## 3                5984.050                 0.396                          0.057
## 4              172831.600                 0.000                          0.000
## 5              602280.440                 0.000                          0.000
## 6                3016.162                 0.000                          0.000
##   total_deaths_per_million new_deaths_per_million
## 1                  197.098                      0
## 2                 1274.926                      0
## 3                  151.306                      0
## 4                  702.988                      0
## 5                 1994.431                      0
## 6                   54.357                      0
##   new_deaths_smoothed_per_million reproduction_rate icu_patients
## 1                               0                NA           NA
## 2                               0                NA           NA
## 3                               0                NA           NA
## 4                               0                NA           NA
## 5                               0                NA           NA
## 6                               0                NA           NA
##   icu_patients_per_million hosp_patients hosp_patients_per_million
## 1                       NA            NA                        NA
## 2                       NA            NA                        NA
## 3                       NA            NA                        NA
## 4                       NA            NA                        NA
## 5                       NA            NA                        NA
## 6                       NA            NA                        NA
##   weekly_icu_admissions weekly_icu_admissions_per_million
## 1                    NA                                NA
## 2                    NA                                NA
## 3                    NA                                NA
## 4                    NA                                NA
## 5                    NA                                NA
## 6                    NA                                NA
##   weekly_hosp_admissions weekly_hosp_admissions_per_million total_tests
## 1                     NA                                 NA          NA
## 2                     NA                                 NA          NA
## 3                     NA                                 NA          NA
## 4                     NA                                 NA          NA
## 5                     NA                                 NA          NA
## 6                     NA                                 NA          NA
##   new_tests total_tests_per_thousand new_tests_per_thousand new_tests_smoothed
## 1        NA                       NA                     NA                 NA
## 2        NA                       NA                     NA                 NA
## 3        NA                       NA                     NA                 NA
## 4        NA                       NA                     NA                 NA
## 5        NA                       NA                     NA                 NA
## 6        NA                       NA                     NA                 NA
##   new_tests_smoothed_per_thousand positive_rate tests_per_case tests_units
## 1                              NA            NA             NA          NA
## 2                              NA            NA             NA          NA
## 3                              NA            NA             NA          NA
## 4                              NA            NA             NA          NA
## 5                              NA            NA             NA          NA
## 6                              NA            NA             NA          NA
##   total_vaccinations people_vaccinated people_fully_vaccinated total_boosters
## 1                 NA                NA                      NA             NA
## 2                 NA                NA                      NA             NA
## 3                 NA                NA                      NA             NA
## 4                 NA                NA                      NA             NA
## 5                 NA                NA                      NA             NA
## 6                 NA                NA                      NA             NA
##   new_vaccinations new_vaccinations_smoothed total_vaccinations_per_hundred
## 1               NA                        NA                             NA
## 2               NA                        NA                             NA
## 3               NA                        NA                             NA
## 4               NA                        NA                             NA
## 5               NA                        NA                             NA
## 6               NA                        NA                             NA
##   people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
## 1                            NA                                  NA
## 2                            NA                                  NA
## 3                            NA                                  NA
## 4                            NA                                  NA
## 5                            NA                                  NA
## 6                            NA                                  NA
##   total_boosters_per_hundred new_vaccinations_smoothed_per_million
## 1                         NA                                    NA
## 2                         NA                                    NA
## 3                         NA                                    NA
## 4                         NA                                    NA
## 5                         NA                                    NA
## 6                         NA                                    NA
##   new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred
## 1                             NA                                         NA
## 2                             NA                                         NA
## 3                             NA                                         NA
## 4                             NA                                         NA
## 5                             NA                                         NA
## 6                             NA                                         NA
##   stringency_index population_density median_age aged_65_older aged_70_older
## 1               NA             54.422       18.6         2.581         1.337
## 2               NA            104.871       38.0        13.188         8.643
## 3               NA             17.348       29.1         6.211         3.857
## 4               NA            278.205         NA            NA            NA
## 5               NA            163.755         NA            NA            NA
## 6               NA             23.890       16.8         2.405         1.362
##   gdp_per_capita extreme_poverty cardiovasc_death_rate diabetes_prevalence
## 1       1803.987              NA               597.029                9.59
## 2      11803.431             1.1               304.195               10.08
## 3      13913.839             0.5               278.364                6.73
## 4             NA              NA               283.750                  NA
## 5             NA              NA               109.135                7.97
## 6       5819.495              NA               276.045                3.94
##   female_smokers male_smokers handwashing_facilities hospital_beds_per_thousand
## 1             NA           NA                 37.746                       0.50
## 2            7.1         51.2                     NA                       2.89
## 3            0.7         30.4                 83.741                       1.90
## 4             NA           NA                     NA                         NA
## 5           29.0         37.8                     NA                         NA
## 6             NA           NA                 26.664                         NA
##   life_expectancy human_development_index population
## 1           64.83                   0.511   41128772
## 2           78.57                   0.795    2842318
## 3           76.88                   0.748   44903228
## 4           73.74                      NA      44295
## 5           83.73                   0.868      79843
## 6           61.15                   0.581   35588996
##   excess_mortality_cumulative_absolute excess_mortality_cumulative
## 1                                   NA                          NA
## 2                                   NA                          NA
## 3                                   NA                          NA
## 4                                   NA                          NA
## 5                                   NA                          NA
## 6                                   NA                          NA
##   excess_mortality excess_mortality_cumulative_per_million
## 1               NA                                      NA
## 2               NA                                      NA
## 3               NA                                      NA
## 4               NA                                      NA
## 5               NA                                      NA
## 6               NA                                      NA
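
Since the filter drops every row whose iso_code starts with "OWID", it can be worth listing what was removed. OWID appears to use the same prefix for a few locations without a standard ISO code (Kosovo as OWID_KOS is one example), so a prefix rule may remove more than aggregates. The sketch below uses made-up toy rows and base R; it is an illustration of the check, not output from the project data.

```r
# Toy rows standing in for the OWID file (illustrative, not read from the CSV)
toy <- data.frame(
  iso_code = c("AFG", "OWID_AFR", "ALB", "OWID_WRL", "OWID_KOS"),
  location = c("Afghanistan", "Africa", "Albania", "World", "Kosovo"),
  stringsAsFactors = FALSE
)

# Same rule as the filter above: codes starting with "OWID" are dropped
is_owid <- startsWith(toy$iso_code, "OWID")
removed <- toy[is_owid, ]
kept <- toy[!is_owid, ]

# Every row should land in exactly one of the two groups
stopifnot(nrow(removed) + nrow(kept) == nrow(toy))

# Inspect the dropped locations by name to spot any that are not aggregates
print(removed$location)
```

Running the same check on vaccinations would show whether any country-level rows were dropped along with the regional totals.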

analysis_data <- covid_countries |>
  select(
    location,
    iso_code,
    people_fully_vaccinated_per_hundred,
    gdp_per_capita,
    human_development_index,
    median_age,
    population_density,
    stringency_index
  )

head(analysis_data)

##         location iso_code people_fully_vaccinated_per_hundred gdp_per_capita
## 1    Afghanistan      AFG                                  NA       1803.987
## 2        Albania      ALB                                  NA      11803.431
## 3        Algeria      DZA                                  NA      13913.839
## 4 American Samoa      ASM                                  NA             NA
## 5        Andorra      AND                                  NA             NA
## 6         Angola      AGO                                  NA       5819.495
##   human_development_index median_age population_density stringency_index
## 1                   0.511       18.6             54.422               NA
## 2                   0.795       38.0            104.871               NA
## 3                   0.748       29.1             17.348               NA
## 4                      NA         NA            278.205               NA
## 5                   0.868         NA            163.755               NA
## 6                   0.581       16.8             23.890               NA

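final_data <- analysis_data |>
  # Log-transform GDP per capita to reduce right skew
  mutate(
    log_gdp_per_capita = log(gdp_per_capita)
  ) |>
  # Keep complete cases for the outcome and the four model predictors;
  # stringency_index is all NA in this snapshot, so it is not listed here
  drop_na(
    people_fully_vaccinated_per_hundred,
    log_gdp_per_capita,
    human_development_index,
    median_age,
    population_density
  )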

dim(final_data)

## [1] 6 9

summary(final_data)

##    location           iso_code         people_fully_vaccinated_per_hundred
##  Length:6           Length:6           Min.   :65.06                      
##  Class :character   Class :character   1st Qu.:66.08                      
##  Mode  :character   Mode  :character   Median :67.79                      
##                                        Mean   :73.06                      
##                                        3rd Qu.:77.98                      
##                                        Max.   :90.85                      
##  gdp_per_capita  human_development_index   median_age    population_density
##  Min.   : 6427   Min.   :0.6450          Min.   :28.20   Min.   :  31.03   
##  1st Qu.:27476   1st Qu.:0.8280          1st Qu.:33.10   1st Qu.:  57.91   
##  Median :29503   Median :0.8870          Median :43.00   Median : 116.72   
##  Mean   :30150   Mean   :0.8463          Mean   :38.73   Mean   :1299.96   
##  3rd Qu.:31835   3rd Qu.:0.8980          3rd Qu.:43.45   3rd Qu.: 372.11   
##  Max.   :56055   Max.   :0.9490          Max.   :44.80   Max.   :7039.71   
##  stringency_index log_gdp_per_capita
##  Mode:logical     Min.   : 8.768    
##  NA's:6           1st Qu.:10.220    
##                   Median :10.292    
##                   Mean   :10.146    
##                   3rd Qu.:10.367    
##                   Max.   :10.934

Statistical Analysis Method: Multiple Linear Regression

Because the response variable people_fully_vaccinated_per_hundred is continuous, multiple linear regression is an appropriate method for examining how several country-level predictors jointly explain variation in full COVID-19 vaccination coverage. This model allows us to estimate the association between vaccination rates and economic development (log GDP per capita), human development (HDI), and population characteristics (median age and population density) while holding the other predictors constant. Government policy stringency (stringency_index) is not included in the model because it is missing for all six countries in the analytic sample.
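Written out, the model fitted in the next step has the following form, where i indexes countries. The symbols β and ε and the shortened variable names are notation introduced here for illustration, not taken from the original analysis.

```latex
\begin{aligned}
\text{FullyVaccinated}_i ={}& \beta_0 + \beta_1 \log(\text{GDPpc}_i) + \beta_2\,\text{HDI}_i \\
&+ \beta_3\,\text{MedianAge}_i + \beta_4\,\text{PopDensity}_i + \varepsilon_i
\end{aligned}
```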

vaccination_model <- lm(
  people_fully_vaccinated_per_hundred ~
    log_gdp_per_capita +
    human_development_index +
    median_age +
    population_density,
  data = final_data
)

summary(vaccination_model)

## 
## Call:
## lm(formula = people_fully_vaccinated_per_hundred ~ log_gdp_per_capita + 
##     human_development_index + median_age + population_density, 
##     data = final_data)
## 
## Residuals:
##        1        2        3        4        5        6 
## -1.21003  0.71462  0.02123 -0.07444  0.48709  0.06153 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)             -1.464e+02  6.839e+01  -2.140    0.278
## log_gdp_per_capita       4.631e+01  1.627e+01   2.846    0.215
## human_development_index -3.165e+02  1.481e+02  -2.138    0.279
## median_age               3.835e-01  7.552e-01   0.508    0.701
## population_density       1.992e-03  4.507e-04   4.420    0.142
## 
## Residual standard error: 1.491 on 1 degrees of freedom
## Multiple R-squared:  0.996,  Adjusted R-squared:   0.98 
## F-statistic: 62.39 on 4 and 1 DF,  p-value: 0.09464

Interpretation

The multiple linear regression model examined how economic development, human development, demographic structure, and population density relate to the percentage of people fully vaccinated against COVID-19 across countries. The model fits the analytic sample very closely, with an R-squared of 0.996 and an adjusted R-squared of 0.98. However, it estimates five coefficients from only six countries, leaving a single residual degree of freedom. A near-perfect fit is therefore expected almost regardless of which predictors are used, and it should not be read as evidence that these predictors explain vaccination coverage in general.
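To see how little an R-squared means with this few observations, one can fit the same size of model to pure noise. The sketch below is a simulation under stated assumptions (standard normal noise, an arbitrary seed and number of replications). It does not use the OWID data, and all object names are hypothetical.

```r
set.seed(101)  # arbitrary seed so the sketch is reproducible

n_obs  <- 6     # same number of countries as in final_data
n_pred <- 4     # same number of predictors as in vaccination_model
n_sims <- 2000  # number of simulated data sets (arbitrary)

# Regress pure noise on pure noise and keep the R-squared each time
r2 <- replicate(n_sims, {
  sim <- as.data.frame(matrix(rnorm(n_obs * (n_pred + 1)), nrow = n_obs))
  names(sim) <- c("y", paste0("x", seq_len(n_pred)))
  summary(lm(y ~ ., data = sim))$r.squared
})

mean_r2 <- mean(r2)
print(round(mean_r2, 2))
```

When the predictors are unrelated to the outcome, the expected R-squared is k / (n - 1), which is 4 / 5 = 0.8 for four predictors and six observations, and the simulated average should land close to that value. High R-squared values are therefore the norm at this sample size, even with no real relationship.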

Holding other variables constant, log GDP per capita has a positive estimated coefficient (46.31), which is consistent with the expectation that countries with higher economic capacity tend to have higher percentages of fully vaccinated individuals: wealthier countries generally have greater access to vaccines, stronger healthcare infrastructure, and more efficient distribution systems. The Human Development Index (HDI) has a negative estimated coefficient (-316.5) in this model. Because HDI is itself built partly from income, this sign more plausibly reflects overlap with GDP per capita and the other predictors (multicollinearity) than a true negative relationship.
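Multicollinearity of this kind is usually quantified with variance inflation factors (VIFs). The sketch below computes them by hand on simulated predictors to show what the number measures; the data frame `toy` and the variables x1, x2, and x3 are hypothetical and are not the report's data.

```r
set.seed(101)  # arbitrary seed for reproducibility

# Hypothetical predictors: x2 is built to be strongly correlated with x1
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# The VIF for predictor j is 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all of the other predictors
vif_by_hand <- sapply(names(toy), function(v) {
  others <- setdiff(names(toy), v)
  r2 <- summary(lm(reformulate(others, response = v), data = toy))$r.squared
  1 / (1 - r2)
})

print(round(vif_by_hand, 1))
```

For the model in this report, car::vif(vaccination_model) should compute the same quantity directly. Values above roughly 5 to 10 are commonly treated as a sign of problematic collinearity, although with six observations any such estimate would be very unstable.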

Median age has a positive coefficient (0.38), which would be consistent with older populations having higher full vaccination rates, as expected if older adults were prioritized in vaccination campaigns. Population density also shows a positive association with full vaccination coverage, possibly due to greater perceived risk of transmission or more centralized healthcare delivery. However, the density values in this sample are highly skewed (maximum 7039.71 against a median of 116.72), so this estimate is probably driven largely by a single very dense country.

The overall model F-test is statistically significant at the 10% level (p ≈ 0.095) but not at the conventional 5% level, and no individual predictor is significant at 5% either. With six observations and five estimated coefficients, only one residual degree of freedom remains, so the standard errors are very imprecise. Strong correlations among the development and demographic variables likely make the individual effects even harder to separate in this sample.
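The effect of having a single residual degree of freedom can be checked directly from the t distribution. The sketch below takes the t value reported for log_gdp_per_capita and compares its two-sided p-value at 1 degree of freedom with the p-value at a hypothetical 30 degrees of freedom. The value 30 is an arbitrary choice for illustration.

```r
t_stat <- 2.846  # t value reported for log_gdp_per_capita above

# Two-sided p-value with 1 residual degree of freedom
# (6 countries minus 5 estimated coefficients)
p_df1 <- 2 * pt(-abs(t_stat), df = 1)

# Same t value with a hypothetical 30 residual degrees of freedom
p_df30 <- 2 * pt(-abs(t_stat), df = 30)

print(round(c(df1 = p_df1, df30 = p_df30), 3))
```

The first value reproduces the 0.215 shown in the regression table, while the second falls below 0.01. A t statistic of the same size would be clearly significant in a larger sample, which is why the lack of significance here says more about the sample size than about the predictors.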

Regression Assumptions and Diagnostics

Linearity

crPlots(vaccination_model)

Independence of Observations

Because each row is a different country, independence of observations is a reasonable assumption for this cross-sectional dataset.

plot(resid(vaccination_model), type = "b",
     main = "Residuals vs Order",
     ylab = "Residuals")
abline(h = 0, lty = 2)

Homoscedasticity, Normality, and Influential Points

par(mfrow = c(2, 2))
plot(vaccination_model)

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

par(mfrow = c(1, 1))

Interpretation

Residuals vs Order

The residuals-versus-order plot shows no clear systematic trend over the observation order, with residuals fluctuating around zero. This suggests that the assumption of independence of observations is reasonably satisfied. Because each observation represents a distinct country, independence is also supported by the cross-sectional structure of the data.

Residuals vs Fitted

The Residuals vs Fitted plot shows residuals scattered around zero, though with visible structure due to the very small analytic sample size. While some curvature appears, this pattern is likely driven by overfitting rather than a strong violation of linearity. Overall, the linear form is acceptable given the exploratory nature of the analysis.

Normal Q–Q Plot

The Normal Q–Q plot shows noticeable deviations from the reference line, particularly in the tails. This suggests departures from normality in the residuals. However, with only six residuals, a Q–Q plot cannot reliably distinguish real non-normality from sampling noise, so the normality assumption can be neither confirmed nor ruled out. The t- and F-based p-values, which rely on this assumption in small samples, are therefore interpreted cautiously.

Scale–Location Plot

The Scale–Location plot indicates uneven spread of residuals across fitted values, suggesting potential heteroscedasticity. Given the limited number of observations, this pattern is difficult to assess reliably and is likely influenced by sample size constraints rather than systematic variance instability.
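The fourth diagnostic panel (Residuals vs Leverage) depends on the hat values of the fitted model. The sketch below uses simulated data with the same shape as the analytic sample to show why leverage is unavoidably high when five coefficients are estimated from six observations. The objects `toy` and `toy_model` are hypothetical and the numbers are random noise, not the report's data.

```r
set.seed(101)  # arbitrary seed for reproducibility

# Hypothetical data with the same shape as the analytic sample:
# 6 observations, 4 predictors plus an intercept
toy <- as.data.frame(matrix(rnorm(6 * 5), nrow = 6))
names(toy) <- c("y", "x1", "x2", "x3", "x4")
toy_model <- lm(y ~ x1 + x2 + x3 + x4, data = toy)

# Leverage (hat value) of each observation; values lie between 0 and 1
h <- hatvalues(toy_model)
print(round(h, 2))

# Hat values always sum to the number of estimated coefficients
print(sum(h))
```

Because the hat values sum to the number of coefficients, the average leverage in a model of this size is 5 / 6, or about 0.83, so nearly every country counts as a high-leverage point. Leverage values this close to 1 are also a likely source of the NaN warnings produced by plot(vaccination_model), and hatvalues(vaccination_model) would show the actual values.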

Conclusion and Future Directions

This project examined country-level predictors of COVID-19 full vaccination rates using multiple linear regression. The analysis explored how economic development, human development, demographic structure, and population density relate to the percentage of people fully vaccinated across countries. The fitted model is consistent with development-related factors being associated with vaccination coverage, and with structural and socioeconomic capacity shaping public health outcomes, but with only six countries in the analytic sample it cannot establish these relationships.

At the same time, the analysis pointed to likely multicollinearity among development indicators (for example, the negative HDI coefficient) and was limited by a very small effective sample size caused by missing data across countries. As a result, individual coefficient estimates and statistical significance should be interpreted cautiously. The exceptionally high R-squared likely reflects overfitting rather than true explanatory power, underscoring the limitations of cross-sectional country-level data when multiple correlated predictors are included.

Future research could strengthen this analysis by using longitudinal (panel) data to increase the number of observations, allowing for more stable estimates and better assessment of temporal dynamics. Additional improvements could include reducing predictor redundancy, incorporating regional fixed effects, or adding healthcare system capacity measures. Despite its limitations, this analysis provides

References

Our World in Data. (n.d.). Coronavirus (COVID-19) vaccinations. https://ourworldindata.org/covid-vaccinations

R Core Team. (n.d.). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

Fox, J., & Weisberg, S. (n.d.). car package documentation.

Zebidian

2025-12-12