COVID-19 Vaccinations and Deaths

Introduction

      In December 2019, a novel coronavirus, which causes the disease subsequently named COVID-19, was identified in an outbreak in Wuhan, China. The World Health Organization (WHO) declared the outbreak a pandemic in March 2020. In December 2020, the FDA granted emergency use authorization to two mRNA COVID-19 vaccines, Pfizer-BioNTech and Moderna, and in 2021 it granted emergency use authorization to the Janssen/Johnson & Johnson COVID-19 vaccine. As of October 7, 2022, the pandemic had caused more than 620 million cases and 6.55 million confirmed deaths, making it one of the deadliest in history.

We obtained the “Rates of COVID-19 Cases and Deaths by Vaccination Status” data set, collected from March 2021 to July 2022, from the Centers for Disease Control and Prevention (CDC) website to study the effectiveness of COVID-19 vaccinations in preventing COVID-19 deaths within different age groups. After screening, the data set consists of 1,295 observations and 16 variables; the dependent variable is the COVID-19 outcome. The independent variables we focus on are age group, vaccine product, and fully vaccinated population. Please refer to the Dataset section below for a description of each variable. Our analysis consists of data wrangling, exploratory data analysis, a chi-squared test, and a logistic regression model.

https://www.cdc.gov/museum/timeline/covid19.html https://www.yalemedicine.org/news/covid-timeline

Research Question

      Are COVID-19 vaccinations effective in preventing COVID-19 deaths within age groups?

Hypothesis

      - Null Hypothesis: COVID-19 vaccinations are not effective in preventing COVID-19 deaths within age groups.
      - Alternative Hypothesis: COVID-19 vaccinations are effective in preventing COVID-19 deaths within age groups.

Dataset

      - Outcome: COVID-19 Case or Death [case; death]
      - Month: Calendar month and year corresponding to MMWR week value 
      - MMWR_week: MMWR epidemiological year and week [YYYYWW format; e.g. 202101]
      - Age_group: Age Group [5-11 years; 12-17 years; 18-29 years; 30-49 years; 50-64 years; 65-79 years; 
        80+ years; all_ages_adj] 
      - Vaccine_product: FDA-authorized COVID-19 vaccine product name [Janssen; Moderna; Pfizer; all_types] 
      - Vaccinated_with_outcome: Weekly count of individuals vaccinated with at least a primary series with the corresponding outcome
      - Fully_vaccinated_population: Cumulative weekly count of the population vaccinated with at least a primary series
      - Unvaccinated_with_outcome: Weekly count of unvaccinated individuals with the corresponding outcome  
      - Unvaccinated_population: Cumulative weekly estimated count of the unvaccinated population  
      - Crude_vax_IR: Unadjusted incidence rate of the corresponding outcome among the population 
        vaccinated with at least a primary series (per 100,000 population)  
      - Crude_unvax_IR: Unadjusted incidence rate of the corresponding outcome among the unvaccinated 
        population (per 100,000 population) 
      - Crude_IRR: Unadjusted incidence rate ratio (unvaccinated: vaccinated with at least a primary series)
      - Age_adjusted_vax_IR: Age-standardized incidence rate of the corresponding outcome among the population vaccinated with at least a primary series (per 100,000 population)
      - Age_adjusted_unvax_IR: Age-standardized incidence rate of the corresponding outcome among the unvaccinated population (per 100,000 population)
      - Age_adjusted_IRR: Age-standardized incidence rate ratio (unvaccinated rate: vaccinated with at least a primary series rate)
      - Continuity_correction: Flag for whether continuity correction was applied for one or more jurisdictions in the strata [1=Yes, 0=No].
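
The crude rate variables listed above can be illustrated with a short worked example. This is a minimal sketch that assumes a crude incidence rate is the outcome count divided by the population, scaled to 100,000, and that the rate ratio is the unvaccinated rate divided by the vaccinated rate, as the variable descriptions suggest. The counts and the helper name rate_per_100k are hypothetical and are not taken from the CDC file.

```r
# Hypothetical weekly counts for one stratum (illustrative values only)
vaccinated_with_outcome     <- 120
fully_vaccinated_population <- 2500000
unvaccinated_with_outcome   <- 300
unvaccinated_population     <- 1000000

# Incidence rate per 100,000 population (assumed definition)
rate_per_100k <- function(events, population) events / population * 1e5

crude_vax_ir   <- rate_per_100k(vaccinated_with_outcome, fully_vaccinated_population)
crude_unvax_ir <- rate_per_100k(unvaccinated_with_outcome, unvaccinated_population)

# Incidence rate ratio: unvaccinated rate divided by vaccinated rate
crude_irr <- crude_unvax_ir / crude_vax_ir

c(vax = crude_vax_ir, unvax = crude_unvax_ir, IRR = crude_irr)
```

An IRR above 1 means the outcome occurred at a higher rate among the unvaccinated than among the vaccinated in that stratum.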

Data Screening: Data Cleanup

We begin by screening the dataset to make sure that our data is ready for further analysis. First, we convert all categorical variables to factors, with labels, so that they can be used in our analysis. Next, the summary of the data shows no errors but some missing values. We handle them by removing the rows with more than 20% of their values missing and imputing the remaining missing values with the mice package (CART method). We then handle outliers with the three-step method of leverage, Cook's distance, and Mahalanobis distance. Rows flagged by at least two of the three measures are removed, and a new subset is created for further analysis.
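
The row-wise missingness rule described above can be sketched on a toy data frame. The example assumes 16 columns, like the real data set, so that three missing values give 18.75% and four give 25%; the names toy and percent_missing are hypothetical.

```r
# Toy data frame with 3 rows and 16 columns (hypothetical values)
toy <- as.data.frame(matrix(1, nrow = 3, ncol = 16))
toy[2, 1:3] <- NA   # 3 of 16 values missing -> 18.75%
toy[3, 1:4] <- NA   # 4 of 16 values missing -> 25%

# Percentage of missing values in a vector
percent_missing <- function(x) sum(is.na(x)) / length(x) * 100

# Apply the function to every row
row_missing <- apply(toy, 1, percent_missing)
row_missing

# Keep only the rows at or below the 20% threshold
kept <- toy[row_missing <= 20, ]
nrow(kept)
```

Under this rule, a row with three missing values is kept and later imputed, while a row with four missing values is dropped.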

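setwd("/Users/billy/OneDrive/Documents/ANLY 502/")
# Packages used below: mice (imputation), corrplot (correlation plot),
# moments (assumed source of skewness/kurtosis), ggplot2 (plots), dplyr (filter and %>%)
library(mice)
library(corrplot)
library(moments)
library(ggplot2)
library(dplyr)
data <- read.csv("Rates_of_COVID-19_Cases_or_Deaths_by_Age_Group_and_Vaccination_Status.csv")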
## Converting character variables to factors
# Note: factor() assigns labels in the sorted order of the unique raw values,
# so each labels vector must follow that order.

data$outcome <- factor(data$outcome,
                       labels = c("case", "death"))
data$month <- factor(data$month,
                     labels = c("Apr-21", "May-21", "Jun-21", "Jul-21", "Aug-21", "Sep-21", "Oct-21", "Nov-21", "Dec-21", "Jan-22", "Feb-22", "Mar-22", "Apr-22", "May-22", "Jun-22", "Jul-22"))
data$Age_group <- factor(data$Age_group,
                         labels = c("5-11", "12-17", "18-29", "30-49", "50-64", "65-79", "80+", "all_ages_adj"))
data$Vaccine_product <- factor(data$Vaccine_product,
                               labels = c("all_types", "Janssen", "Moderna", "Pfizer"))

apply(data[ , c("outcome", "month", "Age_group", "Vaccine_product")], 2, table)
## $outcome
## 
##  case death 
##   713   680 
## 
## $month
## 
## Apr-21 Apr-22 Aug-21 Dec-21 Feb-22 Jan-22 Jul-21 Jul-22 Jun-21 Jun-22 Mar-22 
##     80     55    100    100     88     88     80     88    108    110     88 
## May-21 May-22 Nov-21 Oct-21 Sep-21 
##     80     88     80     80     80 
## 
## $Age_group
## 
##        12-17        18-29        30-49         5-11        50-64        65-79 
##          133          133          133           63          133          133 
##          80+ all_ages_adj 
##          133          532 
## 
## $Vaccine_product
## 
## all_types   Janssen   Moderna    Pfizer 
##       994       133       133       133
# Categorical variables have been converted to factors for further analysis. 

## Checking errors
#summary(data)
# When running summary, we can see that there are no negative numbers or any errors that need to be handled within our data set so we will proceed to check for missing data.

## Checking missing data
notypos <- data
apply(notypos, 2, function(x) { sum(is.na(x))})
##                     outcome                       month 
##                           0                           0 
##                   MMWR_week                   Age_group 
##                           0                           0 
##             Vaccine_product     Vaccinated_with_outcome 
##                           0                           0 
## Fully_vaccinated_population   Unvaccinated_with_outcome 
##                           0                           0 
##     Unvaccinated_population                Crude_vax_IR 
##                           0                           0 
##              Crude_unvax_IR                   Crude_IRR 
##                           0                          98 
##         Age_adjusted_vax_IR       Age_adjusted_unvax_IR 
##                         861                         861 
##            Age_adjusted_IRR       Continuity_correction 
##                         861                           0
percentmiss <- function(x){ sum(is.na(x))/length(x) * 100}
missing <- apply(notypos, 1, percentmiss)
table(missing)
## missing
##     0 18.75    25 
##   532   763    98
# As we can see above, some data are missing. We drop the rows with more than 20% missing values and impute the remaining missing values below.
replace_rows <- subset(notypos, missing <= 20)
noreplace_row <- subset(notypos, missing > 20)

nrow(notypos)
## [1] 1393
nrow(replace_rows)
## [1] 1295
nrow(noreplace_row)
## [1] 98
apply(replace_rows, 2, percentmiss)
##                     outcome                       month 
##                     0.00000                     0.00000 
##                   MMWR_week                   Age_group 
##                     0.00000                     0.00000 
##             Vaccine_product     Vaccinated_with_outcome 
##                     0.00000                     0.00000 
## Fully_vaccinated_population   Unvaccinated_with_outcome 
##                     0.00000                     0.00000 
##     Unvaccinated_population                Crude_vax_IR 
##                     0.00000                     0.00000 
##              Crude_unvax_IR                   Crude_IRR 
##                     0.00000                     0.00000 
##         Age_adjusted_vax_IR       Age_adjusted_unvax_IR 
##                    58.91892                    58.91892 
##            Age_adjusted_IRR       Continuity_correction 
##                    58.91892                     0.00000
replace_columns <- replace_rows[ , -c(1, 2, 4, 5)]
noreplace_columns <- replace_rows[ , c(1, 2, 4, 5)]

#install.packages("mice", repos = "https://cran.us.r-project.org/")
temp_no_miss <- mice(replace_columns, method='cart')
## 
##  iter imp variable
##   1   1  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   1   2  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   1   3  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   1   4  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   1   5  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   2   1  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   2   2  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   2   3  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   2   4  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   2   5  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   3   1  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   3   2  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   3   3  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   3   4  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   3   5  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   4   1  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   4   2  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   4   3  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   4   4  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   4   5  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   5   1  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   5   2  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   5   3  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   5   4  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
##   5   5  Age_adjusted_vax_IR  Age_adjusted_unvax_IR  Age_adjusted_IRR
## Warning: Number of logged events: 75
nomiss <- complete(temp_no_miss, 1)
dim(notypos)
## [1] 1393   16
dim(nomiss)
## [1] 1295   12
all_colunms <- cbind(noreplace_columns, nomiss)
dim(all_colunms)
## [1] 1295   16
summary(all_colunms)
##   outcome        month            Age_group    Vaccine_product   MMWR_week     
##  case :713   Jun-21 :100   all_ages_adj:532   all_types:896    Min.   :202114  
##  death:582   Jun-22 : 98   50-64       :133   Janssen  :133    1st Qu.:202131  
##              Dec-21 : 97   65-79       :133   Moderna  :133    Median :202148  
##              Aug-21 : 90   80+         :133   Pfizer   :133    Mean   :202168  
##              Mar-22 : 86   30-49       :132                    3rd Qu.:202212  
##              Feb-22 : 83   18-29       :110                    Max.   :202229  
##              (Other):741   (Other)     :122                                    
##  Vaccinated_with_outcome Fully_vaccinated_population Unvaccinated_with_outcome
##  Min.   :      1.0       Min.   :    38107           Min.   :      0.0        
##  1st Qu.:    156.5       1st Qu.: 10175936           1st Qu.:    786.5        
##  Median :   1887.0       Median : 25046957           Median :   6783.0        
##  Mean   :  36746.2       Mean   : 36814796           Mean   :  76799.9        
##  3rd Qu.:  23606.5       3rd Qu.: 45899113           3rd Qu.:  58787.5        
##  Max.   :1982037.0       Max.   :150041139           Max.   :1880066.0        
##                                                                               
##  Unvaccinated_population  Crude_vax_IR       Crude_unvax_IR    
##  Min.   :   987580       Min.   :   0.0038   Min.   :   0.000  
##  1st Qu.:  6644913       1st Qu.:   0.6134   1st Qu.:   5.243  
##  Median : 16536440       Median :  11.0721   Median :  73.630  
##  Mean   : 32570854       Mean   :  92.3167   Mean   : 271.348  
##  3rd Qu.: 55867478       3rd Qu.: 100.4499   3rd Qu.: 335.083  
##  Max.   :122905181       Max.   :2057.0984   Max.   :3887.401  
##                                                                
##    Crude_IRR       Age_adjusted_vax_IR Age_adjusted_unvax_IR Age_adjusted_IRR
##  Min.   :  0.000   Min.   :   0.0421   Min.   :   0.684      Min.   : 1.108  
##  1st Qu.:  2.549   1st Qu.:   0.6904   1st Qu.:  14.160      1st Qu.: 3.957  
##  Median :  4.931   Median :  19.3546   Median :  93.633      Median : 6.636  
##  Mean   :  7.741   Mean   :  84.7374   Mean   : 268.010      Mean   : 9.329  
##  3rd Qu.:  8.509   3rd Qu.:  92.2640   3rd Qu.: 379.078      3rd Qu.:13.382  
##  Max.   :120.369   Max.   :1590.8089   Max.   :3330.522      Max.   :34.793  
##                                                                              
##  Continuity_correction
##  Min.   :0.0000       
##  1st Qu.:0.0000       
##  Median :1.0000       
##  Mean   :0.6795       
##  3rd Qu.:1.0000       
##  Max.   :1.0000       
## 
## Outliers 
#leverage
model1 <- lm(Vaccinated_with_outcome ~ MMWR_week + Fully_vaccinated_population + Unvaccinated_with_outcome + Unvaccinated_population + Crude_vax_IR + Crude_unvax_IR + Crude_IRR + Age_adjusted_vax_IR + Age_adjusted_unvax_IR + Age_adjusted_IRR + Continuity_correction, data = all_colunms)
k <- 16 # Number of IVs as set in this analysis (all 16 columns); note that model1 itself has 11 predictors
leverage <- hatvalues(model1)
cutleverage <- (2*k+2) /nrow(all_colunms)
badleverage <- as.numeric(leverage > cutleverage)
table(badleverage)
## badleverage
##    0    1 
## 1218   77
#Cooks
cooks <- cooks.distance(model1)
cutcooks <- 4 / (nrow(all_colunms) - k - 1)
badcooks <- as.numeric(cooks > cutcooks)
table(badcooks)
## badcooks
##    0    1 
## 1235   60
#Mahal
mahal <- mahalanobis(all_colunms[ , -c(1, 2, 3, 4)],
                     colMeans(all_colunms[ , -c(1, 2, 3, 4)]),
                     cov(all_colunms[ , -c(1, 2, 3, 4)]), 
                     tol=1e-20)
cutmahal <- qchisq(1-.001, ncol(all_colunms[ , -c(1, 2, 3, 4)]))
badmahal <- as.numeric(mahal > cutmahal)
table(badmahal)
## badmahal
##    0    1 
## 1216   79
# Overall
totalout <- badmahal + badleverage + badcooks
table(totalout)
## totalout
##    0    1    2    3 
## 1211    5   26   53
noout <- subset(all_colunms, totalout < 2)

### Data is now clean. NA's and outliers have been dealt with. The clean dataset is now named noout.
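
The three outlier cutoffs used above can be collected in one small helper to make the arithmetic explicit. This is a sketch only: the function name outlier_cutoffs and the toy flags data frame are hypothetical, while the formulas mirror the ones in the screening code.

```r
# Cutoffs for the three screening rules, given
#   n = number of rows, k = number of predictors, p = number of numeric columns
outlier_cutoffs <- function(n, k, p, alpha = 0.001) {
  c(
    leverage    = (2 * k + 2) / n,           # hat values above this are flagged
    cooks       = 4 / (n - k - 1),           # Cook's distances above this are flagged
    mahalanobis = qchisq(1 - alpha, df = p)  # squared distances above this are flagged
  )
}

# Values matching the analysis above: 1,295 rows, k = 16, 12 numeric columns
cutoffs <- outlier_cutoffs(n = 1295, k = 16, p = 12)
round(cutoffs, 4)

# A row is dropped only when at least two of the three rules flag it
flags <- data.frame(lev = c(0, 1, 1), cook = c(0, 0, 1), mahal = c(0, 0, 1))
keep  <- rowSums(flags) < 2
keep
```

Requiring agreement between at least two measures is a conservative choice: a row that is extreme on only one measure stays in the data.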

Data Screening: Assumption Checks

The next phase of our analysis is checking assumptions. We begin with additivity, examining the correlations between the numeric variables in our dataset. Based on the correlation matrix and plot produced by the code below, no pair of variables is perfectly correlated (the strongest correlation, about 0.91, is between Age_adjusted_vax_IR and Age_adjusted_unvax_IR), so we consider the assumption of additivity met. Next is linearity. We have not met this assumption, since the points in the Q-Q plot do not line up along the reference line between -2 and 2. Our next check is normality, where we assess whether the standardized residuals are normally distributed. Based on the histogram below, we consider the assumption of normality met, with a slight right skew. Finally, we look at homogeneity and homoscedasticity. We consider both assumptions met, since the points in the scatter plot below are spread evenly from top to bottom and from left to right.

### Checking Assumptions

## Additivity
cor(noout[ , -c(1, 2, 3, 4)])
##                               MMWR_week Vaccinated_with_outcome
## MMWR_week                    1.00000000               0.2075690
## Vaccinated_with_outcome      0.20756904               1.0000000
## Fully_vaccinated_population  0.14033860               0.4526271
## Unvaccinated_with_outcome   -0.10838637               0.6051674
## Unvaccinated_population     -0.23748871               0.1193802
## Crude_vax_IR                 0.24885161               0.6753920
## Crude_unvax_IR               0.02296960               0.5672159
## Crude_IRR                   -0.46982210              -0.2479043
## Age_adjusted_vax_IR          0.15902480               0.5684938
## Age_adjusted_unvax_IR        0.00317378               0.4635185
## Age_adjusted_IRR            -0.47967604              -0.3566683
## Continuity_correction        0.37405311               0.1836956
##                             Fully_vaccinated_population
## MMWR_week                                    0.14033860
## Vaccinated_with_outcome                      0.45262707
## Fully_vaccinated_population                  1.00000000
## Unvaccinated_with_outcome                    0.23490007
## Unvaccinated_population                      0.43737388
## Crude_vax_IR                                 0.03022509
## Crude_unvax_IR                              -0.01235813
## Crude_IRR                                   -0.15828346
## Age_adjusted_vax_IR                          0.09541926
## Age_adjusted_unvax_IR                        0.03160017
## Age_adjusted_IRR                            -0.08098918
## Continuity_correction                        0.39028275
##                             Unvaccinated_with_outcome Unvaccinated_population
## MMWR_week                                  -0.1083864            -0.237488708
## Vaccinated_with_outcome                     0.6051674             0.119380211
## Fully_vaccinated_population                 0.2349001             0.437373883
## Unvaccinated_with_outcome                   1.0000000             0.384408500
## Unvaccinated_population                     0.3844085             1.000000000
## Crude_vax_IR                                0.5061012            -0.099881906
## Crude_unvax_IR                              0.5927061            -0.110923660
## Crude_IRR                                  -0.1730923            -0.165619288
## Age_adjusted_vax_IR                         0.5487665            -0.013231386
## Age_adjusted_unvax_IR                       0.6303294            -0.031781499
## Age_adjusted_IRR                           -0.3028509             0.003218947
## Continuity_correction                       0.1753270             0.409079348
##                             Crude_vax_IR Crude_unvax_IR   Crude_IRR
## MMWR_week                     0.24885161     0.02296960 -0.46982210
## Vaccinated_with_outcome       0.67539197     0.56721594 -0.24790428
## Fully_vaccinated_population   0.03022509    -0.01235813 -0.15828346
## Unvaccinated_with_outcome     0.50610124     0.59270605 -0.17309232
## Unvaccinated_population      -0.09988191    -0.11092366 -0.16561929
## Crude_vax_IR                  1.00000000     0.84964878 -0.32795439
## Crude_unvax_IR                0.84964878     1.00000000 -0.24259127
## Crude_IRR                    -0.32795439    -0.24259127  1.00000000
## Age_adjusted_vax_IR           0.45970780     0.38487619 -0.16484736
## Age_adjusted_unvax_IR         0.42729410     0.46785522 -0.09448754
## Age_adjusted_IRR             -0.45010910    -0.36401336  0.61616154
## Continuity_correction         0.05052707     0.02905750 -0.33500489
##                             Age_adjusted_vax_IR Age_adjusted_unvax_IR
## MMWR_week                           0.159024796            0.00317378
## Vaccinated_with_outcome             0.568493847            0.46351850
## Fully_vaccinated_population         0.095419256            0.03160017
## Unvaccinated_with_outcome           0.548766489            0.63032940
## Unvaccinated_population            -0.013231386           -0.03178150
## Crude_vax_IR                        0.459707805            0.42729410
## Crude_unvax_IR                      0.384876185            0.46785522
## Crude_IRR                          -0.164847364           -0.09448754
## Age_adjusted_vax_IR                 1.000000000            0.90519206
## Age_adjusted_unvax_IR               0.905192061            1.00000000
## Age_adjusted_IRR                   -0.363266948           -0.31261364
## Continuity_correction               0.002687837           -0.04658412
##                             Age_adjusted_IRR Continuity_correction
## MMWR_week                       -0.479676038           0.374053111
## Vaccinated_with_outcome         -0.356668283           0.183695624
## Fully_vaccinated_population     -0.080989177           0.390282749
## Unvaccinated_with_outcome       -0.302850892           0.175326964
## Unvaccinated_population          0.003218947           0.409079348
## Crude_vax_IR                    -0.450109102           0.050527071
## Crude_unvax_IR                  -0.364013355           0.029057499
## Crude_IRR                        0.616161543          -0.335004893
## Age_adjusted_vax_IR             -0.363266948           0.002687837
## Age_adjusted_unvax_IR           -0.312613645          -0.046584121
## Age_adjusted_IRR                 1.000000000          -0.153616389
## Continuity_correction           -0.153616389           1.000000000
corrplot(cor(noout[ , -c(1, 2, 3, 4)]))

## Linearity
random <- rchisq(nrow(noout), 7)
fake <- lm(random ~ .,
           data = noout)
standardized <- rstudent(fake)
fitvaules <- scale(fake$fitted.values)
{qqnorm(standardized)
  abline(0,1)}

plot(fake, 2)

## Normality
skewness(noout[ , -c(1, 2, 3, 4)])
##                   MMWR_week     Vaccinated_with_outcome 
##                   0.2236250                   4.4973390 
## Fully_vaccinated_population   Unvaccinated_with_outcome 
##                   1.6225035                   3.0018507 
##     Unvaccinated_population                Crude_vax_IR 
##                   0.6741116                   3.8043972 
##              Crude_unvax_IR                   Crude_IRR 
##                   2.6620519                   2.2766928 
##         Age_adjusted_vax_IR       Age_adjusted_unvax_IR 
##                   3.1014201                   2.0529742 
##            Age_adjusted_IRR       Continuity_correction 
##                   0.8398245                  -0.8131734
kurtosis(noout[ , -c(1, 2, 3, 4)]) - 3
##                   MMWR_week     Vaccinated_with_outcome 
##                  -1.7330934                  27.6200593 
## Fully_vaccinated_population   Unvaccinated_with_outcome 
##                   2.0091120                  10.5656546 
##     Unvaccinated_population                Crude_vax_IR 
##                  -0.6321728                  23.0086850 
##              Crude_unvax_IR                   Crude_IRR 
##                  11.7945146                   5.8732446 
##         Age_adjusted_vax_IR       Age_adjusted_unvax_IR 
##                  15.7396173                   6.8746818 
##            Age_adjusted_IRR       Continuity_correction 
##                  -0.2709831                  -1.3387491
hist(standardized, breaks=15)

length(standardized)
## [1] 1216
## Homogeneity/Homoscedasticity
{plot(fitvaules, standardized)
  abline(0,0)
  abline(v = 0)}

  1. The assumption of additivity has been met.
  2. The assumption of linearity has not been met, as the points in the Q-Q plot do not line up between -2 and 2.
  3. The assumption of normality has been met: the residuals are approximately normally distributed with a slight right skew.
  4. The assumptions of homogeneity and homoscedasticity have been met.

Exploratory analysis

ggplot(noout, aes(x=outcome)) + 
   geom_bar(fill="steelblue") + 
  theme_minimal()+
 labs(x = "Outcome", 
      y = "Count",
      title = "Outcome of vaccine")

The cleaned data contain 653 case records and 563 death records.

ggplot(noout, aes(x=Age_group)) + 
   geom_bar(fill="steelblue") + 
  theme_minimal()+
 labs(x = "Age group", 
      y = "Count",
      title = "Age distribution")

The age groups from 30-49 upward have similar counts, the younger groups have fewer records, and ‘all_ages_adj’ has a count far out of proportion to the rest.

ggplot(noout, aes(x=Vaccine_product)) + 
   geom_bar(fill="steelblue") + 
  theme_minimal()+
 labs(x = "Vaccine product", 
      y = "Count",
      title = "Types of Vaccine product")

The three individual vaccine products have similar counts across the dataset; ‘all_types’, which combines all vaccine products including these three, has many more records, as expected.

noout2 <- noout %>% filter(Age_group !='all_ages_adj' ) 
ggplot(noout2, aes(Age_group, fill = outcome)) + 
  geom_bar( position = "dodge")+
  theme_minimal()+
  labs(x = "Age group", 
      y = "Count",
      title = "Plot of outcome among the  age group") 

In the distribution of outcome by age group, death records are markedly fewer than case records in the groups under 30; from the 30-49 group upward, the two counts are approximately equal.

noout3 <- noout %>% filter(Vaccine_product != 'all_types' ) 
ggplot(noout3, aes(y=Fully_vaccinated_population , x=MMWR_week, color=Vaccine_product)) + 
  geom_point(size = .9)+
  theme_minimal()+
    scale_y_continuous( labels = scales::comma)+
  labs(x = "year and week (YYYYWW)", 
      y = "Fully vaccinated population",
      title = "Plot of vaccinated population ") 

From the plot we can see that Pfizer was the most common vaccine received by the population.

ggplot(noout2, aes(y=Unvaccinated_population , x=MMWR_week, color=Age_group)) + 
  geom_point()+
  theme_minimal()+
    scale_y_continuous( labels = scales::comma)+
  labs(x = "year and week (YYYYWW)", 
      y = "Unvaccinated population",
      title = "Plot of Unvaccinated population ") 

The plot above shows the unvaccinated population by week, split by age group.

ggplot(noout2, aes(y=Fully_vaccinated_population , x=MMWR_week, color=Age_group)) + 
  geom_point()+
  theme_minimal()+
    scale_y_continuous( labels = scales::comma)+
  labs(x = "year and week (YYYYWW)", 
      y = "Fully vaccinated population",
      title = "Plot of Fully vaccinated population ") 

The plot above shows the fully vaccinated population by week, split by age group.


Chi-squared test

We ran a chi-squared test of independence between the age group and outcome variables.

H0: Age and outcome are independent; there is no relationship between the two categorical variables. Knowing the value of one variable does not help to predict the value of the other.

H1: Age and outcome are dependent; there is a relationship between the two categorical variables. Knowing the value of one variable helps to predict the value of the other.

table(noout$Age_group,noout$outcome)
##               
##                case death
##   5-11           29     6
##   12-17          60    13
##   18-29          61    36
##   30-49          62    60
##   50-64          64    63
##   65-79          59    62
##   80+            64    64
##   all_ages_adj  254   259
chisq.test(noout$outcome, noout$Age_group)
## 
##  Pearson's Chi-squared test
## 
## data:  noout$outcome and noout$Age_group
## X-squared = 45.57, df = 7, p-value = 1.06e-07
# Doing the test by removing 'all_ages_adj'
noout2 <- noout %>% filter(Age_group !='all_ages_adj' ) 
table(noout2$Age_group,noout2$outcome)
##               
##                case death
##   5-11           29     6
##   12-17          60    13
##   18-29          61    36
##   30-49          62    60
##   50-64          64    63
##   65-79          59    62
##   80+            64    64
##   all_ages_adj    0     0
chisq.test(noout2$Age_group,noout2$outcome )
## 
##  Pearson's Chi-squared test
## 
## data:  noout2$Age_group and noout2$outcome
## X-squared = 39.822, df = 6, p-value = 4.937e-07

We reject the null hypothesis since p < 0.01 in both tests: age and outcome are dependent, so there is a relationship between the two categorical variables.
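
To show where the test statistic comes from, the second test can be recomputed by hand from the contingency table reported above. This is a sketch: the counts are copied from that table (the empty all_ages_adj row is omitted), and the variable names are chosen here for illustration.

```r
# Case/death record counts by age group, copied from the table above
counts <- matrix(
  c(29,  6,
    60, 13,
    61, 36,
    62, 60,
    64, 63,
    59, 62,
    64, 64),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Age_group = c("5-11", "12-17", "18-29", "30-49", "50-64", "65-79", "80+"),
    outcome   = c("case", "death")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic, its degrees of freedom, and the upper-tail p-value
x_squared <- sum((counts - expected)^2 / expected)
df        <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value   <- pchisq(x_squared, df, lower.tail = FALSE)

c(X_squared = x_squared, df = df, p_value = p_value)
```

The largest contributions to the statistic come from the 5-11 and 12-17 rows, where death records are far fewer than independence would predict.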


Logistic Regression

glm(formula = outcome ~ Age_group + Vaccine_product + Fully_vaccinated_population +
    Unvaccinated_population, family = "binomial", data = all_colunms)
## 
## Call:  glm(formula = outcome ~ Age_group + Vaccine_product + Fully_vaccinated_population + 
##     Unvaccinated_population, family = "binomial", data = all_colunms)
## 
## Coefficients:
##                 (Intercept)               Age_group12-17  
##                  -2.711e-01                    7.418e-06  
##              Age_group18-29               Age_group30-49  
##                   2.416e+00                    4.283e+00  
##              Age_group50-64               Age_group65-79  
##                   3.229e+00                    2.247e+00  
##                Age_group80+        Age_groupall_ages_adj  
##                   8.513e-01                    1.447e+01  
##      Vaccine_productJanssen       Vaccine_productModerna  
##                  -8.677e+00                   -5.921e+00  
##       Vaccine_productPfizer  Fully_vaccinated_population  
##                  -4.116e+00                   -7.603e-08  
##     Unvaccinated_population  
##                  -7.624e-08  
## 
## Degrees of Freedom: 1294 Total (i.e. Null);  1282 Residual
## Null Deviance:       1782 
## Residual Deviance: 1657  AIC: 1683

We fit the logistic regression with the generalized linear model function, glm(), and obtained the results shown above. At a significance level of 0.05, three variables can be considered significant predictors, based on the Pr(>|z|) values from the model summary:

      - Vaccine product: all three products have Pr(>|z|) values smaller than 0.05.
      - Fully_vaccinated_population: its Pr(>|z|) value is also smaller than 0.05.
      - Age group 12-17: this level also has a Pr(>|z|) value smaller than 0.05.

We therefore conclude that being fully vaccinated with one of the three authorized vaccine products (Pfizer, Moderna, and Janssen) is effective at preventing COVID-19 deaths within certain age groups (in our case, the 12-17 age group).
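
Printing a glm object shows only the coefficient estimates; the Pr(>|z|) values come from summary(). The sketch below shows how to extract them and how to turn log-odds coefficients into odds ratios. It uses R's built-in mtcars data purely as a stand-in, not the COVID-19 data set, and the names fit, coef_table, significant, and odds_ratios are chosen here for illustration.

```r
# Illustration on the built-in mtcars data (not the COVID-19 data set):
# model the probability of a manual transmission (am = 1) from car weight
fit <- glm(am ~ wt, family = binomial, data = mtcars)

# summary() adds standard errors, z values and Pr(>|z|) to the estimates
coef_table <- summary(fit)$coefficients
coef_table

# Terms whose p-value falls below the 0.05 significance level
significant <- rownames(coef_table)[coef_table[, "Pr(>|z|)"] < 0.05]
significant

# Exponentiating a log-odds coefficient gives an odds ratio
odds_ratios <- exp(coef(fit))
odds_ratios
```

An odds ratio below 1 means the odds of the modeled outcome fall as the predictor increases; in the model above, the negative vaccine product coefficients would be read the same way relative to the all_types reference level.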

Conclusion

We used R to interpret and visualize U.S. COVID-19 data in order to better understand how COVID-19 outcomes relate to age, deaths, and population. The pandemic took different shapes and forms across the U.S. and among different age ranges. Its impact differed between the vaccinated and unvaccinated populations and across age groups, and it fell more heavily on those facing greater socio-economic inequities. The 50-64 age group had among the highest numbers of case and death records.