607 Assignment_wk5

by Catherine Cho

library(readr)
urlfile<-"https://raw.githubusercontent.com/catcho1632/607-Assignment_wk5/main/israeli_vaccination_data_analysis_start.csv"
vax_raw<-read_csv(url(urlfile))
## New names:
## * `` -> ...3
## * `` -> ...5
## Rows: 19 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Age, Population %, ...3, Severe Cases, ...5, Efficacy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Calculating the total population.

  • Fully vaccinated means that an individual has received all 3 doses: the first, the second, and a booster. Not vaccinated means the individual has not received any doses. The data does not count partially vaccinated individuals (2 or fewer doses) in either category, which is why the percentages for the age groups do not add up to 100%. The total population calculated in this section accounts for this discrepancy by dividing the population counts by the percentages they represent, which gives a total of about 7.6 million.
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
vax_trunc<-vax_raw[2:5,1:3]
#Renaming column names
colnames(vax_trunc)<-c('age_group','Not_Vax','Fully_Vax')

#Adding a fourth column that labels each row by the type of value it holds (percentage or population count)
vax_trunc$type<-ifelse(grepl("%",vax_trunc$Fully_Vax),'%_age_group','population_count')

#Filling in missing age_group values left by merged cells in the original Excel file.
vax_trunc<-vax_trunc%>%fill(age_group)

#splitting the dataframe for Not_Vax and Fully_Vax data
Not_Vax<-subset(vax_trunc,select=c('age_group','type','Not_Vax'))
Fully_Vax<-subset(vax_trunc,select=c('age_group','type','Fully_Vax'))

#Each observation's values, population_count and %_age_group, are scattered across multiple rows. To tidy this up, pivot_wider() is used.
Not_Vax<-Not_Vax %>%
  pivot_wider(names_from=type,values_from=Not_Vax)
Fully_Vax<-Fully_Vax %>%
  pivot_wider(names_from=type,values_from=Fully_Vax)

#The stringr package is used to strip commas and percent signs from the values, which are then converted from character to numeric.
Not_Vax$population_count<-as.numeric(str_replace_all(Not_Vax$population_count,",",""))
Not_Vax$`%_age_group`<-as.numeric(str_replace_all(Not_Vax$`%_age_group`,"%",""))
Fully_Vax$population_count<-as.numeric(str_replace_all(Fully_Vax$population_count,",",""))
Fully_Vax$`%_age_group`<-as.numeric(str_replace_all(Fully_Vax$`%_age_group`,"%",""))

#The total population is estimated separately for the Not_Vax and Fully_Vax groups (summed counts divided by summed percentages) and the two estimates are added together.
Not_Vax_Total<-sum(Not_Vax$population_count)/(sum(Not_Vax$`%_age_group`)/100)
Fully_Vax_Total<-sum(Fully_Vax$population_count)/(sum(Fully_Vax$`%_age_group`)/100)
Total_Population<-Not_Vax_Total+Fully_Vax_Total

Total_Population
## [1] 7624368
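
The arithmetic behind this total can be sketched with plain vectors. The population counts below are the ones printed later in the `efficacy` table; the percentages do not appear in the rendered output, so the values used here are an assumption taken from the source CSV (they reproduce the printed total). The object names are illustrative only.

```r
# Population counts per age group (as printed in the efficacy table).
counts <- data.frame(
  age_group = c("<50", ">50"),
  not_vax   = c(1116834, 186078),
  fully_vax = c(3501118, 2133516)
)

# Percent of each age group (ASSUMED values from the source CSV).
pct <- data.frame(
  age_group = c("<50", ">50"),
  not_vax   = c(23.3, 7.9),
  fully_vax = c(73.0, 90.4)
)

# Share of each age group that is in neither category (partially vaccinated).
partial_pct <- round(100 - (pct$not_vax + pct$fully_vax), 1)
partial_pct  # 3.7 1.7

# Same calculation as above: summed counts divided by summed percentages.
not_vax_total    <- sum(counts$not_vax) / (sum(pct$not_vax) / 100)
fully_vax_total  <- sum(counts$fully_vax) / (sum(pct$fully_vax) / 100)
total_population <- not_vax_total + fully_vax_total
round(total_population)  # 7624368
```

The `partial_pct` line makes the gap explicit: the two categories cover less than 100% of each age group, and the remainder is the partially vaccinated share.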

Calculating Efficacy v. Severe Disease

  • The efficacy of the vaccination against severe disease is high for both age groups. It is best to calculate the efficacy per age group, since the risk of severe disease differs greatly between age groups. After accounting for each group's rate of severe disease, the efficacy indicates that the vaccine is doing a good job of preventing hospitalization. One caveat is that there is no vaccination for children younger than 12 years old, which may affect the efficacy for the age group under 50. Children, if vaccinated, could have a better or worse immune response that is not accounted for in this study.
severe_cases<-vax_raw[1:5,1:5]
#Drop columns containing data about vaccinated population.
severe_cases<-select(severe_cases,-2:-3)
#Removing every row with NA under the "severe cases" variable. 
severe_cases<-severe_cases%>%drop_na(Age,any_of("Severe Cases"))
#Renaming Column names and converting characters to numeric values.
colnames(severe_cases)<-c('age_group','Not_Vax_per100K','Fully_Vax_per100K')
severe_cases$Not_Vax_per100K<-as.numeric(severe_cases$Not_Vax_per100K)
severe_cases$Fully_Vax_per100K<-as.numeric(severe_cases$Fully_Vax_per100K)
#Building a new dataframe, "efficacy", by joining the population counts per age_group onto the severe case data.
Not_Vax_count<-select(Not_Vax,-3)
colnames(Not_Vax_count)[2]<-c('Not_Vax_count')
Fully_Vax_count<-select(Fully_Vax,-3)
colnames(Fully_Vax_count)[2]<-c('Fully_Vax_count')
efficacy<-left_join(severe_cases,Not_Vax_count,by="age_group")
efficacy<-left_join(efficacy,Fully_Vax_count,by="age_group")
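#Efficacy = 1 - (severe rate among fully vaccinated / severe rate among not vaccinated), with each rate computed as severe cases * 100000 / population count.
efficacy$Efficacy_v_SevereDisease<-1-((efficacy$Fully_Vax_per100K*100000/efficacy$Fully_Vax_count)/(efficacy$Not_Vax_per100K*100000/efficacy$Not_Vax_count))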
#Rounding the efficacy to 3 decimal places.
efficacy<-efficacy %>% mutate(Efficacy_v_SevereDisease=round(Efficacy_v_SevereDisease,3))
efficacy
## # A tibble: 2 × 6
##   age_group Not_Vax_per100K Fully_Vax_per100K Not_Vax_count Fully_Vax_count
##   <chr>               <dbl>             <dbl>         <dbl>           <dbl>
## 1 <50                    43                11       1116834         3501118
## 2 >50                   171               290        186078         2133516
## # … with 1 more variable: Efficacy_v_SevereDisease <dbl>
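
Because the printed tibble truncates the efficacy column, the calculation can be reproduced in base R from the values shown in the table. This is a minimal sketch, not the original pipeline: the data frame and its column names are illustrative, and the severe-case columns are treated as case counts, which is how the efficacy formula above uses them.

```r
# Values copied from the printed efficacy table.
eff <- data.frame(
  age_group        = c("<50", ">50"),
  not_vax_severe   = c(43, 171),
  fully_vax_severe = c(11, 290),
  not_vax_count    = c(1116834, 186078),
  fully_vax_count  = c(3501118, 2133516)
)

# Severe cases per 100,000 people in each group.
rate_fully_vax <- eff$fully_vax_severe / eff$fully_vax_count * 1e5
rate_not_vax   <- eff$not_vax_severe / eff$not_vax_count * 1e5

# Efficacy = 1 - (rate in fully vaccinated / rate in not vaccinated).
eff$efficacy <- round(1 - rate_fully_vax / rate_not_vax, 3)
eff[, c("age_group", "efficacy")]
# <50: 0.918, >50: 0.852
```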

Comparing the rate of severe cases in unvaccinated individuals to that in vaccinated individuals.

  • The following section combines the severe disease rates of all age groups for the fully vaccinated and the not vaccinated. The total rate is weighted by the population of each age group and divided by the total population of that vaccination status. The results show that the vaccinated group actually has a higher severe case rate (117 cases per 100,000 people) than the not vaccinated group (61 cases per 100,000 people). This goes against the assumption that the vaccinated group would have a much lower rate. However, the result is driven largely by the severe disease rate of the fully vaccinated age group older than 50, at 290 cases per 100,000 people. The risk is generally higher for this age group, so considering the efficacy separately for each age group gives a better understanding of vaccination efficacy.
#Severe case rate per vaccination status group
Rate_Not_Vax<-sum(efficacy$Not_Vax_per100K*efficacy$Not_Vax_count)/sum(efficacy$Not_Vax_count)
Rate_Fully_Vax<-sum(efficacy$Fully_Vax_per100K*efficacy$Fully_Vax_count)/sum(efficacy$Fully_Vax_count)
round(Rate_Not_Vax)
## [1] 61
round(Rate_Fully_Vax)
## [1] 117
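
The combined rates above are population-weighted means, which base R's `weighted.mean()` expresses directly. The sketch below reproduces them and, as a hedged comparison, also shows what happens if the same two columns are read as case counts, which is how the efficacy formula in the previous section uses them. Which reading is correct depends on how the source spreadsheet defines the columns; the vector names are illustrative, and the second half is an assumption rather than part of the original analysis.

```r
# Values copied from the printed efficacy table.
not_vax_severe   <- c(43, 171)          # <50, >50
fully_vax_severe <- c(11, 290)
not_vax_count    <- c(1116834, 186078)
fully_vax_count  <- c(3501118, 2133516)

# Reading 1 (used in this section): the severe columns are rates per 100K,
# so the combined rate is a population-weighted mean across age groups.
rate_not_vax   <- weighted.mean(not_vax_severe, not_vax_count)
rate_fully_vax <- weighted.mean(fully_vax_severe, fully_vax_count)
round(c(rate_not_vax, rate_fully_vax))  # 61 117

# Reading 2 (ASSUMPTION, matches the efficacy formula): the severe columns
# are case counts, so the crude rate is total cases / total population.
crude_not_vax   <- sum(not_vax_severe) / sum(not_vax_count) * 1e5
crude_fully_vax <- sum(fully_vax_severe) / sum(fully_vax_count) * 1e5
round(c(crude_not_vax, crude_fully_vax), 1)  # 16.4 5.3
```

Under either reading, the older age group dominates the combined figure for the fully vaccinated, which is why the per-age-group comparison is the more informative one.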