CUNY SPS DATA607 HW5

Name: Chinedu Onyeka

Date: September 25th, 2021

Load the required libraries

library(tidyverse)
library(stringr)

Read the file

# Read the CSV from GitHub, then keep only the first six rows
url <- "https://raw.githubusercontent.com/chinedu2301/DATA607-Data-Acquisition-and-Management/main/israeli_vaccination_data_analysis_start.csv"
israel_vac_data <-  read_csv(url)
israel_vac_data <- israel_vac_data %>% slice(1:6)

Look at the data

israel_vac_data
## # A tibble: 6 x 6
##   Age   `Population %` ...3           `Severe Cases`      ...5       Efficacy   
##   <chr> <chr>          <chr>          <chr>               <chr>      <chr>      
## 1 <NA>  "Not Vax\n%"   "Fully Vax\n%" "Not Vax\nper 100K~ "Fully Va~ vs. severe~
## 2 <50   "1,116,834"    "3,501,118"    "43"                "11"       <NA>       
## 3 <NA>  "23.3%"        "73.0%"         <NA>                <NA>      <NA>       
## 4 >50   "186,078"      "2,133,516"    "171"               "290"      <NA>       
## 5 <NA>  "7.9%"         "90.4%"         <NA>                <NA>      <NA>       
## 6 <NA>   <NA>           <NA>           <NA>                <NA>      <NA>

Clean the Data

# Replace headers (column names) with more meaningful names
headers <- c("Age", "Not_Vacc", "Full_Vacc", "Sev_Not_Vacc_per100k", "Sev_Full_Vacc_per100k", "Efficacy vs. Severe" )
colnames(israel_vac_data) <- headers
israel_vac_data
## # A tibble: 6 x 6
##   Age   Not_Vacc     Full_Vacc      Sev_Not_Vacc_pe~ Sev_Full_Vacc_p~ `Efficacy vs. S~
##   <chr> <chr>        <chr>          <chr>            <chr>            <chr>           
## 1 <NA>  "Not Vax\n%" "Fully Vax\n%" "Not Vax\nper 1~ "Fully Vax\nper~ vs. severe dise~
## 2 <50   "1,116,834"  "3,501,118"    "43"             "11"             <NA>            
## 3 <NA>  "23.3%"      "73.0%"         <NA>             <NA>            <NA>            
## 4 >50   "186,078"    "2,133,516"    "171"            "290"            <NA>            
## 5 <NA>  "7.9%"       "90.4%"         <NA>             <NA>            <NA>            
## 6 <NA>   <NA>         <NA>           <NA>             <NA>            <NA>

Only rows 2 to 5 hold data: row 1 repeats the sub-headers and row 6 is empty.

israel_vac_data <- israel_vac_data %>% slice(2:5)
israel_vac_data
## # A tibble: 4 x 6
##   Age   Not_Vacc  Full_Vacc Sev_Not_Vacc_per~ Sev_Full_Vacc_pe~ `Efficacy vs. S~
##   <chr> <chr>     <chr>     <chr>             <chr>             <chr>           
## 1 <50   1,116,834 3,501,118 43                11                <NA>            
## 2 <NA>  23.3%     73.0%     <NA>              <NA>              <NA>            
## 3 >50   186,078   2,133,516 171               290               <NA>            
## 4 <NA>  7.9%      90.4%     <NA>              <NA>              <NA>

Subset israel_vac_data to keep only the population count rows (rows 1 and 3)

pop_vac <- israel_vac_data[c(1,3),]
pop_vac
## # A tibble: 2 x 6
##   Age   Not_Vacc  Full_Vacc Sev_Not_Vacc_per~ Sev_Full_Vacc_pe~ `Efficacy vs. S~
##   <chr> <chr>     <chr>     <chr>             <chr>             <chr>           
## 1 <50   1,116,834 3,501,118 43                11                <NA>            
## 2 >50   186,078   2,133,516 171               290               <NA>

Subset israel_vac_data to keep only the percent vaccinated rows (rows 2 and 4), and rename the columns with a _pct suffix

pct_vac <- israel_vac_data[c(2,4),]
pct_vac_headers <- c("Age", "Not_Vacc_pct", "Full_Vacc_pct", "Sev_Not_Vacc_per100k_pct", "Sev_Full_Vacc_per100k_pct", "Efficacy vs. Severe")
colnames(pct_vac) <- pct_vac_headers
pct_vacc <- pct_vac %>% select(Not_Vacc_pct:Sev_Full_Vacc_per100k_pct)
pct_vacc
## # A tibble: 2 x 4
##   Not_Vacc_pct Full_Vacc_pct Sev_Not_Vacc_per100k_pct Sev_Full_Vacc_per100k_pct
##   <chr>        <chr>         <chr>                    <chr>                    
## 1 23.3%        73.0%         <NA>                     <NA>                     
## 2 7.9%         90.4%         <NA>                     <NA>

Combine the two data frames (pop_vac and pct_vacc) column-wise to get a single data frame of the Israel vaccination data

israel_vaccination_data <- cbind(pop_vac, pct_vacc)
israel_vaccination_data
##   Age  Not_Vacc Full_Vacc Sev_Not_Vacc_per100k Sev_Full_Vacc_per100k
## 1 <50 1,116,834 3,501,118                   43                    11
## 2 >50   186,078 2,133,516                  171                   290
##   Efficacy vs. Severe Not_Vacc_pct Full_Vacc_pct Sev_Not_Vacc_per100k_pct
## 1                <NA>        23.3%         73.0%                     <NA>
## 2                <NA>         7.9%         90.4%                     <NA>
##   Sev_Full_Vacc_per100k_pct
## 1                      <NA>
## 2                      <NA>

Israel vaccination rate: reorder the columns so that each value is followed by its matching _pct column

isr <- israel_vaccination_data %>% select(Age, Not_Vacc, Not_Vacc_pct, Full_Vacc, Full_Vacc_pct, Sev_Not_Vacc_per100k, Sev_Not_Vacc_per100k_pct, Sev_Full_Vacc_per100k, Sev_Full_Vacc_per100k_pct, `Efficacy vs. Severe`)
isr
##   Age  Not_Vacc Not_Vacc_pct Full_Vacc Full_Vacc_pct Sev_Not_Vacc_per100k
## 1 <50 1,116,834        23.3% 3,501,118         73.0%                   43
## 2 >50   186,078         7.9% 2,133,516         90.4%                  171
##   Sev_Not_Vacc_per100k_pct Sev_Full_Vacc_per100k Sev_Full_Vacc_per100k_pct
## 1                     <NA>                    11                      <NA>
## 2                     <NA>                   290                      <NA>
##   Efficacy vs. Severe
## 1                <NA>
## 2                <NA>

Remove the non-numeric symbols (% and ,) so that the columns can be converted to numbers

isr$Not_Vacc_pct <- isr$Not_Vacc_pct %>% str_remove_all(pattern = "%")
isr$Full_Vacc_pct <- isr$Full_Vacc_pct %>% str_remove_all(pattern = "%")
isr$Not_Vacc <- isr$Not_Vacc %>% str_remove_all(pattern = ",")
isr$Full_Vacc <- isr$Full_Vacc %>% str_remove_all(pattern = ",")
isr
##   Age Not_Vacc Not_Vacc_pct Full_Vacc Full_Vacc_pct Sev_Not_Vacc_per100k
## 1 <50  1116834         23.3   3501118          73.0                   43
## 2 >50   186078          7.9   2133516          90.4                  171
##   Sev_Not_Vacc_per100k_pct Sev_Full_Vacc_per100k Sev_Full_Vacc_per100k_pct
## 1                     <NA>                    11                      <NA>
## 2                     <NA>                   290                      <NA>
##   Efficacy vs. Severe
## 1                <NA>
## 2                <NA>
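As an alternative to the four str_remove_all calls, readr (loaded as part of tidyverse) provides parse_number, which drops grouping commas and trailing symbols and returns a numeric vector in one step. A minimal sketch, using values copied from the table above; the vector names are hypothetical:

```r
library(readr)

# Values copied from the Not_Vacc and Not_Vacc_pct columns above
counts_raw <- c("1,116,834", "186,078")
pct_raw    <- c("23.3%", "7.9%")

# parse_number() ignores the grouping commas and the percent sign
counts_num <- parse_number(counts_raw)
pct_num    <- parse_number(pct_raw)

counts_num
pct_num
```

This would also make the separate as.numeric conversion unnecessary for those columns.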

Convert the columns to numeric

# Convert every column except Age to numeric
israel_Vax <- isr %>% mutate(across(-Age, as.numeric))
israel_Vax
##   Age Not_Vacc Not_Vacc_pct Full_Vacc Full_Vacc_pct Sev_Not_Vacc_per100k
## 1 <50  1116834         23.3   3501118          73.0                   43
## 2 >50   186078          7.9   2133516          90.4                  171
##   Sev_Not_Vacc_per100k_pct Sev_Full_Vacc_per100k Sev_Full_Vacc_per100k_pct
## 1                       NA                    11                        NA
## 2                       NA                   290                        NA
##   Efficacy vs. Severe
## 1                  NA
## 2                  NA

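Compute Sev_Not_Vacc_per100k_pct and Sev_Full_Vacc_per100k_pct. Despite the _pct suffix, these are severe cases per 100,000 people in each group: the severe-case values are treated as counts, divided by the group's population, and multiplied by 100,000.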

israel_Vax <- israel_Vax %>% mutate(Sev_Not_Vacc_per100k_pct = round((Sev_Not_Vacc_per100k/Not_Vacc)*100000,1), 
                             Sev_Full_Vacc_per100k_pct = round((Sev_Full_Vacc_per100k/Full_Vacc)*100000,1))
israel_Vax
##   Age Not_Vacc Not_Vacc_pct Full_Vacc Full_Vacc_pct Sev_Not_Vacc_per100k
## 1 <50  1116834         23.3   3501118          73.0                   43
## 2 >50   186078          7.9   2133516          90.4                  171
##   Sev_Not_Vacc_per100k_pct Sev_Full_Vacc_per100k Sev_Full_Vacc_per100k_pct
## 1                      3.9                    11                       0.3
## 2                     91.9                   290                      13.6
##   Efficacy vs. Severe
## 1                  NA
## 2                  NA
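The per-100K rates can be checked by hand. The sketch below repeats the calculation in base R with the counts copied from the table; the variable names are illustrative only.

```r
# Severe-case counts and group sizes copied from the table above
severe_not_vacc  <- c(`<50` = 43, `>50` = 171)
severe_full_vacc <- c(`<50` = 11, `>50` = 290)
pop_not_vacc     <- c(`<50` = 1116834, `>50` = 186078)
pop_full_vacc    <- c(`<50` = 3501118, `>50` = 2133516)

# Rate per 100,000 people = cases / group size * 100,000
rate_not_vacc  <- round(severe_not_vacc  / pop_not_vacc  * 100000, 1)
rate_full_vacc <- round(severe_full_vacc / pop_full_vacc * 100000, 1)

rate_not_vacc   # 3.9 and 91.9
rate_full_vacc  # 0.3 and 13.6
```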

Compute the Efficacy vs. Severe, expressed as a percentage:
Efficacy vs. Severe = (1 - (Sev_Full_Vacc_per100k_pct/Sev_Not_Vacc_per100k_pct)) * 100

israel_Vaxx <- israel_Vax %>% mutate(`Efficacy vs. Severe` = round((1 - (Sev_Full_Vacc_per100k_pct/Sev_Not_Vacc_per100k_pct)),3)*100)

israel_Vaxx
##   Age Not_Vacc Not_Vacc_pct Full_Vacc Full_Vacc_pct Sev_Not_Vacc_per100k
## 1 <50  1116834         23.3   3501118          73.0                   43
## 2 >50   186078          7.9   2133516          90.4                  171
##   Sev_Not_Vacc_per100k_pct Sev_Full_Vacc_per100k Sev_Full_Vacc_per100k_pct
## 1                      3.9                    11                       0.3
## 2                     91.9                   290                      13.6
##   Efficacy vs. Severe
## 1                92.3
## 2                85.2
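Because the efficacy above is computed from rates that were already rounded to one decimal place, the result is sensitive to that rounding, especially for the small under-50 rates. A minimal sketch, with a hypothetical helper efficacy_pct, that recomputes the value from the unrounded rates for comparison:

```r
# Hypothetical helper: efficacy (%) from two severe-case rates
efficacy_pct <- function(rate_vacc, rate_not_vacc) {
  round(1 - rate_vacc / rate_not_vacc, 3) * 100
}

# Unrounded rates per 100,000, from the counts in the table above
# (first element is the <50 group, second is the >50 group)
rate_not_vacc  <- c(43, 171) / c(1116834, 186078) * 100000
rate_full_vacc <- c(11, 290) / c(3501118, 2133516) * 100000

# From rates rounded to one decimal place, as in the analysis above
efficacy_pct(round(rate_full_vacc, 1), round(rate_not_vacc, 1))

# From the unrounded rates
efficacy_pct(rate_full_vacc, rate_not_vacc)
```

With the unrounded rates the under-50 figure comes out at roughly 91.8% rather than 92.3%, while the over-50 figure stays at about 85.2%.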

Question 1: Do you have enough information to calculate the total population? What does this total population represent?

Solution 1:

Compute population:

ques1 <- israel_Vaxx %>% select(Age, Not_Vacc, Full_Vacc)
ques1 <- ques1 %>% mutate(Population = Not_Vacc + Full_Vacc)
ques1_pop_pct <- israel_Vaxx %>% transmute(Pop_pct = Not_Vacc_pct + Full_Vacc_pct)
ques1 <- cbind(ques1, ques1_pop_pct)
#compute estimated population per age group
est_pop <- ques1 %>% transmute(Est_population = Population/(Pop_pct/100))
ques1 <- cbind(ques1, est_pop)
ques1
##   Age Not_Vacc Full_Vacc Population Pop_pct Est_population
## 1 <50  1116834   3501118    4617952    96.3        4795381
## 2 >50   186078   2133516    2319594    98.3        2359709

Compute the total estimated population:

Est_total_population <- round(sum(ques1$Est_population), 0)
paste0("The estimated total population from the given data is ", Est_total_population)
## [1] "The estimated total population from the given data is 7155090"

Background Knowledge:

  1. Israel’s total population is about 9,216,900 according to World Bank data.

  2. Vaccine eligibility in Israel: from February 2021, only those aged 16 and older could receive a COVID-19 vaccine, but by August 2021 the vaccines had been made available to those aged 12 and older.

  3. To be fully vaccinated in Israel means that 14 days have elapsed since completing the two-dose series of either the Pfizer or Moderna vaccine, as those are the only COVID-19 vaccines currently approved in Israel.

From the background information, Israel’s total population is about 9.2 million, while the total computed from the vaccination data is about 7.2 million. About 2 million people (roughly 22% of Israel’s population) are not accounted for in the vaccination data, and they cannot be recovered from the data provided. Hence, the given data does not provide enough information to calculate the total population of Israel. We also do not know whether partially vaccinated people were excluded from the study or counted as Not Vaccinated.
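The size of the gap can be checked directly. A short sketch using the World Bank figure quoted above and the estimate computed earlier; the variable names are illustrative:

```r
israel_population <- 9216900   # World Bank figure quoted above
est_population    <- 7155090   # estimate from the vaccination data

# Absolute gap and its share of the total population
gap     <- israel_population - est_population
gap_pct <- round(gap / israel_population * 100, 1)

gap      # 2061810
gap_pct  # 22.4
```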

In my opinion, the total of about 7.2 million calculated from the given data represents the population eligible to receive the vaccine. The under-50 group probably excludes children below the eligibility age (12 or 16, depending on the date), who could not receive the COVID-19 vaccine. That would explain the missing 2 million (22% of Israel’s total population).

According to the Jewish Virtual Library, Israel has a relatively young population compared to other Western countries, with about 28% of the population aged 0 to 14. This is consistent with the missing 22% being mostly children aged 0 to 12, who are not eligible to receive the vaccine.

Question 2: Calculate the Efficacy vs. Disease; Explain your results:

Solution 2:
From the values computed above:

israel_efficacy_severe <- israel_Vaxx %>% select(Age, Not_Vacc, Full_Vacc, `Efficacy vs. Severe`)
israel_efficacy_severe
##   Age Not_Vacc Full_Vacc Efficacy vs. Severe
## 1 <50  1116834   3501118                92.3
## 2 >50   186078   2133516                85.2

The Efficacy vs. Severe is higher for those under 50 (92.3%) than for those over 50 (85.2%), which implies that the vaccine is more effective against severe disease in the under-50 group.

Question 3: From your calculation of efficacy vs. disease, are you able to compare the rate of severe cases in unvaccinated to vaccinated individuals?

Solution 3:
Yes, I am able to compare the rate of severe cases in unvaccinated individuals to that in vaccinated individuals. The Efficacy vs. Severe represents the percent reduction in the rate of severe cases in the vaccinated group relative to the unvaccinated group. From the severe-case rates per 100,000 (3.9 vs. 0.3 for those under 50, and 91.9 vs. 13.6 for those over 50), unvaccinated individuals are more likely to have severe cases (hospitalization) than vaccinated individuals in both age groups.
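One way to make the comparison concrete is the rate ratio, the unvaccinated rate divided by the vaccinated rate, which is linked to efficacy by efficacy = 1 - 1/ratio. A minimal sketch using the rounded rates computed earlier; the names are hypothetical:

```r
# Severe cases per 100,000, copied from the results computed earlier
rate_not_vacc  <- c(`<50` = 3.9, `>50` = 91.9)
rate_full_vacc <- c(`<50` = 0.3, `>50` = 13.6)

# How many times higher the severe-case rate is among the unvaccinated
rate_ratio <- rate_not_vacc / rate_full_vacc
round(rate_ratio, 1)   # 13.0 and 6.8

# Efficacy (%) recovered from the ratio, matching the values computed earlier
round(1 - 1 / rate_ratio, 3) * 100   # 92.3 and 85.2
```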