Vaccination Data

Jeff Parks

2021-09-26

Assignment

Our source data is a COVID-19 table of Israeli hospitalization rates from August 2021. These data are labeled for patient age and for vaccination status.

img: data preview

With this data and the inclusion of some broader domain knowledge, we will attempt to answer the following questions:

  1. Do you have enough information to calculate the total population? What does this total population represent?

  2. Calculate the Efficacy vs. Disease; explain your results.

  3. From your calculation of efficacy vs. disease, are you able to compare the rate of severe cases in un-vaccinated individuals to that in vaccinated individuals?


Domain Knowledge

  • Based on recent demographic data, the total population of Israel in August 2021 was approximately 9.3 Million. 1

  • As of February 2021, all Israelis aged 16 and over were eligible to receive COVID-19 vaccinations. 2

  • The Israeli population skews quite young, with approximately 28% of all Israelis aged 14 and under. 3

  • “Full vaccination” is defined as having received two injections of an approved COVID-19 vaccine. 4


Data Preparation

# import libraries
library(tidyverse)
library(knitr)

# load data
df <- read_csv('https://raw.githubusercontent.com/jefedigital/cuny-data-607-projects/main/vaccination-data/data/israel_vax_data.csv', skip=1)
df <- select(df,!last_col()) # drop last col

# bind alternating rows
df_odd <- filter(df, row_number() %% 2 == 0)
df_even <- filter(df, row_number() %% 2 == 1)
df <- bind_cols(df_even, df_odd)

# drop extra columns
df <- select(df,-6,-9,-10) 

# relabel
names(df) <- c('age','count_non','count_full','severe_non',
               'severe_full','pct_non','pct_full')

# remove '%' characters, convert all cols to numeric
df$pct_non <- str_replace(df$pct_non,'%','')
df$pct_full <- str_replace(df$pct_full,'%','')
df <- type_convert(df)

# convert to pct
df$pct_non <- round(df$pct_non / 100,4) 
df$pct_full <- round(df$pct_full / 100,4)

# final dataframe
df <- df %>% 
  mutate(pct_partial = 1 - (pct_non + pct_full)) %>%
  mutate(count_partial = round(count_full / pct_full * pct_partial)) %>% 
  mutate(efficacy = 1- ((severe_full/10^4) / (severe_non/10^4)))

# display the dataframe
kable(df)
age count_non count_full severe_non severe_full pct_non pct_full pct_partial count_partial efficacy
<50 1116834 3501118 43 11 0.233 0.730 0.037 177454 0.7441860
>50 186078 2133516 171 290 0.079 0.904 0.017 40121 -0.6959064
# some additional variables for analysis
total_pop <- 9227700

total_non <- sum(df$count_non)
total_full <- sum(df$count_full)
total_partial <- sum(df$count_partial)

total_eligible <- sum(total_non, total_full, total_partial)

total_ineligible <- total_pop - total_eligible
pct_ineligible <- round(total_ineligible / total_pop,4)

Discussion

Do you have enough information to calculate the total population? What does this total population represent?

In examining the table, we notice that the provided totals of un-vaccinated and fully vaccinated as a percentage of the population does not total 1. The under 50 difference is -3.7% and over 50 is -1.7%.

We assume this represents people that were age-eligible, but only partially vaccinated at the time of the survey (having received only one injection.)

We can estimate this ‘total vaccine eligible population’ at 7155121. Subtracting this from the total Israeli population yields 2072579, representing about 22.5% of all Israelis, and a round estimate of all age-ineligible citizens 14 and under, and/or those who are otherwise vaccine-ineligible.

Calculate the Efficacy vs. Disease; explain your results.

Are you able to compare the rate of severe cases in un-vaccinated individuals to that in vaccinated individuals?

Efficacy vs. Disease is defined as 1 - (% fully vaxed severe cases per 100K / % not vaxed severe cases per 100K), and and additional column is added to the dataframe to calculate this rate.

We observe very different Efficacy rates per age group. The under-50 group produces a 75% Efficacy rate with far fewer severe cases per 100k among the vaccinated. However the story is quite different for the over-50 group, with 69% more severe cases per 100k among the vaccinated.

It would be difficult to explain these results without a great deal more of information - there could be a number of confounding variables at play including rates of co-morbidity and overall health between age groups, rates of hospitalization among vaccinated and non-vaccinated groups, and others.