Introduction

This report recreates and analyzes Israeli COVID-19 vaccination data.

Methods

  1. First, recreate the table of vaccination data as an empty data frame with the required columns
library(tidyverse)   # dplyr, tidyr, stringr, readr and the %>% pipe
library(kableExtra)  # kbl() and kable_classic()

# Empty data frame; the Age column is used as the row names
covid_df <-
  data.frame(
    Age = character(),
    Population.Not.Vaxed = character(),
    Population.Fully.Vaxed = character(),
    Severe.Cases.Not.Vax.per.100K = character(),
    Severe.Cases.Fully.Vax.per.100K = character(),
    Efficacy.vs.severe.disease = character(),
    row.names = "Age"
  )
  2. Populate the rows of the data frame: each age group gets a primary row of counts and a secondary row (suffixed with %) of percentages
covid_df['<50',]<- c('1,116,834', '3,501,118', '43', '11', NA)
covid_df['<50%',]<- c('23.3%', '73.0%', NA, NA, NA)
covid_df['>50',]<- c('186,078', '2,133,516', '171', '290', NA)
covid_df['>50%',]<- c('7.9%', '90.4%', NA, NA, NA)

covid_df %>% kbl() %>% kable_classic()
      Population.Not.Vaxed  Population.Fully.Vaxed  Severe.Cases.Not.Vax.per.100K  Severe.Cases.Fully.Vax.per.100K  Efficacy.vs.severe.disease
<50   1,116,834             3,501,118               43                             11                               NA
<50%  23.3%                 73.0%                   NA                             NA                               NA
>50   186,078               2,133,516               171                            290                              NA
>50%  7.9%                  90.4%                   NA                             NA                               NA
  3. Save the data frame as the CSV that we will commit to GitHub
#write.table(covid_df, "israeli_vaccination_data.csv", row.names=TRUE, sep = ",")
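With `row.names = TRUE` and the default column names, `write.table()` writes a header that has no entry for the row-name column, so every data row is one field longer than the header. A minimal sketch of one way to avoid that mismatch, assuming base R only; the two-row `demo_df` and the temporary file `tmp` are illustrative stand-ins, not part of the original workflow:

```r
# Illustrative two-row frame with row names, mirroring the structure above
demo_df <- data.frame(
  Population.Not.Vaxed   = c("1,116,834", "23.3%"),
  Population.Fully.Vaxed = c("3,501,118", "73.0%"),
  row.names = c("<50", "<50%"),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".csv")  # hypothetical stand-in for the committed file

# col.names = NA writes an empty header cell above the row-name column,
# so the header and the data rows have the same number of fields
write.table(demo_df, tmp, sep = ",", row.names = TRUE, col.names = NA)

# row.names = 1 tells read.csv to use the first column as row names again
round_trip <- read.csv(tmp, row.names = 1, check.names = FALSE,
                       colClasses = "character")
```

Reading the file back this way keeps the age labels as row names and leaves every value under its own column name.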
  4. Load the CSV from GitHub with readr
file_path <- "https://raw.githubusercontent.com/catfoodlover/Data607/main/israeli_vaccination_data.csv"
covid_df2 <- read_csv(file_path, show_col_types = FALSE)
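  5. readr returns the data as a tibble, which does not support row names, so we need to fix our data structure. The header written by write.table has no name for the row-name column, so the values appear to sit one column to the left of their names, with the last two fields merged into the final column; this mismatch is the likely source of the parsing warning below
# Convert to a plain data frame and use the first column (the age labels) as row names
covid_df2 <- as.data.frame(covid_df2)
rownames(covid_df2) <- covid_df2[, 1]
## Warning: One or more parsing issues, see `problems()` for details
# Remember the intended column names, then drop the age-label column
names_list <- names(covid_df2)
covid_df2[,1] <- NULL

# Split the merged final column back into two fields
covid_df2 <- covid_df2 %>% separate(Efficacy.vs.severe.disease, sep = ",", c("temp1", "temp2"))

# Restore the intended column names, now aligned with the data
colnames(covid_df2) <- names_list

# Turn the literal string 'NA' into a real missing value
covid_df2[covid_df2 == 'NA'] <- NA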
  6. Move the percentages held in the secondary rows into their own columns
covid_df2 <-
  covid_df2 %>%
  mutate(age_group = case_when(
    str_detect(row.names(.), "<50") ~ "<50",
    str_detect(row.names(.), ">50") ~ ">50"
  )) %>%
  group_by(age_group) %>%
  mutate(
    percent.not.vaxed = str_extract(Population.Not.Vaxed, ".*%"),
    percent.vaxed  = str_extract(Population.Fully.Vaxed, ".*%")
  ) %>%
  # Copy each percentage up onto the count row of the same age group
  fill(percent.not.vaxed, percent.vaxed, .direction = "up") %>%
  ungroup() %>%
  # Keep only the count rows
  filter(!is.na(Severe.Cases.Not.Vax.per.100K))

covid_df2 %>% kbl() %>% kable_classic()
Population.Not.Vaxed  Population.Fully.Vaxed  Severe.Cases.Not.Vax.per.100K  Severe.Cases.Fully.Vax.per.100K  Efficacy.vs.severe.disease  age_group  percent.not.vaxed  percent.vaxed
1,116,834             3,501,118               43                             11                               NA                          <50        23.3%              73.0%
186,078               2,133,516               171                            290                              NA                          >50        7.9%               90.4%
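The same reshaping can also be sketched in base R by splitting the count rows from the percentage rows and binding them side by side. This is an alternative illustration, not the code used above; the `demo` frame and the names `is_pct`, `counts`, `pcts`, and `wide` are made up for the example, and it assumes each age group has exactly one count row followed by one `%` row.

```r
demo <- data.frame(
  Population.Not.Vaxed   = c("1,116,834", "23.3%", "186,078", "7.9%"),
  Population.Fully.Vaxed = c("3,501,118", "73.0%", "2,133,516", "90.4%"),
  row.names = c("<50", "<50%", ">50", ">50%"),
  stringsAsFactors = FALSE
)

# TRUE for the secondary (percentage) rows, whose row name ends in "%"
is_pct <- grepl("%$", rownames(demo))

counts <- demo[!is_pct, ]
pcts   <- demo[is_pct, ]

# ASSUMPTION: rows are in matching order, so "<50" pairs with "<50%" and so on
wide <- data.frame(
  age_group         = rownames(counts),
  counts,
  percent.not.vaxed = pcts$Population.Not.Vaxed,
  percent.vaxed     = pcts$Population.Fully.Vaxed,
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

The result has one row per age group, with the counts and their percentages in separate columns.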
  7. Now we can calculate the efficacy against severe disease
covid_df2 <-
  covid_df2 %>% mutate(Efficacy.vs.severe.disease =
    1 - ((as.numeric(Severe.Cases.Fully.Vax.per.100K) / 100000) /
         (as.numeric(Severe.Cases.Not.Vax.per.100K) / 100000)))
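The calculation above reduces to one minus the ratio of the two rates, because the 100,000 denominators cancel. A small sketch that makes this explicit; the helper name `efficacy` is introduced here for illustration only:

```r
# Hypothetical helper: efficacy as 1 - (rate in vaccinated / rate in unvaccinated)
efficacy <- function(rate_vaxed, rate_not_vaxed) {
  1 - rate_vaxed / rate_not_vaxed
}

under_50 <- efficacy(11, 43)     # figures from the "<50" row
over_50  <- efficacy(290, 171)   # figures from the ">50" row
```

A negative value means the figure in the vaccinated column is larger than the figure in the unvaccinated column.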

Results

  1. Do we have enough information to calculate the total population?

Yes. For both the under-50 and over-50 groups we know the vaccinated count and the percentage of the age group it represents.

  • Dividing the count in either vaccination arm by the percentage of the age group it represents gives that age group's total population
  • Summing the totals for both age groups gives the overall total population
(under_50_pop <- 3501118/.73)
## [1] 4796052
(over_50_pop <- 2133516/.904)
## [1] 2360084
(total_pop <- under_50_pop + over_50_pop)
## [1] 7156136

This total covers only people in Israel old enough to be vaccinated (12 years and older); the country's total population is ~8.8 million.
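The totals can also be derived directly from the table strings instead of retyped numbers, which guards against transcription slips. A sketch assuming base R; the helper names `to_count` and `to_fraction` are made up for the example:

```r
# Hypothetical helpers for the string formats used in the table
to_count    <- function(x) as.numeric(gsub(",", "", x))        # "3,501,118" -> 3501118
to_fraction <- function(x) as.numeric(gsub("%", "", x)) / 100  # "73.0%" -> 0.73

vaxed   <- c("<50" = "3,501,118", ">50" = "2,133,516")
percent <- c("<50" = "73.0%",     ">50" = "90.4%")

# Vaccinated count divided by the share of the age group it represents
pop_by_age <- to_count(vaxed) / to_fraction(percent)
names(pop_by_age) <- names(vaxed)

total_pop_check <- sum(pop_by_age)
```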

  2. We have already calculated the efficacy
temp <- covid_df2 %>% select(age_group, Efficacy.vs.severe.disease)

kbl(temp) %>% kable_classic()
age_group  Efficacy.vs.severe.disease
<50        0.7441860
>50        -0.6959064
  • Vaccination has an efficacy of about 74% in the under-50 group
  • The vaccine has a surprising efficacy of about -70% in the over-50 group
  • The next question examines why
  3. From your calculation, are you able to compare the rates of severe disease in unvaccinated and vaccinated individuals?
covid_df2 %>% select(age_group, Population.Not.Vaxed, percent.not.vaxed, Population.Fully.Vaxed, percent.vaxed, Efficacy.vs.severe.disease) %>% kbl() %>% kable_classic()
age_group  Population.Not.Vaxed  percent.not.vaxed  Population.Fully.Vaxed  percent.vaxed  Efficacy.vs.severe.disease
<50        1,116,834             23.3%              3,501,118               73.0%          0.7441860
>50        186,078               7.9%               2,133,516               90.4%          -0.6959064

My calculation does not let me measure vaccine efficacy reliably, for a couple of reasons.

  • The percentages do not add up to 100%, so I assume that partially vaccinated people and people vaccinated fewer than 14 days ago are excluded
  • Those people should be reported in a separate category
  • It is also unclear to me what 'per 100K' means: is the denominator the entire age cohort (<50 or >50), or 100K people within that age group and vaccination arm?
  • Because such a high percentage of older Israelis have been vaccinated, it is not surprising that there are more breakthrough infections than infections among the unvaccinated
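
One way to probe the 'per 100K' question is to see what happens if the severe-case figures are treated as raw counts rather than rates. The sketch below is only an illustration under that assumption, which the table as given does not confirm; the `dat` frame and its column names are made up for the example.

```r
# ASSUMPTION: the severe-case figures are raw counts, not per-100K rates
dat <- data.frame(
  age_group        = c("<50", ">50"),
  pop_not_vaxed    = c(1116834, 186078),
  pop_vaxed        = c(3501118, 2133516),
  severe_not_vaxed = c(43, 171),
  severe_vaxed     = c(11, 290)
)

# Rate per 100K within each age group and vaccination arm
dat$rate_not_vaxed <- dat$severe_not_vaxed / dat$pop_not_vaxed * 1e5
dat$rate_vaxed     <- dat$severe_vaxed / dat$pop_vaxed * 1e5

# Efficacy recomputed from those rates
dat$efficacy <- 1 - dat$rate_vaxed / dat$rate_not_vaxed
```

Under this assumption the efficacy comes out positive in both age groups, which would fit the point that a small unvaccinated group can produce fewer cases in absolute terms while still having a higher rate.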