Picture of Israel at twilight with a mixture of old and new architecture

Israel by Landscape Photographer, Noam Chen, from blogs.timesofisrael.com



Overview

We address the following questions using a chart of August 2021 Israeli COVID hospitalization data, split by age (50 and under, over 50) and by vaccination status (unvaccinated, fully vaccinated):

  • Do you have enough information to calculate the total population? What does this total population represent?
  • Calculate the Efficacy vs. Disease; Explain your results.
  • From your calculation of efficacy vs. disease, are you able to compare the rate of severe cases in unvaccinated individuals to that in vaccinated individuals?



R Setup

We are coding in the tidyverse.

# Load packages --------------------------------------
library(tidyverse)



Data

The data is a chart of August 2021 COVID figures for Israel.

The original chart is available as an xlsx file on Andy Catlin’s github account.

The chart contains four groups:

  • Those 50 or younger and not fully vaccinated
  • Those 50 or younger and fully vaccinated
  • Those 51 or older and not fully vaccinated
  • Those 51 or older and fully vaccinated

For each of the four groups, the chart also contains:

  • Population counts
  • Population percents
  • Severe case counts (hospitalizations)

Additionally, the chart defines “Efficacy vs. severe disease” as follows but leaves the values blank:

Efficacy vs. severe disease = 1 - (% fully vaxed severe cases per 100K / % not vaxed severe cases per 100K)
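As a minimal sketch, the chart's formula can be written as a small R function. The name `efficacy_vs_severe` and its arguments are my own illustration, not something the chart defines, and the rates in the usage line are made-up round numbers rather than data.

```r
# Hypothetical helper: efficacy from two incidence rates (severe cases per 100K).
# The function and argument names are illustrative, not taken from the chart.
efficacy_vs_severe <- function(rate_vaxed, rate_not_vaxed) {
  stopifnot(all(rate_not_vaxed > 0))  # the ratio is undefined for a zero denominator
  1 - rate_vaxed / rate_not_vaxed
}

# Made-up rates: 10 vs. 100 severe cases per 100K
example_efficacy <- efficacy_vs_severe(10, 100)
```

Under this definition an efficacy of 0 means the two rates are equal, and an efficacy of 1 means no severe cases among the fully vaccinated.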



Data Import

Here we read the data from a csv file uploaded to my github account.

# Load data --------------------------------------
df <- read.csv("https://raw.githubusercontent.com/pkofy/DATA607/main/DATA607WK5Assignment.csv", stringsAsFactors = FALSE)



Data Cleanup Work

Originally I made the data frame below to get started on this project.

I then recorded the data from the original chart as a csv file and uploaded it to my github account to read into the project.

I could extend the project by instead saving the original chart as a csv file, reading it in, and then using tidyr or dplyr to convert it into the format of my current csv file or the data frame below.

# Place holder to get bare bones for the assignment
Age <- c("<=50", "<=50", ">50", ">50")
VaxStat <- c("NotVax", "FullVax", "NotVax", "FullVax")
PopCount <- c(1116834, 3501118, 186078, 2133516)
PopPct <- c(0.233, 0.730, 0.079, 0.904)
SevereCases <- c(43, 11, 171, 290)

df_orig <- data.frame(Age, VaxStat, PopCount, PopPct, SevereCases)

print(df_orig)
##    Age VaxStat PopCount PopPct SevereCases
## 1 <=50  NotVax  1116834  0.233          43
## 2 <=50 FullVax  3501118  0.730          11
## 3  >50  NotVax   186078  0.079         171
## 4  >50 FullVax  2133516  0.904         290
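The extension mentioned above, reshaping a wide chart into this long format, might look something like the sketch below. The wide layout and its column names (`PopCount_NotVax` and so on) are assumptions made for illustration; the actual xlsx layout may differ and would need its own cleanup first.

```r
library(tidyr)

# Hypothetical wide layout: one row per age group, one column per
# measure and vaccination status (column names are assumed)
wide <- data.frame(
  Age = c("<=50", ">50"),
  PopCount_NotVax = c(1116834, 186078),
  PopCount_FullVax = c(3501118, 2133516),
  PopPct_NotVax = c(0.233, 0.079),
  PopPct_FullVax = c(0.730, 0.904),
  SevereCases_NotVax = c(43, 171),
  SevereCases_FullVax = c(11, 290)
)

# Split each column name at "_": the first part becomes a value column,
# the second part becomes the VaxStat key
long <- pivot_longer(
  wide,
  cols = -Age,
  names_to = c(".value", "VaxStat"),
  names_sep = "_"
)
```

The result has one row per age group and vaccination status, with the same columns as the placeholder data frame.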



Relevant Domain Knowledge

For this analysis I looked up additional information presented here:



Population of Israel

I compare the total population of Israel to the implied total population from the chart.

I’m estimating the total population of Israel in August 2021 to be 8,875,000 by interpolating the chart located here: worldpopulationreview.com.

# Create variable for estimated total population of Israel from worldpopulationreview.com
estTotalPop <- 8875000

If we sum population counts from the chart we get 6,937,546, which is about 1,937,000 less than the estimate from worldpopulationreview.com.

# Sum of population counts
df %>% summarise(sum = sum(PopCount))
##       sum
## 1 6937546

If we instead divide each age group's population count by the share of the group it is reported to cover (the population percents for each age group total less than 100%), we get a higher number of 7,155,090, which is still 1,719,910 less than the estimate from worldpopulationreview.com.

# Create tibble with sum of population counts and percents by age
dfPop <- df %>% 
  group_by(Age) %>%
  summarise(sum = sum(PopCount), pct = sum(PopPct))

# Calculate implied true total population
dfPop <- mutate(dfPop, trueSum = sum / pct)

# Sum of implied true total population counts
dfPop %>% summarise(trueSum = sum(trueSum))
## # A tibble: 1 × 1
##    trueSum
##      <dbl>
## 1 7155090.

My guess is that the total population count in the table represents people in Israel who are eligible to be vaccinated and do not have an exemption. The missing 3.7% of those 50 or younger and the missing 1.7% of those 51 or older could be people with exemptions. The remaining gap of roughly 1,720,000 people may be children who are not eligible to be vaccinated.
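The missing shares quoted above can be reproduced with a short base R sketch. The figures are copied from the placeholder data frame in this document; the variable names `missing_pct` and `missing_count` are mine.

```r
# Figures copied from the placeholder data frame in this document
Age <- c("<=50", "<=50", ">50", ">50")
PopCount <- c(1116834, 3501118, 186078, 2133516)
PopPct <- c(0.233, 0.730, 0.079, 0.904)

# Share of each age group not covered by either vaccination status
missing_pct <- 1 - tapply(PopPct, Age, sum)

# Headcount those shares imply: implied group total minus the counted population
counted <- tapply(PopCount, Age, sum)
missing_count <- counted / (1 - missing_pct) - counted

round(missing_pct, 3)
round(missing_count)
```

Summed over both age groups, the implied headcount is the difference between the two totals computed earlier (7,155,090 minus 6,937,546).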



Who is eligible to receive vaccinations

I believe that at the time of the data, children 12 and older were eligible to receive vaccinations.

This Times of Israel article states that children in Israel five and older were eligible to receive vaccinations starting January, 2022.

This December 10th, 2021, Brookings article states that children 12 and older were eligible to receive vaccinations and I assume that was true in August of 2021 as well.

The missing 1,720,000 is 19.38% of the estimated total population. This seems reasonably consistent with a figure from statista.com that 27.83% of Israel’s 2020 population were under 15, provided roughly 8.45% of Israel’s population (27.83% minus 19.38%) were between 12 and 14.

# Missing 1,720,000 / estimated Total population of 8,875,000 = 19.38%
(estTotalPop - sum(dfPop$trueSum)) / estTotalPop
## [1] 0.1937927
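The arithmetic behind the 8.45% figure can be laid out as below, together with a crude benchmark. The benchmark assumes ages 0 to 14 are evenly distributed, which is my simplifying assumption and not something the sources state.

```r
share_missing <- 0.1937927   # output of the calculation above
share_under_15 <- 0.2783     # statista.com figure quoted in the text

# Share of the population that would have to be aged 12 to 14
implied_12_to_14 <- share_under_15 - share_missing

# Crude benchmark (assumption): ages 0-14 evenly spread,
# so ages 12-14 are 3 of 15 single-year ages
uniform_12_to_14 <- share_under_15 * 3 / 15

round(c(implied = implied_12_to_14, uniform = uniform_12_to_14), 4)
```

The implied share comes out somewhat above the crude benchmark, which is worth keeping in mind given that the total population figure is itself an interpolated estimate.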



What does it mean to be fully vaccinated

At the time, being fully vaccinated likely meant two shots for those under 40 years of age, and three shots for those 40 and older who had received their second dose at least five months earlier.

This Reuters article states Israel lowered the age for access to a booster shot from 40 to 30 years old on August 24th, 2021, if the person received their second dose at least five months before.



Analysis

Here we answer the first two main questions of the assignment.



Question 1

Do you have enough information to calculate the total population? What does this total population represent?

We don’t have enough information to calculate the total population from the chart alone. We can estimate a missing third group in each age bracket by backing into the missing population percentages, but that still falls short of Israel’s estimated total population of 8,875,000. The Relevant Domain Knowledge section offers support for the larger missing population being children under 12, who were not eligible to be vaccinated. The smaller missing population, implied by the population percents not adding up to 100%, may be adults with vaccination exemptions.



Question 2

Calculate the Efficacy vs. Disease; Explain your results.

Here we calculate efficacy using the formula provided in the chart. We add the incidence rate of severe disease per 100K to the data and then calculate the efficacy for each age group.

# Add a column with the incidence rate of severe disease per 100K
df <- mutate(df, IncidenceRate = SevereCases / PopCount * 100000)

# The row indices below rely on NotVax preceding FullVax within each age group,
# as in the placeholder data frame
# Calculate the vaccine efficacy vs. severe disease for those 50 or younger
efficacy_50orless <- 1 - (df$IncidenceRate[2] / df$IncidenceRate[1])

# Calculate the vaccine efficacy vs. severe disease for those 51 or older
efficacy_51ormore <- 1 - (df$IncidenceRate[4] / df$IncidenceRate[3])

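The vaccine efficacy for those 50 or younger is 0.918397. That is, the rate of severe cases per 100K among the fully vaccinated was about 91.8% lower than among those not fully vaccinated.

The vaccine efficacy for those 51 or older is 0.8520888, a reduction of about 85.2% in the rate of severe cases per 100K.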
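As an alternative sketch, the same two values can be computed by matching on the vaccination status label within each age group, which avoids depending on row order. The data frame `df_sketch` is rebuilt here from the placeholder values so the block runs on its own; its name and `efficacy_by_age` are mine.

```r
library(dplyr)

# Rebuilt from the placeholder values in this document
df_sketch <- data.frame(
  Age = c("<=50", "<=50", ">50", ">50"),
  VaxStat = c("NotVax", "FullVax", "NotVax", "FullVax"),
  PopCount = c(1116834, 3501118, 186078, 2133516),
  SevereCases = c(43, 11, 171, 290)
)

# Incidence rate per 100K, then one efficacy value per age group
efficacy_by_age <- df_sketch %>%
  mutate(IncidenceRate = SevereCases / PopCount * 100000) %>%
  group_by(Age) %>%
  summarise(
    Efficacy = 1 - IncidenceRate[VaxStat == "FullVax"] /
      IncidenceRate[VaxStat == "NotVax"]
  )

print(efficacy_by_age)
```

Selecting rates by label rather than by position means the result stays the same if the csv rows are ever reordered.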



Conclusion

Here we answer the last main question.



Question 3

From your calculation of efficacy vs. disease, are you able to compare the rate of severe cases in unvaccinated individuals to that in vaccinated individuals?

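Yes. Because efficacy is defined as one minus the ratio of the vaccinated rate to the unvaccinated rate, that ratio is simply one minus the efficacy. The rate of severe cases among the fully vaccinated is about 8.2% of the unvaccinated rate for those 50 or younger and about 14.8% for those 51 or older. In both age groups, this analysis shows that being fully vaccinated is efficacious against severe COVID-19 disease.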
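Since the chart's formula is one minus the ratio of the two rates, the comparison can be read straight off the efficacy values. A minimal sketch, using the two efficacy values reported in the Analysis section (the vector names are mine):

```r
# Efficacy values reported in the Analysis section
efficacy <- c("<=50" = 0.918397, ">50" = 0.8520888)

# Vaccinated rate as a fraction of the unvaccinated rate
rate_ratio <- 1 - efficacy

# How many times higher the unvaccinated rate is
times_higher <- 1 / rate_ratio

round(rate_ratio, 3)
round(times_higher, 1)
```

Put the other way round, the unvaccinated rate of severe cases is roughly 12 times the vaccinated rate for those 50 or younger and roughly 7 times for those 51 or older.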

With finer data segmentation, or a double-blind health study, we could show how efficacy varies across narrower age brackets or with other risk factors such as smoking or obesity.



Future Efforts

I could have also used dplyr to create additional rows for the whole population and then calculated the vaccine efficacy for the whole population.
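A rough sketch of that idea follows, using the placeholder values so it runs on its own. The names `df_sketch`, `overall_rows`, `with_all` and the `"All"` label are mine, not part of the assignment files.

```r
library(dplyr)

# Rebuilt from the placeholder values in this document
df_sketch <- data.frame(
  Age = c("<=50", "<=50", ">50", ">50"),
  VaxStat = c("NotVax", "FullVax", "NotVax", "FullVax"),
  PopCount = c(1116834, 3501118, 186078, 2133516),
  SevereCases = c(43, 11, 171, 290)
)

# One extra row per vaccination status, pooling both age groups
overall_rows <- df_sketch %>%
  group_by(VaxStat) %>%
  summarise(PopCount = sum(PopCount), SevereCases = sum(SevereCases)) %>%
  mutate(Age = "All")

# Append the pooled rows and compute incidence per 100K for every row
with_all <- bind_rows(df_sketch, overall_rows) %>%
  mutate(IncidenceRate = SevereCases / PopCount * 100000)

# Efficacy for the pooled population
efficacy_all <- with_all %>%
  filter(Age == "All") %>%
  summarise(
    Efficacy = 1 - IncidenceRate[VaxStat == "FullVax"] /
      IncidenceRate[VaxStat == "NotVax"]
  ) %>%
  pull(Efficacy)
```

With these counts the pooled efficacy comes out near 0.67, below both age-specific values. The fully vaccinated group skews older, and older people have higher rates of severe disease regardless of status, so the pooled figure understates what either age group shows on its own.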



Source File

The R Markdown file for this document is saved here, github.com/pkofy/DATA607, with the name “DATA607WK5Assignment.rmd”.