Israeli Population research

Total Population

9.053 Million

source: https://en.wikipedia.org/wiki/Demographics_of_Israel

Who is eligible for vaccine?

Persons 12 years or older

source: https://govextra.gov.il/ministry-of-health/covid19-vaccine/en-covid19-vaccination-information/

How many are eligible for vaccine?

28% are under 14 years old. So we know that atleast 72% (1.00 - 0.28) of the population is eligible.

This would equate roughly to [9,053,000 * 0.72] = 6,518,160 eligible persons

load libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(stringr)

Reading and Cleaning Data

il_vax <- read_csv("https://raw.githubusercontent.com/man-of-moose/masters_607/main/homework_4/efficacy_data_clean.csv")
## New names:
## * `` -> ...3
## * `` -> ...5
## Rows: 5 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Age, Population %, ...3, Severe Cases, ...5, Efficacy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Inspect the table

il_vax

Fill in age column using fill()

This replaces any Null values with the value above it.

il_vax <- il_vax %>% fill(Age)

Remove the first row

This row does not contain any useful information. It’s better to remove, and to rename columns

il_vax <- il_vax[-1,]
colnames(il_vax) <- c("age","not_vax","vax","severe_no_vax","severe_vax","efficacy")
il_vax

Temporarily split out population count values with their percent counterparts

vac_percentages <- il_vax[-c(1,3),] %>%
                      select(age,not_vax,vax)

il_vax <- il_vax[c(1,3),]
vac_percentages
il_vax

use gather() function to convert il_vax and vac_percentages to long

First we will convert vac_percentaes to long

vac_percentages <- vac_percentages %>%
  gather("vax_status","age_pop_percent",2:3) %>%
  mutate(
    age_pop_percent = as.double(age_pop_percent)
  )
vac_percentages

Then we can use the same techniques to convert il_vax to long. Here we use gather twice, once for vax_status, and another for severe cases.

il_vax <- il_vax %>% 
  gather("vax_status","pop_count",2:3) %>%
  gather("severe_cases","severe_count",2:3)
il_vax

Remove redundant rows

Using gather() on severe_cases was important for making this dataframe long, but it also created a number of redundant rows. We use filter to choose only rows where vax_status matches severe_cases (vax -> vax, or no_vax -> no_vax)

We can then remove the column severe_cases, because severe_count holds the actual value we care about.

We can finally create a new column which is the proportion of severe_cases to total group population.

il_vax <- il_vax %>%
  filter((vax_status=="not_vax" & severe_cases=="severe_no_vax") | 
          (vax_status=="vax" & severe_cases=="severe_vax")) %>%
  select(age,vax_status,pop_count,severe_count) %>%
  mutate(
    pop_count = as.integer(pop_count),
    severe_count = as.integer(severe_count),
    severe_count_prop = severe_count / pop_count
  )
il_vax

Merge vax_table and vac_percentages

We can now combine back in the population vaccination percentages by age, captured in the vax_percentages table.

il_vax <- left_join(il_vax,vac_percentages,by=c("age","vax_status")) %>%
              select(age, 
                     vax_status, 
                     pop_count, 
                     age_pop_percent, 
                     severe_count, 
                     severe_count_prop)
il_vax

Questions

Do you have enough information to calculate total population? What does this total population represent?

Yes, we can calculate the population of “individuals eligible for vaccination in Israel” based on the data that we have here.

calculate_total_population <- function(age_group, data) {
  temp_table <- data %>% filter(age==age_group)
  pop_count_sum <- sum(temp_table$pop_count)
  age_pop_percent_sum <- sum(temp_table$age_pop_percent)
  
  return(pop_count_sum/age_pop_percent_sum)
}
calculate_total_population("<50",il_vax) + 
  calculate_total_population(">50",il_vax)
## [1] 7155090

This does not match our expected population of 6.5M for total eligible voters shown at the start of this assignment (and sourced from wikipedia). However, that statistic was generated using the proportion of people ages 0-14, where for this problem we are actually interested in the proportion of people ages 0-12.

Calculate the Efficacy vs Disease; Explain your results

In order to keep our data tidy, we need to create a new table for efficacy. We can always join this table to the other one later.

efficacy_table <- tibble(
  age = unique(il_vax$age)
)
calculate_efficacy <- function(age_group, table) {
  vax_prop <- table %>%
                filter(age==age_group & vax_status=="vax") %>%
                .$severe_count_prop
  
  no_vax_prop <- table %>%
                filter(age==age_group & vax_status=="not_vax") %>%
                .$severe_count_prop
  
  ret <- 1-(vax_prop/no_vax_prop)
  
  return(ret)
}
efficacy_table <- efficacy_table %>% 
  mutate(efficacy = calculate_efficacy(age,il_vax))
efficacy_table

As we can see from the above efficacy_table, vaccines are incredibly effective for both age groups. Interpreting the above data, we can say that:

  • vaccines prevent 91% of severe cases for individuals under 50 years of age.
  • vaccines prevent 85% of severe cases for individuals over 50 years of age.

From your calculation of efficacy vs disease, are you able to compare the rate of severe cases in unvaccinted individuals to that in vaccinated individuals?

Yes, by comparing the proportions of [severe_count / pop_count] between vaccinated and unvaccinated peoples we can see the difference in severe cases rate.

under_50_not_vax_severe_proportion <- il_vax %>%
  filter(age=="<50", vax_status=="not_vax") %>%
  .$severe_count_prop

under_50_vax_severe_proportion <- il_vax %>%
  filter(age=="<50", vax_status=="vax") %>%
  .$severe_count_prop


# comparing difference in proportions

under_50_not_vax_severe_proportion / under_50_vax_severe_proportion
## [1] 12.25445

In the under-50 bracket, unvaccinated people end up with severe cases at a rate 12 times greater than vaccinated people.

over_50_not_vax_severe_proportion <- il_vax %>%
  filter(age==">50", vax_status=="not_vax") %>%
  .$severe_count_prop

over_50_vax_severe_proportion <- il_vax %>%
  filter(age==">50", vax_status=="vax") %>%
  .$severe_count_prop


# comparing difference in proportions

over_50_not_vax_severe_proportion / over_50_vax_severe_proportion
## [1] 6.760814

In the over-50 bracket, unvaccinated people end up with severe cases at a rate 6 times greater than vaccinated people.