Overview

In this assignment, I cleaned the untidy COVID data set provided and performed the analysis needed to answer the questions in the spreadsheet.

Code

Data Cleaning and Preprocessing

Read the data from GitHub:

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
covid.data <- read.csv("https://raw.githubusercontent.com/SaneSky109/DATA607/main/Week5_HW/israeli_vaccination_data_analysis_start.csv")

Subset the data into not vaccinated and fully vaccinated data frames:

# Subset the data

not.vax <- covid.data %>%
  select("Age","Population..","Severe.Cases")


full.vax <- covid.data %>%
  select("Age","X","X.1")


not.vax <- not.vax %>%
  slice(-1)


full.vax <- full.vax %>%
  slice(-1)

Extract the percentages using regular expressions:

# Extract Percentages

not.vax.percent <- unlist(str_extract_all(not.vax$Population.., "\\d*\\.\\d*%"))
full.vax.percent <- unlist(str_extract_all(full.vax$X, "\\d*\\.\\d*%"))
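
To make the pattern concrete, here is a small sketch of what `\\d*\\.\\d*%` matches. The sample strings are made up to resemble the raw column and are not read from the actual file; `sample.cells` and `sample.percent` are illustrative names only.

```r
library(stringr)

# Hypothetical cell values resembling the raw column (not the actual file contents)
sample.cells <- c("1,116,834", "23.3%", "186,078", "7.9%")

# \\d* = zero or more digits, \\. = a literal dot, % = a literal percent sign
pattern <- "\\d*\\.\\d*%"

# str_extract_all() returns one character vector per input; unlist() flattens them
sample.percent <- unlist(str_extract_all(sample.cells, pattern))
print(sample.percent)
```

Because the pattern requires a decimal point, a whole-number value such as "90%" would not be matched. A pattern like `\\d+(\\.\\d+)?%` is one possible alternative if such values can occur.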

Remove the rows that held the percentages:

# Remove the rows that had the percentages

not.vax <- not.vax %>%
  slice(-2,-4)


full.vax <- full.vax %>%
  slice(-2,-4)
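
As a rough illustration of how negative indices in `slice()` behave, the sketch below uses a made-up four-row data frame in which counts and percentages alternate. The layout is inferred from the steps above, and the `demo` names are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in: counts in rows 1 and 3, percentages in rows 2 and 4
demo <- data.frame(
  Age = c("<50", "", ">50", ""),
  Population = c("1,116,834", "23.3%", "186,078", "7.9%")
)

# Negative positions drop those rows and keep everything else
demo.kept <- demo %>%
  slice(-2, -4)

print(demo.kept)
```

Note that this approach depends on the percentages sitting at fixed row positions; if the file layout changed, the indices would need to change with it.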

Add the percentages that were extracted into a new column:

# Add percentages to data as a column

not.vax <- not.vax %>%
  add_column(Percent.Pop = not.vax.percent)

full.vax <- full.vax %>%
  add_column(Percent.Pop = full.vax.percent)

Rename the columns so the data will be easier to understand:

# Rename the columns

not.vax <- not.vax %>%
  rename(Population = Population..)

full.vax <- full.vax %>%
  rename(Population = X) %>%
  rename(Severe.Cases = X.1)

Add a column indicating whether each row is vaccinated or not:

# Add column to indicate not vax or full vax

not.vax.list <- c("not vax", "not vax")

full.vax.list <- c("full vax", "full vax")

not.vax <- not.vax %>%
  add_column(Vaccinated = not.vax.list)

full.vax <- full.vax %>%
  add_column(Vaccinated = full.vax.list)
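
A possible alternative, sketched here on a made-up two-row data frame, is to let `mutate()` recycle a single value across all rows. This avoids hard-coding a vector whose length has to match the number of rows. The `demo.not.vax` name is hypothetical.

```r
library(dplyr)

# Hypothetical two-row stand-in for the not.vax data frame at this step
demo.not.vax <- data.frame(
  Age = c("<50", ">50"),
  Population = c("1,116,834", "186,078")
)

# A length-one value is recycled to every row
demo.not.vax <- demo.not.vax %>%
  mutate(Vaccinated = "not vax")

print(demo.not.vax)
```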

Row bind the two data frames:

# Combine the two data frames

clean.covid.data <- rbind(not.vax, full.vax)

Convert each column to its proper data type. All columns were originally character vectors:

# Adjust the data types

clean.covid.data$Percent.Pop <- str_remove_all(clean.covid.data$Percent.Pop, "%")
clean.covid.data$Percent.Pop <- as.numeric(clean.covid.data$Percent.Pop)

clean.covid.data$Age <- as.factor(clean.covid.data$Age)

clean.covid.data$Population <- str_remove_all(clean.covid.data$Population, ",")
clean.covid.data$Population <- as.numeric(clean.covid.data$Population)

clean.covid.data$Severe.Cases <- as.numeric(clean.covid.data$Severe.Cases)

clean.covid.data$Vaccinated <- as.factor(clean.covid.data$Vaccinated)
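
As a sketch of a shorter route for the numeric columns, `readr::parse_number()` can strip the commas and percent signs and convert in one step. The input strings below are assumed to match how the values looked before conversion; `raw.population` and `raw.percent` are illustrative names.

```r
library(readr)

# Values written the way they are assumed to appear before conversion
raw.population <- c("1,116,834", "186,078")
raw.percent <- c("23.3%", "7.9%")

# parse_number() ignores grouping marks and trailing symbols such as "%"
population <- parse_number(raw.population)
percent <- parse_number(raw.percent)

print(population)
print(percent)
```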

Cleaned Data set:

# Show clean data
clean.covid.data
##   Age Population Severe.Cases Percent.Pop Vaccinated
## 1 <50    1116834           43        23.3    not vax
## 2 >50     186078          171         7.9    not vax
## 3 <50    3501118           11        73.0   full vax
## 4 >50    2133516          290        90.4   full vax

Analysis

Do you have enough information to calculate the total population? What does this total population represent?

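Yes. The table gives, for each age group, both the population counts and the percentage of the age group that each count represents, so the total can be back-calculated. The populations given in the table are the known figures collected by the Israeli hospitals. The percentages for the not vaccinated and fully vaccinated columns do not add up to 100% (23.3% + 73.0% = 96.3% under 50; 7.9% + 90.4% = 98.3% over 50), so those populations cover only part of each age group. Dividing each age group's combined population by its combined percentage scales it up to 100%. The total that I calculated therefore represents the estimated population of both age groups, including people who fall in neither column. See the total population below: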

# Get the sum of the percentages and population

over.50 <- clean.covid.data %>%
  filter(Age == ">50")
under.50 <- clean.covid.data %>%
  filter(Age == "<50")

percent.under.50 <- sum(under.50$Percent.Pop)
percent.over.50 <- sum(over.50$Percent.Pop)

pop.under.50 <- sum(under.50$Population)
pop.over.50 <- sum(over.50$Population)

# Calculate the total population since the percentages do not equal 100% with the current numbers


total.pop.under.50 <- (pop.under.50 * 100) / percent.under.50
total.pop.over.50 <- (pop.over.50 * 100) / percent.over.50


total.pop <- total.pop.under.50 + total.pop.over.50

cat("The total population is: ", round(total.pop, digits = 0))
## The total population is:  7155090
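
The same arithmetic can be checked by hand with the numbers from the cleaned table. This is only a worked sketch in base R; the `remainder` value is an extra quantity added here for illustration and is not part of the original analysis.

```r
# Values copied from the cleaned table above
pop.under.50 <- 1116834 + 3501118   # not vaccinated + fully vaccinated, <50
pop.over.50 <- 186078 + 2133516     # not vaccinated + fully vaccinated, >50
percent.under.50 <- 23.3 + 73.0     # share of the <50 group covered by the table
percent.over.50 <- 7.9 + 90.4       # share of the >50 group covered by the table

# Scale each age group up to 100%
total.under.50 <- pop.under.50 * 100 / percent.under.50
total.over.50 <- pop.over.50 * 100 / percent.over.50
total.check <- total.under.50 + total.over.50

# People who are in neither the not vaccinated nor the fully vaccinated column
remainder <- total.check - (pop.under.50 + pop.over.50)

cat("Estimated total:", round(total.check), "\n")
cat("In neither column:", round(remainder), "\n")
```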

Calculate the Efficacy vs. Disease; Explain your results

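The calculated efficacy vs. disease is negative, which taken at face value indicates that the vaccine is not effective in reducing severe cases. This is an odd result, since the vaccine is expected to reduce the number of hospitalizations. Note that the calculation below uses each group's share of all severe cases and pools both age groups. It does not adjust for the much larger size of the fully vaccinated population, which may contribute to the unexpected sign. See the Efficacy vs. Disease below: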

# Efficacy vs. Disease = 1 - (% fully vaccinated severe cases per 100K / % not vaxed severe cases per 100K)
# Note: the code below uses each group's share of all severe cases in place of per-100K rates

full.vax <- clean.covid.data %>%
  filter(Vaccinated == "full vax")
not.vax <- clean.covid.data %>%
  filter(Vaccinated == "not vax")

severe.full.vax.percent <- sum(full.vax$Severe.Cases)/sum(clean.covid.data$Severe.Cases)
severe.not.vax.percent <- sum(not.vax$Severe.Cases)/sum(clean.covid.data$Severe.Cases)


EvsD <- 1 - (severe.full.vax.percent / severe.not.vax.percent)

cat("The Efficacy vs. Disease is: ", round(EvsD, digits = 5))
## The Efficacy vs. Disease is:  -0.40654
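
The formula in the code comment refers to severe cases per 100K. As a hedged sketch of that version, the block below divides severe cases by population before taking the ratio. It assumes that the Severe.Cases values are raw counts rather than values already expressed per 100K, which the table itself does not confirm. All variable names here are illustrative.

```r
# Values copied from the cleaned table, ordered as <50 then >50
# ASSUMPTION: Severe.Cases are raw counts, not already per-100K rates
pop.not.vax <- c(1116834, 186078)
pop.full.vax <- c(3501118, 2133516)
severe.not.vax <- c(43, 171)
severe.full.vax <- c(11, 290)

# Severe cases per 100,000 people in each vaccination group (ages pooled)
rate.not.vax <- sum(severe.not.vax) / sum(pop.not.vax) * 1e5
rate.full.vax <- sum(severe.full.vax) / sum(pop.full.vax) * 1e5

# Efficacy vs. disease = 1 - (vaccinated rate / unvaccinated rate)
efficacy.per.100k <- 1 - rate.full.vax / rate.not.vax

cat("Not vaccinated, severe cases per 100K:", round(rate.not.vax, 1), "\n")
cat("Fully vaccinated, severe cases per 100K:", round(rate.full.vax, 1), "\n")
cat("Efficacy vs. disease (per-100K version):", round(efficacy.per.100k, 3), "\n")
```

Under that assumption the pooled value comes out positive (roughly 0.67), unlike the share-based value reported above, because the fully vaccinated group is several times larger than the unvaccinated group.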

From your calculation of efficacy vs. disease, are you able to compare the rate of severe cases in unvaccinated individuals to that in vaccinated individuals?

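Yes, it is possible to compare the two groups. Fully vaccinated people accounted for a larger share of severe cases (about 58%) than their unvaccinated counterparts (about 42%). These values are shares of all severe cases rather than rates per person, so they do not account for the difference in group sizes. See the values below: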

# Rates
severe.full.vax.percent <- sum(full.vax$Severe.Cases)/sum(clean.covid.data$Severe.Cases)
severe.not.vax.percent <- sum(not.vax$Severe.Cases)/sum(clean.covid.data$Severe.Cases)

cat("The rate of severe cases in unvaccinated individuals: ", severe.not.vax.percent)
## The rate of severe cases in unvaccinated individuals:  0.415534
cat("The rate of severe cases in vaccinated individuals: ", severe.full.vax.percent)
## The rate of severe cases in vaccinated individuals:  0.584466
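
For a comparison that accounts for group size, one option is to compute severe cases per 100K within each age group. The sketch below again assumes that Severe.Cases are raw counts, which is not confirmed by the table, and the variable names are hypothetical.

```r
# Values copied from the cleaned table, ordered as <50 then >50
# ASSUMPTION: Severe.Cases are raw counts, not already per-100K rates
age <- c("<50", ">50")
rate.not.vax.by.age <- c(43, 171) / c(1116834, 186078) * 1e5
rate.full.vax.by.age <- c(11, 290) / c(3501118, 2133516) * 1e5

# Efficacy vs. disease within each age group
efficacy.by.age <- 1 - rate.full.vax.by.age / rate.not.vax.by.age
names(efficacy.by.age) <- age

print(round(rate.not.vax.by.age, 1))
print(round(rate.full.vax.by.age, 1))
print(round(efficacy.by.age, 3))
```

Splitting by age keeps groups with very different baseline risk from being mixed together, which is one reason a pooled comparison and an age-specific comparison can point in different directions.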

Conclusions

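After cleaning the untidy data, it was shocking to see that people who received full vaccination accounted for a larger share of severe cases than people who were unvaccinated. Because this comparison does not adjust for the size of each group or for age, it should not be read as a per-person rate.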