This analysis looks at the efficacy of COVID-19 vaccines across age groups, over time, and by vaccination status (unvaccinated, vaccinated, vaccinated with an updated booster), measured by COVID-19 deaths.
# Load the tidyverse for %>%, filter(), separate(), and ggplot()
library(tidyverse)
covid <- read.csv("https://raw.githubusercontent.com/evelynbartley/Data-607/main/Rates_of_COVID-19_Cases_or_Deaths_by_Age_Group_and_Updated__Bivalent__Booster_Status_20240302.csv")
tibble(covid)
## # A tibble: 2,654 × 20
## outcome month mmwr_week age_group vaccination_status vaccinated_with_outc…¹
## <chr> <chr> <int> <chr> <chr> <int>
## 1 case OCT 20… 202140 12-17 vaccinated 2332
## 2 case OCT 20… 202140 12-17 vax with updated … NA
## 3 case OCT 20… 202140 18-29 vaccinated 9571
## 4 case OCT 20… 202140 18-29 vax with updated … NA
## 5 case OCT 20… 202140 30-49 vaccinated 25229
## 6 case OCT 20… 202140 30-49 vax with updated … NA
## 7 case OCT 20… 202140 50-64 vaccinated 19123
## 8 case OCT 20… 202140 50-64 vax with updated … NA
## 9 case OCT 20… 202140 65-79 vaccinated 14184
## 10 case OCT 20… 202140 65-79 vax with updated … NA
## # ℹ 2,644 more rows
## # ℹ abbreviated name: ¹vaccinated_with_outcome
## # ℹ 14 more variables: vaccinated_population <dbl>,
## # unvaccinated_with_outcome <int>, unvaccinated_population <dbl>,
## # crude_vax_ir <dbl>, crude_unvax_ir <dbl>, crude_irr <dbl>,
## # age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>, age_adj_irr <dbl>,
## # monthly_age_adj_vax_ir <dbl>, monthly_age_adj_unvax_ir <dbl>, …
This dataset is a lot to take in. To assess the efficacy of COVID-19 vaccines, I want to compare COVID-19 deaths within each age group.
I therefore keep only the rows where the outcome is death.
deaths <- covid %>%
  filter(outcome == "death")
tibble(deaths)
## # A tibble: 1,300 × 20
## outcome month mmwr_week age_group vaccination_status vaccinated_with_outc…¹
## <chr> <chr> <int> <chr> <chr> <int>
## 1 death OCT 20… 202140 12-17 vaccinated 0
## 2 death OCT 20… 202140 12-17 vax with updated … NA
## 3 death OCT 20… 202140 18-29 vaccinated 0
## 4 death OCT 20… 202140 18-29 vax with updated … NA
## 5 death OCT 20… 202140 30-49 vaccinated 17
## 6 death OCT 20… 202140 30-49 vax with updated … NA
## 7 death OCT 20… 202140 50-64 vaccinated 97
## 8 death OCT 20… 202140 50-64 vax with updated … NA
## 9 death OCT 20… 202140 65-79 vaccinated 278
## 10 death OCT 20… 202140 65-79 vax with updated … NA
## # ℹ 1,290 more rows
## # ℹ abbreviated name: ¹vaccinated_with_outcome
## # ℹ 14 more variables: vaccinated_population <dbl>,
## # unvaccinated_with_outcome <int>, unvaccinated_population <dbl>,
## # crude_vax_ir <dbl>, crude_unvax_ir <dbl>, crude_irr <dbl>,
## # age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>, age_adj_irr <dbl>,
## # monthly_age_adj_vax_ir <dbl>, monthly_age_adj_unvax_ir <dbl>, …
# Separate the month column so that each column holds only one variable
deaths1 <- deaths %>%
  separate(month, into = c("Month", "Year"))
tibble(deaths1)
## # A tibble: 1,300 × 21
## outcome Month Year mmwr_week age_group vaccination_status
## <chr> <chr> <chr> <int> <chr> <chr>
## 1 death OCT 2021 202140 12-17 vaccinated
## 2 death OCT 2021 202140 12-17 vax with updated booster
## 3 death OCT 2021 202140 18-29 vaccinated
## 4 death OCT 2021 202140 18-29 vax with updated booster
## 5 death OCT 2021 202140 30-49 vaccinated
## 6 death OCT 2021 202140 30-49 vax with updated booster
## 7 death OCT 2021 202140 50-64 vaccinated
## 8 death OCT 2021 202140 50-64 vax with updated booster
## 9 death OCT 2021 202140 65-79 vaccinated
## 10 death OCT 2021 202140 65-79 vax with updated booster
## # ℹ 1,290 more rows
## # ℹ 15 more variables: vaccinated_with_outcome <int>,
## # vaccinated_population <dbl>, unvaccinated_with_outcome <int>,
## # unvaccinated_population <dbl>, crude_vax_ir <dbl>, crude_unvax_ir <dbl>,
## # crude_irr <dbl>, age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>,
## # age_adj_irr <dbl>, monthly_age_adj_vax_ir <dbl>,
## # monthly_age_adj_unvax_ir <dbl>, monthly_age_adj_irr <dbl>, …
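As a small illustration of the split above, here is a minimal sketch on a made-up three-row data frame (the name `toy` and its values are hypothetical, only the "OCT 2021" format is taken from the output). It passes `sep = " "` explicitly instead of relying on the default separator, which is an optional choice rather than what the original call does.

```r
library(tidyr)

# Hypothetical rows that mimic the "MON YYYY" format of the month column
toy <- data.frame(month = c("OCT 2021", "JAN 2022", "MAR 2023"))

# Split on the single space; both new columns stay character
toy_split <- separate(toy, month, into = c("Month", "Year"), sep = " ")

print(toy_split)
```

Year stays a character column here, as in the output above, which is convenient later because ggplot treats it as a discrete variable for the fill colors.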
Next, I focus on the rows whose vaccination status is “vaccinated”, meaning people who received at least a primary series of vaccines.
# Keep the rows whose vaccination_status is "vaccinated"
deaths2 <- deaths1 %>%
  filter(vaccination_status == "vaccinated")
tibble(deaths2)
## # A tibble: 650 × 21
## outcome Month Year mmwr_week age_group vaccination_status
## <chr> <chr> <chr> <int> <chr> <chr>
## 1 death OCT 2021 202140 12-17 vaccinated
## 2 death OCT 2021 202140 18-29 vaccinated
## 3 death OCT 2021 202140 30-49 vaccinated
## 4 death OCT 2021 202140 50-64 vaccinated
## 5 death OCT 2021 202140 65-79 vaccinated
## 6 death OCT 2021 202140 80+ vaccinated
## 7 death OCT 2021 202140 all_ages vaccinated
## 8 death OCT 2021 202141 12-17 vaccinated
## 9 death OCT 2021 202141 18-29 vaccinated
## 10 death OCT 2021 202141 30-49 vaccinated
## # ℹ 640 more rows
## # ℹ 15 more variables: vaccinated_with_outcome <int>,
## # vaccinated_population <dbl>, unvaccinated_with_outcome <int>,
## # unvaccinated_population <dbl>, crude_vax_ir <dbl>, crude_unvax_ir <dbl>,
## # crude_irr <dbl>, age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>,
## # age_adj_irr <dbl>, monthly_age_adj_vax_ir <dbl>,
## # monthly_age_adj_unvax_ir <dbl>, monthly_age_adj_irr <dbl>, …
# I don't need the "all_ages" summary row, so I filter it out.
# The 0.5-4 and 5-11 age groups don't have data for every year, so I drop them from the analysis as well.
deaths3 <- deaths2 %>%
  filter(
    age_group != "all_ages",
    age_group != "0.5-4",
    age_group != "5-11"
  )
tibble(deaths3)
## # A tibble: 468 × 21
## outcome Month Year mmwr_week age_group vaccination_status
## <chr> <chr> <chr> <int> <chr> <chr>
## 1 death OCT 2021 202140 12-17 vaccinated
## 2 death OCT 2021 202140 18-29 vaccinated
## 3 death OCT 2021 202140 30-49 vaccinated
## 4 death OCT 2021 202140 50-64 vaccinated
## 5 death OCT 2021 202140 65-79 vaccinated
## 6 death OCT 2021 202140 80+ vaccinated
## 7 death OCT 2021 202141 12-17 vaccinated
## 8 death OCT 2021 202141 18-29 vaccinated
## 9 death OCT 2021 202141 30-49 vaccinated
## 10 death OCT 2021 202141 50-64 vaccinated
## # ℹ 458 more rows
## # ℹ 15 more variables: vaccinated_with_outcome <int>,
## # vaccinated_population <dbl>, unvaccinated_with_outcome <int>,
## # unvaccinated_population <dbl>, crude_vax_ir <dbl>, crude_unvax_ir <dbl>,
## # crude_irr <dbl>, age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>,
## # age_adj_irr <dbl>, monthly_age_adj_vax_ir <dbl>,
## # monthly_age_adj_unvax_ir <dbl>, monthly_age_adj_irr <dbl>, …
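The same kind of exclusion can also be written as one membership test with `%in%`. The sketch below uses a made-up data frame (`toy` and `excluded` are hypothetical names, and the counts are invented) just to show the pattern.

```r
library(dplyr)

# Hypothetical rows covering the age groups mentioned above
toy <- data.frame(
  age_group = c("0.5-4", "5-11", "12-17", "80+", "all_ages"),
  deaths    = c(1L, 2L, 3L, 4L, 10L)  # invented values
)

# Age groups to leave out of the analysis
excluded <- c("all_ages", "0.5-4", "5-11")

# Keep every row whose age_group is not in the excluded set
kept <- toy %>%
  filter(!age_group %in% excluded)

print(kept)
```

One difference worth knowing: `%in%` never returns NA, so a row with a missing age_group would be kept by this version, while a comparison with `!=` would drop it.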
Now that every row has a “death” outcome, I can drop that column, because it holds the same value for all rows. I want to organize my analysis by year and age group. I also want to drop the other columns that are not relevant here. I will focus on the columns “vaccinated_with_outcome” and “unvaccinated_with_outcome” and give them clearer names.
# Keep only the columns I need, renaming them along the way
deaths4 <- deaths3 %>%
  select(
    Year,
    Age = age_group,
    Vaccinated_Death = vaccinated_with_outcome,
    Unvaccinated_Death = unvaccinated_with_outcome
  )
tibble(deaths4)
## # A tibble: 468 × 4
## Year Age Vaccinated_Death Unvaccinated_Death
## <chr> <chr> <int> <int>
## 1 2021 12-17 0 4
## 2 2021 18-29 0 25
## 3 2021 30-49 17 345
## 4 2021 50-64 97 740
## 5 2021 65-79 278 915
## 6 2021 80+ 350 527
## 7 2021 12-17 0 2
## 8 2021 18-29 2 30
## 9 2021 30-49 16 267
## 10 2021 50-64 96 669
## # ℹ 458 more rows
It could be helpful to store the difference between unvaccinated and vaccinated deaths, so that we can reference one value instead of two.
deaths5 <- deaths4 %>%
  mutate(Difference = Unvaccinated_Death - Vaccinated_Death)
tibble(deaths5)
## # A tibble: 468 × 5
## Year Age Vaccinated_Death Unvaccinated_Death Difference
## <chr> <chr> <int> <int> <int>
## 1 2021 12-17 0 4 4
## 2 2021 18-29 0 25 25
## 3 2021 30-49 17 345 328
## 4 2021 50-64 97 740 643
## 5 2021 65-79 278 915 637
## 6 2021 80+ 350 527 177
## 7 2021 12-17 0 2 2
## 8 2021 18-29 2 30 28
## 9 2021 30-49 16 267 251
## 10 2021 50-64 96 669 573
## # ℹ 458 more rows
The Difference column now shows, for each row, how many more unvaccinated people died from COVID-19 than vaccinated people. To compare years, I sum the values within each year and age group.
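# Sum every remaining column within each Year and Age combination
deaths6 <- deaths5 %>%
  group_by(Year, Age) %>%
  summarise(across(everything(), sum), .groups = "drop")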
tibble(deaths6)
## # A tibble: 18 × 5
## Year Age Vaccinated_Death Unvaccinated_Death Difference
## <chr> <chr> <int> <int> <int>
## 1 2021 12-17 2 25 23
## 2 2021 18-29 30 351 321
## 3 2021 30-49 312 3362 3050
## 4 2021 50-64 1656 8695 7039
## 5 2021 65-79 4550 11739 7189
## 6 2021 80+ 5502 7408 1906
## 7 2022 12-17 12 57 45
## 8 2022 18-29 119 373 254
## 9 2022 30-49 1023 2444 1421
## 10 2022 50-64 4664 7558 2894
## 11 2022 65-79 14078 15571 1493
## 12 2022 80+ 20163 15678 -4485
## 13 2023 12-17 0 5 5
## 14 2023 18-29 18 25 7
## 15 2023 30-49 92 150 58
## 16 2023 50-64 383 427 44
## 17 2023 65-79 1290 998 -292
## 18 2023 80+ 2080 1340 -740
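To make the aggregation concrete, here is a minimal sketch that applies the same steps to just four rows: the 12-17 and 18-29 rows for the first two weeks of 2021 shown earlier. The data frame name `weekly` is hypothetical. Because it covers only two weeks, its totals are much smaller than the full-year totals in the table above.

```r
library(dplyr)

# Four weekly rows copied from the first ten rows printed earlier
weekly <- data.frame(
  Year               = c("2021", "2021", "2021", "2021"),
  Age                = c("12-17", "18-29", "12-17", "18-29"),
  Vaccinated_Death   = c(0L, 0L, 0L, 2L),
  Unvaccinated_Death = c(4L, 25L, 2L, 30L)
)

yearly <- weekly %>%
  # Row-level difference: unvaccinated minus vaccinated deaths
  mutate(Difference = Unvaccinated_Death - Vaccinated_Death) %>%
  # Collapse the weekly rows into one row per Year and Age
  group_by(Year, Age) %>%
  summarise(across(everything(), sum), .groups = "drop")

print(yearly)
```

Summing the row-level differences gives the same result as subtracting the summed columns, so Difference stays consistent with the two totals next to it.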
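One caveat to keep in mind: Vaccinated_Death and Unvaccinated_Death come from the “vaccinated_with_outcome” and “unvaccinated_with_outcome” columns, which are integer counts of deaths rather than deaths per 100,000. The population and rate columns (such as “vaccinated_population” and “crude_vax_ir”) were dropped earlier, so the differences plotted here do not account for how many people are in each group.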
ggplot(deaths6, aes(x = Age, y = Difference, fill = Year)) +
  geom_col(position = "dodge")
I wasn’t expecting a graph like this! At first glance, it suggests that being unvaccinated is advantageous for someone over 80. However, the plotted values are raw death counts, so the pattern may simply reflect a vaccinated group that is much larger than the unvaccinated group at that age.
Overall, fewer vaccinated people aged 12-79 died than unvaccinated people of the same age. The gap is largest in 2021. In 2022 it is smaller, and in 2023 there is almost no difference, with the 65-79 group even showing more vaccinated than unvaccinated deaths. Among people aged 80 and over, more vaccinated than unvaccinated people died in both 2022 and 2023.
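Since the columns summed above are typed as integers, one possible follow-up is to put the two groups on a common scale by dividing deaths by population. The sketch below shows the arithmetic on a single made-up row. The column names match ones listed in the dataset, but every number is invented, and the formula is an assumption about a simple crude rate, not necessarily how the dataset's own rate columns were computed.

```r
library(dplyr)

# One hypothetical row: all counts and populations are made up
toy <- data.frame(
  Age                       = "80+",
  vaccinated_with_outcome   = 200L,
  vaccinated_population     = 1000000,
  unvaccinated_with_outcome = 150L,
  unvaccinated_population   = 250000
)

rates <- toy %>%
  mutate(
    # Crude deaths per 100,000 people in each group (assumed formula)
    vax_rate   = vaccinated_with_outcome / vaccinated_population * 1e5,
    unvax_rate = unvaccinated_with_outcome / unvaccinated_population * 1e5,
    # Values above 1 mean a higher death rate among the unvaccinated
    rate_ratio = unvax_rate / vax_rate
  )

print(rates)
```

In this invented example the vaccinated group has more deaths in absolute terms (200 versus 150), yet its rate is lower because the group is four times larger. That is the kind of effect a count-based difference cannot show.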