Overview

Looking at the efficacy of COVID-19 vaccines across age groups, over time, and by vaccination status (unvaccinated, vaccinated, or vaccinated with an updated booster), using counts of COVID-19 deaths.

library(tidyverse)  # provides %>%, filter(), separate(), tibble(), and ggplot()

covid <- read.csv("https://raw.githubusercontent.com/evelynbartley/Data-607/main/Rates_of_COVID-19_Cases_or_Deaths_by_Age_Group_and_Updated__Bivalent__Booster_Status_20240302.csv")
tibble(covid)
## # A tibble: 2,654 × 20
##    outcome month   mmwr_week age_group vaccination_status vaccinated_with_outc…¹
##    <chr>   <chr>       <int> <chr>     <chr>                               <int>
##  1 case    OCT 20…    202140 12-17     vaccinated                           2332
##  2 case    OCT 20…    202140 12-17     vax with updated …                     NA
##  3 case    OCT 20…    202140 18-29     vaccinated                           9571
##  4 case    OCT 20…    202140 18-29     vax with updated …                     NA
##  5 case    OCT 20…    202140 30-49     vaccinated                          25229
##  6 case    OCT 20…    202140 30-49     vax with updated …                     NA
##  7 case    OCT 20…    202140 50-64     vaccinated                          19123
##  8 case    OCT 20…    202140 50-64     vax with updated …                     NA
##  9 case    OCT 20…    202140 65-79     vaccinated                          14184
## 10 case    OCT 20…    202140 65-79     vax with updated …                     NA
## # ℹ 2,644 more rows
## # ℹ abbreviated name: ¹​vaccinated_with_outcome
## # ℹ 14 more variables: vaccinated_population <dbl>,
## #   unvaccinated_with_outcome <int>, unvaccinated_population <dbl>,
## #   crude_vax_ir <dbl>, crude_unvax_ir <dbl>, crude_irr <dbl>,
## #   age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>, age_adj_irr <dbl>,
## #   monthly_age_adj_vax_ir <dbl>, monthly_age_adj_unvax_ir <dbl>, …

This dataset is pretty overwhelming. To look at the efficacy of COVID-19 vaccines, I want to compare COVID-19 deaths within each age group.

I only want to keep the rows where the outcome is death.

deaths <- covid %>%
  filter(outcome == "death")
tibble(deaths)
## # A tibble: 1,300 × 20
##    outcome month   mmwr_week age_group vaccination_status vaccinated_with_outc…¹
##    <chr>   <chr>       <int> <chr>     <chr>                               <int>
##  1 death   OCT 20…    202140 12-17     vaccinated                              0
##  2 death   OCT 20…    202140 12-17     vax with updated …                     NA
##  3 death   OCT 20…    202140 18-29     vaccinated                              0
##  4 death   OCT 20…    202140 18-29     vax with updated …                     NA
##  5 death   OCT 20…    202140 30-49     vaccinated                             17
##  6 death   OCT 20…    202140 30-49     vax with updated …                     NA
##  7 death   OCT 20…    202140 50-64     vaccinated                             97
##  8 death   OCT 20…    202140 50-64     vax with updated …                     NA
##  9 death   OCT 20…    202140 65-79     vaccinated                            278
## 10 death   OCT 20…    202140 65-79     vax with updated …                     NA
## # ℹ 1,290 more rows
## # ℹ abbreviated name: ¹​vaccinated_with_outcome
## # ℹ 14 more variables: vaccinated_population <dbl>,
## #   unvaccinated_with_outcome <int>, unvaccinated_population <dbl>,
## #   crude_vax_ir <dbl>, crude_unvax_ir <dbl>, crude_irr <dbl>,
## #   age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>, age_adj_irr <dbl>,
## #   monthly_age_adj_vax_ir <dbl>, monthly_age_adj_unvax_ir <dbl>, …
# Separate the month column (e.g. "OCT 2021") into Month and Year so that each column holds only one variable
deaths1 <- deaths %>%
  separate(month, into = c("Month", "Year"))
tibble(deaths1)
## # A tibble: 1,300 × 21
##    outcome Month Year  mmwr_week age_group vaccination_status      
##    <chr>   <chr> <chr>     <int> <chr>     <chr>                   
##  1 death   OCT   2021     202140 12-17     vaccinated              
##  2 death   OCT   2021     202140 12-17     vax with updated booster
##  3 death   OCT   2021     202140 18-29     vaccinated              
##  4 death   OCT   2021     202140 18-29     vax with updated booster
##  5 death   OCT   2021     202140 30-49     vaccinated              
##  6 death   OCT   2021     202140 30-49     vax with updated booster
##  7 death   OCT   2021     202140 50-64     vaccinated              
##  8 death   OCT   2021     202140 50-64     vax with updated booster
##  9 death   OCT   2021     202140 65-79     vaccinated              
## 10 death   OCT   2021     202140 65-79     vax with updated booster
## # ℹ 1,290 more rows
## # ℹ 15 more variables: vaccinated_with_outcome <int>,
## #   vaccinated_population <dbl>, unvaccinated_with_outcome <int>,
## #   unvaccinated_population <dbl>, crude_vax_ir <dbl>, crude_unvax_ir <dbl>,
## #   crude_irr <dbl>, age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>,
## #   age_adj_irr <dbl>, monthly_age_adj_vax_ir <dbl>,
## #   monthly_age_adj_unvax_ir <dbl>, monthly_age_adj_irr <dbl>, …
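
The split above can be reproduced on a toy tibble. This is only a sketch: the `toy` data is made up, and `sep = " "` and `convert = TRUE` are optional arguments of `tidyr::separate()` that the code above does not use. They make the delimiter explicit and store Year as an integer instead of a character.

```r
library(tidyr)
library(tibble)

# Hypothetical rows that mimic the "MON YYYY" format of the month column
toy <- tibble(month = c("OCT 2021", "JAN 2022"), deaths = c(3L, 5L))

# Split on the single space; convert = TRUE turns the Year strings into integers
toy_split <- separate(toy, month, into = c("Month", "Year"),
                      sep = " ", convert = TRUE)
toy_split
```

Whether Year is stored as a character or an integer does not matter for the grouping later on, but an integer Year would sort and plot as a number.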

I only want to focus on cases where the person was vaccinated with at least a primary series of vaccines.

#select cases that have a vaccinated status
deaths2 <- deaths1 %>%
  filter(vaccination_status == "vaccinated")
tibble(deaths2)
## # A tibble: 650 × 21
##    outcome Month Year  mmwr_week age_group vaccination_status
##    <chr>   <chr> <chr>     <int> <chr>     <chr>             
##  1 death   OCT   2021     202140 12-17     vaccinated        
##  2 death   OCT   2021     202140 18-29     vaccinated        
##  3 death   OCT   2021     202140 30-49     vaccinated        
##  4 death   OCT   2021     202140 50-64     vaccinated        
##  5 death   OCT   2021     202140 65-79     vaccinated        
##  6 death   OCT   2021     202140 80+       vaccinated        
##  7 death   OCT   2021     202140 all_ages  vaccinated        
##  8 death   OCT   2021     202141 12-17     vaccinated        
##  9 death   OCT   2021     202141 18-29     vaccinated        
## 10 death   OCT   2021     202141 30-49     vaccinated        
## # ℹ 640 more rows
## # ℹ 15 more variables: vaccinated_with_outcome <int>,
## #   vaccinated_population <dbl>, unvaccinated_with_outcome <int>,
## #   unvaccinated_population <dbl>, crude_vax_ir <dbl>, crude_unvax_ir <dbl>,
## #   crude_irr <dbl>, age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>,
## #   age_adj_irr <dbl>, monthly_age_adj_vax_ir <dbl>,
## #   monthly_age_adj_unvax_ir <dbl>, monthly_age_adj_irr <dbl>, …
# I don't need the "all_ages" summary rows, so I filter them out.
# The 0.5-4 and 5-11 age groups don't have data for every year, so I leave them out of my analysis as well.
deaths3 <- deaths2 %>%
  filter(age_group != "all_ages", age_group != "0.5-4", age_group != "5-11")
tibble(deaths3)
## # A tibble: 468 × 21
##    outcome Month Year  mmwr_week age_group vaccination_status
##    <chr>   <chr> <chr>     <int> <chr>     <chr>             
##  1 death   OCT   2021     202140 12-17     vaccinated        
##  2 death   OCT   2021     202140 18-29     vaccinated        
##  3 death   OCT   2021     202140 30-49     vaccinated        
##  4 death   OCT   2021     202140 50-64     vaccinated        
##  5 death   OCT   2021     202140 65-79     vaccinated        
##  6 death   OCT   2021     202140 80+       vaccinated        
##  7 death   OCT   2021     202141 12-17     vaccinated        
##  8 death   OCT   2021     202141 18-29     vaccinated        
##  9 death   OCT   2021     202141 30-49     vaccinated        
## 10 death   OCT   2021     202141 50-64     vaccinated        
## # ℹ 458 more rows
## # ℹ 15 more variables: vaccinated_with_outcome <int>,
## #   vaccinated_population <dbl>, unvaccinated_with_outcome <int>,
## #   unvaccinated_population <dbl>, crude_vax_ir <dbl>, crude_unvax_ir <dbl>,
## #   crude_irr <dbl>, age_adj_vax_ir <dbl>, age_adj_unvax_ir <dbl>,
## #   age_adj_irr <dbl>, monthly_age_adj_vax_ir <dbl>,
## #   monthly_age_adj_unvax_ir <dbl>, monthly_age_adj_irr <dbl>, …

Now that every row has a “death” outcome, I can get rid of that column, because it holds the same value for every row. I want to build my analysis on year and age group, so I keep those two columns and drop the other columns I don't need. For the analysis itself I focus on “vaccinated_with_outcome” and “unvaccinated_with_outcome”, and I give those columns clearer names.

# subset the columns I want to keep
deaths4 <- deaths3 %>%
  select(Year, Age = age_group, Vaccinated_Death = vaccinated_with_outcome, Unvaccinated_Death = unvaccinated_with_outcome)
tibble(deaths4)
## # A tibble: 468 × 4
##    Year  Age   Vaccinated_Death Unvaccinated_Death
##    <chr> <chr>            <int>              <int>
##  1 2021  12-17                0                  4
##  2 2021  18-29                0                 25
##  3 2021  30-49               17                345
##  4 2021  50-64               97                740
##  5 2021  65-79              278                915
##  6 2021  80+                350                527
##  7 2021  12-17                0                  2
##  8 2021  18-29                2                 30
##  9 2021  30-49               16                267
## 10 2021  50-64               96                669
## # ℹ 458 more rows

It could be helpful to store the difference between unvaccinated deaths and vaccinated deaths so that we can reference one value instead of two.

deaths5 <- deaths4 %>% 
  mutate(Difference = Unvaccinated_Death - Vaccinated_Death)
tibble(deaths5)
## # A tibble: 468 × 5
##    Year  Age   Vaccinated_Death Unvaccinated_Death Difference
##    <chr> <chr>            <int>              <int>      <int>
##  1 2021  12-17                0                  4          4
##  2 2021  18-29                0                 25         25
##  3 2021  30-49               17                345        328
##  4 2021  50-64               97                740        643
##  5 2021  65-79              278                915        637
##  6 2021  80+                350                527        177
##  7 2021  12-17                0                  2          2
##  8 2021  18-29                2                 30         28
##  9 2021  30-49               16                267        251
## 10 2021  50-64               96                669        573
## # ℹ 458 more rows

So now, the Difference column shows how many more unvaccinated people than vaccinated people died from COVID-19 in each row. A negative value means that more vaccinated people died.

# Sum the weekly counts within each year and age group
deaths6 <- deaths5 %>%
  group_by(Year, Age) %>%
  summarize(across(everything(), sum), .groups = "drop")

tibble(deaths6)
## # A tibble: 18 × 5
##    Year  Age   Vaccinated_Death Unvaccinated_Death Difference
##    <chr> <chr>            <int>              <int>      <int>
##  1 2021  12-17                2                 25         23
##  2 2021  18-29               30                351        321
##  3 2021  30-49              312               3362       3050
##  4 2021  50-64             1656               8695       7039
##  5 2021  65-79             4550              11739       7189
##  6 2021  80+               5502               7408       1906
##  7 2022  12-17               12                 57         45
##  8 2022  18-29              119                373        254
##  9 2022  30-49             1023               2444       1421
## 10 2022  50-64             4664               7558       2894
## 11 2022  65-79            14078              15571       1493
## 12 2022  80+              20163              15678      -4485
## 13 2023  12-17                0                  5          5
## 14 2023  18-29               18                 25          7
## 15 2023  30-49               92                150         58
## 16 2023  50-64              383                427         44
## 17 2023  65-79             1290                998       -292
## 18 2023  80+               2080               1340       -740

Analysis

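I want to keep in mind that Vaccinated_Death and Unvaccinated_Death are counts of deaths (the source columns vaccinated_with_outcome and unvaccinated_with_outcome are integers), not deaths per 100,000. The dataset's rate columns, such as crude_vax_ir and crude_unvax_ir, were dropped earlier. Difference is therefore a difference in counts and does not account for how many people are in each group.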

ggplot(deaths6, aes(x = Age, y = Difference, fill = Year)) +
  geom_col(position = "dodge")

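I wasn’t expecting a graph like this! Taken at face value, the negative bars suggest that being unvaccinated is advantageous over being vaccinated for people over 80. But Difference compares raw death counts: if far more people over 80 are vaccinated than unvaccinated, the vaccinated group can have more deaths in total even when its death rate is lower, so the graph alone cannot settle this.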
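
One way to check an impression like this is to compare rates rather than counts. The sketch below uses made-up numbers (not values from the dataset) and a hypothetical helper, `rate_per_100k()`, to show how a group can have more deaths in absolute terms and still have a lower death rate when it is much larger. The dataset's `vaccinated_population` and `unvaccinated_population` columns could presumably serve as the denominators, though how they should be combined across weeks is not worked out here.

```r
# Hypothetical helper: deaths per 100,000 people in a group
rate_per_100k <- function(deaths, population) {
  deaths / population * 1e5
}

# Made-up numbers for illustration only (NOT taken from the dataset)
vax_deaths   <- 200
vax_pop      <- 900000
unvax_deaths <- 150
unvax_pop    <- 100000

vax_rate   <- rate_per_100k(vax_deaths, vax_pop)
unvax_rate <- rate_per_100k(unvax_deaths, unvax_pop)

# The count difference is negative, like the 80+ bars in the graph ...
count_difference <- unvax_deaths - vax_deaths

# ... yet the unvaccinated rate is several times the vaccinated rate
rate_ratio <- unvax_rate / vax_rate

c(count_difference = count_difference,
  vax_rate = vax_rate,
  unvax_rate = unvax_rate,
  rate_ratio = rate_ratio)
```

In this toy case the vaccinated group has 50 more deaths but is nine times larger, so its rate per 100,000 is much lower. A ratio of rates like this is the kind of quantity the dataset's `crude_irr` column appears to report.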

Conclusion

Overall, fewer vaccinated people than unvaccinated people aged 12-64 died in every year, and the same holds for people aged 65-79 in 2021 and 2022. The gap is widest in 2021 for every age group except 12-17. In 2022, the gap is smaller for most age groups, and in 2023 there is almost no difference for people aged 12-64. Among people aged 80 and over, more vaccinated people than unvaccinated people died in 2022 and 2023, and the same is true of people aged 65-79 in 2023. These are differences in death counts, so they do not account for how many people are in each group.