I used the us_contagious_diseases dataset from the dslabs package. It contains official public health reports for several contagious diseases in the United States, spanning several decades and covering all 50 states and the District of Columbia. My analysis focuses specifically on rubella.
library(RColorBrewer)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.1 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("dslabs")
Warning: package 'dslabs' was built under R version 4.5.3
unique(us_contagious_diseases$disease)
[1] Hepatitis A Measles Mumps Pertussis Polio Rubella
[7] Smallpox
Levels: Hepatitis A Measles Mumps Pertussis Polio Rubella Smallpox
unique(us_contagious_diseases$state)
[1] Alabama Alaska Arizona
[4] Arkansas California Colorado
[7] Connecticut Delaware District Of Columbia
[10] Florida Georgia Hawaii
[13] Idaho Illinois Indiana
[16] Iowa Kansas Kentucky
[19] Louisiana Maine Maryland
[22] Massachusetts Michigan Minnesota
[25] Mississippi Missouri Montana
[28] Nebraska Nevada New Hampshire
[31] New Jersey New Mexico New York
[34] North Carolina North Dakota Ohio
[37] Oklahoma Oregon Pennsylvania
[40] Rhode Island South Carolina South Dakota
[43] Tennessee Texas Utah
[46] Vermont Virginia Washington
[49] West Virginia Wisconsin Wyoming
51 Levels: Alabama Alaska Arizona Arkansas California Colorado ... Wyoming
Data Wrangling
Keep only rubella and exclude the other diseases.
Filter out Alaska and Hawaii.
Group the remaining states into four regions: Northeast, Midwest, South, and West.
Compute the rubella rate per 10,000 people, annualized for incomplete reporting: count / population × 10,000 × 52 / weeks_reporting.
Draw a vertical line at 1969, the year the rubella vaccine was licensed in the US.
# Keep rubella only, drop Alaska and Hawaii, and assign each state to a region
disease1 <- us_contagious_diseases |>
  filter(disease == "Rubella" & !state %in% c("Hawaii", "Alaska")) |>
  mutate(region = case_when(
    state %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire",
                 "Rhode Island", "Vermont", "New Jersey", "New York",
                 "Pennsylvania") ~ "Northeast",
    state %in% c("Illinois", "Indiana", "Iowa", "Kansas", "Michigan",
                 "Minnesota", "Missouri", "Nebraska", "North Dakota", "Ohio",
                 "South Dakota", "Wisconsin") ~ "Midwest",
    state %in% c("Alabama", "Arkansas", "Delaware", "District Of Columbia",
                 "Florida", "Georgia", "Kentucky", "Louisiana", "Maryland",
                 "Mississippi", "North Carolina", "Oklahoma", "South Carolina",
                 "Tennessee", "Texas", "Virginia", "West Virginia") ~ "South",
    TRUE ~ "West"
  )) |>
  # Cases per 10,000 people, scaled to a full 52-week reporting year
  mutate(rate = count / population * 10000 / (weeks_reporting / 52))

head(disease1)
disease state year weeks_reporting count population region rate
1 Rubella Alabama 1966 31 112 3345787 South 0.5615150
2 Rubella Alabama 1967 27 214 3364130 South 1.2251255
3 Rubella Alabama 1968 33 404 3386068 South 1.8800746
4 Rubella Alabama 1969 36 136 3412450 South 0.5756698
5 Rubella Alabama 1970 51 380 3444165 South 1.1249490
6 Rubella Alabama 1971 51 226 3481798 South 0.6618172
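As a quick sanity check on the rate formula, the calculation can be redone by hand for a single row. The sketch below copies the Alabama 1966 values from the first row of the output above; the variable names are only for illustration.

```r
# Values copied from the first row of head(disease1): Alabama, 1966.
count <- 112
population <- 3345787
weeks_reporting <- 31

# Cases per 10,000 people, scaled up to a full 52-week year because
# only 31 of 52 weeks were reported.
rate <- count / population * 10000 / (weeks_reporting / 52)
print(rate)
```

The result should agree with the rate column for that row (roughly 0.56 cases per 10,000 people).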
Heatmap of regional rubella rates in the USA
ggplot(disease1, aes(x = year, y = region, fill = rate)) +
  geom_tile(color = "black") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_fill_gradientn(colors = brewer.pal(9, "Reds"), trans = "sqrt") +
  # Mark 1969; linewidth replaces the size argument deprecated for lines in ggplot2 3.4.0
  geom_vline(xintercept = 1969, col = "skyblue", linewidth = 1.5) +
  theme_minimal() +
  labs(
    title = "Regional Rates of Rubella in the US",
    caption = "Source: Tycho Project",
    x = "", y = ""
  )
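One thing to keep in mind with this plot: disease1 still has one row per state and year, so each region-year cell receives several overlapping tiles and the visible color most likely comes from whichever state is drawn last. A possible alternative, sketched below, is to collapse the data to one rate per region and year before plotting. The helper regional_rates() and the toy data frame are hypothetical and not part of the original analysis; the toy numbers are made up purely to show the arithmetic. Dropping rows with zero reporting weeks and weighting by population are assumptions of this sketch.

```r
library(dplyr)

# Hypothetical helper: one population-weighted rate per region and year.
# Assumes columns region, year, count, population, weeks_reporting.
regional_rates <- function(df) {
  df |>
    # Assumption: rows with no reporting weeks carry no information
    filter(weeks_reporting > 0) |>
    # Scale each state's count to a full 52-week year
    mutate(annual_count = count * 52 / weeks_reporting) |>
    group_by(region, year) |>
    summarize(
      rate = sum(annual_count) / sum(population) * 10000,
      .groups = "drop"
    )
}

# Toy input with made-up numbers (not real data), only to illustrate the helper
toy <- data.frame(
  region = c("South", "South", "West"),
  year = c(1966, 1966, 1966),
  weeks_reporting = c(52, 26, 52),
  count = c(100, 50, 30),
  population = c(1e6, 1e6, 5e5)
)

regional <- regional_rates(toy)
print(regional)
```

With the real data, regional_rates(disease1) could be passed to the same ggplot() call in place of disease1, which would give exactly one tile per region and year.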