I installed the reshape2 package to make the filtered data easier to visualize. It converts the data from long format to wide format. The dcast function is what rearranges the data.
Warning: package 'reshape2' was built under R version 4.3.3
Attaching package: 'reshape2'
The following object is masked from 'package:tidyr':
smiths
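To illustrate the long-to-wide reshaping described above, here is a minimal sketch on a made-up data frame. The names long_df and wide_df and all of the values are invented for illustration and are not taken from the data set.

```r
library(reshape2)

# Toy long-format data: one row per (year, state) pair.
# These values are hypothetical, not real case counts.
long_df <- data.frame(
  year  = c(2000, 2000, 2001),
  state = c("Alabama", "Alaska", "Alabama"),
  count = c(5, 2, 7)
)

# Cast to wide format: one row per year, one column per state.
# The missing (2001, Alaska) combination is filled with 0.
wide_df <- dcast(long_df, year ~ state, value.var = "count", fill = 0)
print(wide_df)
```

In the wide result each state becomes its own column, so any year with no row for a state gets the fill value instead of being absent.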
# Load the packages (dslabs provides the data set, dplyr the pipe and verbs)
library(dslabs)
library(dplyr)
library(reshape2)

# Load the dataset
data("us_contagious_diseases")

# Prepare the data for heatmap plotting:
# counts of Hepatitis A cases by year and state
hepatitis_data <- us_contagious_diseases %>%
  filter(disease == "Hepatitis A") %>%
  select(year, state, count) %>%
  dcast(year ~ state, value.var = "count", fill = 0)
Next, I made a heatmap of hepatitis A cases by state and year.
library(ggplot2)
library(ggthemes)  # provides theme_clean()

# melt() converts the wide table back to long format, which ggplot expects
ggplot(melt(hepatitis_data, id.vars = "year"),
       aes(x = variable, y = year, fill = value)) +
  geom_tile(color = "lightblue") +
  scale_fill_gradient(low = "white", high = "red") +
  theme_clean() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(x = "State", y = "Year", fill = "Hepatitis A Cases") +
  ggtitle("Hepatitis A Cases over the Years by State")
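The plot call above melts the wide table before plotting. As a minimal sketch of what melt returns, here it is applied to a small made-up wide table. The name wide_toy and its values are hypothetical, not part of the assignment data.

```r
library(reshape2)

# Toy wide-format table: one row per year, one column per state.
# These values are hypothetical.
wide_toy <- data.frame(
  year    = c(2000, 2001),
  Alabama = c(5, 7),
  Alaska  = c(2, 0)
)

# melt() stacks the state columns into two columns:
# "variable" (the former column name) and "value" (the cell contents).
long_toy <- melt(wide_toy, id.vars = "year")
print(long_toy)
```

This is why the heatmap maps x to variable and fill to value: those are the default column names that melt assigns.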
Conclusion
For this assignment, I used the US contagious diseases data set. The data set covers several diseases and has several variables. I chose to focus on one disease and compare its case counts across the states. I filtered the data to keep only hepatitis A and selected the three columns I needed: state, count, and year.