Load Required Packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggthemes)
library(ggrepel)

DS Labs Datasets

Use the dslabs (Data Science Labs) package.

The package includes a number of datasets that are useful for practicing visualizations.

install.packages("dslabs")  # these are data science labs
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
library("dslabs")
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
##  [1] "make-admissions.R"                   
##  [2] "make-brca.R"                         
##  [3] "make-brexit_polls.R"                 
##  [4] "make-death_prob.R"                   
##  [5] "make-divorce_margarine.R"            
##  [6] "make-gapminder-rdas.R"               
##  [7] "make-greenhouse_gases.R"             
##  [8] "make-historic_co2.R"                 
##  [9] "make-mnist_27.R"                     
## [10] "make-movielens.R"                    
## [11] "make-murders-rda.R"                  
## [12] "make-na_example-rda.R"               
## [13] "make-nyc_regents_scores.R"           
## [14] "make-olive.R"                        
## [15] "make-outlier_example.R"              
## [16] "make-polls_2008.R"                   
## [17] "make-polls_us_election_2016.R"       
## [18] "make-reported_heights-rda.R"         
## [19] "make-research_funding_rates.R"       
## [20] "make-stars.R"                        
## [21] "make-temp_carbon.R"                  
## [22] "make-tissue-gene-expression.R"       
## [23] "make-trump_tweets.R"                 
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"
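
The listing above shows the scripts that build the datasets, not the dataset names themselves. As a minimal sketch, the names and titles can be pulled out of the object that data() returns. The helper name list_datasets is hypothetical (it is not part of dslabs), and the last lines assume dslabs is installed.

```r
# Hypothetical helper: return the dataset names and titles a package ships.
# data(package = ...) returns an object whose $results matrix has
# "Item" (dataset name) and "Title" (short description) columns.
list_datasets <- function(pkg) {
  res <- data(package = pkg)$results
  data.frame(
    name  = res[, "Item"],
    title = res[, "Title"],
    stringsAsFactors = FALSE
  )
}

# Only runs when dslabs is available in the current library
if (requireNamespace("dslabs", quietly = TRUE)) {
  print(head(list_datasets("dslabs")))
}
```

The same helper works for any installed package, for example list_datasets("datasets").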

Contagious disease data for US states

The next dataset contains yearly counts of hepatitis A, measles, mumps, pertussis, polio, rubella, and smallpox for US states. The original data are courtesy of Project Tycho.

Focus on polio

  1. filter out Alaska and Hawaii

  2. mutate the rate of polio as count / population * 10000 * 52 / weeks_reporting, which gives a yearly rate per 10,000 people adjusted for the number of weeks reported

  3. draw a vertical line at 1955, the year the first polio vaccine became available in the United States

library(RColorBrewer)
data("us_contagious_diseases")
the_disease <- "Polio"
us_contagious_diseases %>%
  filter(!state %in% c("Hawaii", "Alaska") & disease == the_disease) %>%
  mutate(rate = count / population * 10000 * 52 / weeks_reporting) %>%
  mutate(state = reorder(state, rate)) %>%
  ggplot(aes(year, state,  fill = rate)) +
  geom_tile(color = "grey50") +
  scale_x_continuous(expand=c(0,0)) +
  scale_fill_gradientn(colors = brewer.pal(9, "RdPu"), trans = "sqrt") +
  geom_vline(xintercept=1955, col = "blue") +
  theme_minimal() +  theme(panel.grid = element_blank()) +
  ggtitle("Yearly Polio Rate per 10,000 People in the United States") +
  ylab("State") +
  xlab("Year")
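
The rate computed in the mutate() step can be checked by hand on made-up numbers. The sketch below wraps the same formula in a hypothetical helper, polio_rate, which is not part of the original code; the inputs are invented for illustration.

```r
# Hypothetical helper mirroring the mutate() step:
# cases per 10,000 people, scaled from the weeks reported to a full 52-week year.
polio_rate <- function(count, population, weeks_reporting) {
  count / population * 10000 * 52 / weeks_reporting
}

# Made-up example: 100 cases in a state of 1,000,000 people, 26 weeks reported.
# 100 / 1e6 * 10000 = 1 case per 10,000 over half a year, so about 2 per year.
example_rate <- polio_rate(100, 1e6, 26)
print(example_rate)

# A row with zero cases and zero weeks reported gives 0 / 0, which is NaN in R.
no_report <- polio_rate(0, 1e6, 0)
print(no_report)
```

Rows where the rate is NaN have no fill value to map, which is worth keeping in mind when reading gaps in the heatmap.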

DS Labs - United States Polio Heatmap Review

I used the "us_contagious_diseases" dataset from the dslabs package. This dataset contains yearly counts of hepatitis A, measles, mumps, pertussis, polio, rubella, and smallpox for US states. I created a heatmap and decided to focus on polio. The polio vaccine became available in the United States in 1955, and the blue vertical line in the heatmap marks that year. The heatmap shows a rise in the rate of polio across the United States from about 1949 to 1956. Before that, states had isolated spikes in cases with no consistent pattern. After 1956, there is a significant decline in polio cases across all states.
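
One way to check the trend described above against numbers rather than colors is to collapse the states into a single national rate per year. The sketch below is an assumption-laden illustration, not the method used for the heatmap: the helper national_rate is hypothetical, it pools counts and population across states instead of averaging the per-state rates, and the toy data frame holds made-up values.

```r
library(dplyr)

# Hypothetical helper: pooled rate per 10,000 people per year across all rows.
# Population is weighted by the fraction of the year each state reported.
national_rate <- function(df) {
  df %>%
    filter(weeks_reporting > 0) %>%
    group_by(year) %>%
    summarize(
      rate = sum(count) / sum(population * weeks_reporting / 52) * 10000,
      .groups = "drop"
    )
}

# Toy data with invented numbers, two states in each of two years
toy <- data.frame(
  year            = c(1950, 1950, 1960, 1960),
  count           = c(50, 150, 5, 15),
  population      = c(1e6, 3e6, 1e6, 3e6),
  weeks_reporting = c(52, 52, 52, 52)
)
toy_rates <- national_rate(toy)
print(toy_rates)

# With dslabs loaded, the same helper could be applied to the real data:
# us_contagious_diseases %>%
#   filter(disease == "Polio", !state %in% c("Hawaii", "Alaska")) %>%
#   national_rate()
```

Plotting the resulting rate against year with geom_line() would give a one-line companion to the heatmap.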