getwd()
## [1] "C:/Users/Don A/Documents/Don's files/MC"
# install.packages("dslabs")  # these are data science labs
library("dslabs")
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
##  [1] "make-admissions.R"                   
##  [2] "make-brca.R"                         
##  [3] "make-brexit_polls.R"                 
##  [4] "make-death_prob.R"                   
##  [5] "make-divorce_margarine.R"            
##  [6] "make-gapminder-rdas.R"               
##  [7] "make-greenhouse_gases.R"             
##  [8] "make-historic_co2.R"                 
##  [9] "make-mnist_27.R"                     
## [10] "make-movielens.R"                    
## [11] "make-murders-rda.R"                  
## [12] "make-na_example-rda.R"               
## [13] "make-nyc_regents_scores.R"           
## [14] "make-olive.R"                        
## [15] "make-outlier_example.R"              
## [16] "make-polls_2008.R"                   
## [17] "make-polls_us_election_2016.R"       
## [18] "make-reported_heights-rda.R"         
## [19] "make-research_funding_rates.R"       
## [20] "make-stars.R"                        
## [21] "make-temp_carbon.R"                  
## [22] "make-tissue-gene-expression.R"       
## [23] "make-trump_tweets.R"                 
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"

Contagious disease data for US states

The next dataset contains yearly counts of Hepatitis A, measles, mumps, pertussis, polio, rubella, and smallpox for US states. The original data is courtesy of the Tycho Project. It is used here to show ways to plot more than two dimensions.

I am going to do polio, not measles, for the heatmap. The plan:

  1. keep the filter that excludes Alaska and Hawaii, as they have no data before they became states

  2. add a rate column for polio, computed as count / population x 10,000 x 52 / weeks_reporting, which gives an annualized rate per 10,000 people

  3. draw a vertical line at 1955, the year Jonas Salk’s polio vaccine, the first, was licensed. Albert Sabin’s oral vaccine followed in 1961.
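
As a quick sanity check on the rate in step 2, here is a minimal worked example in base R. The numbers are made up for illustration and are not taken from the dataset. It shows how the expression is evaluated, and why the grouping of the multipliers matters.

```r
# Made-up numbers for one state-year (not taken from the dataset)
count           <- 50    # reported cases
population      <- 1e6   # state population
weeks_reporting <- 26    # weeks with a report that year

# R evaluates * and / left to right, so this is
# (((count / population) * 10000) * 52) / weeks_reporting
rate <- count / population * 10000 * 52 / weeks_reporting
print(rate)

# Moving the multipliers into the denominator gives a very different number
wrong <- count / (population * 10000 * 52) / weeks_reporting
print(wrong)
```

The factor 52 / weeks_reporting scales a partial year of reports up to a full year, so a state that reported for only 26 weeks is not made to look half as affected as it was.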

library(RColorBrewer)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggthemes)
library(ggrepel)
setwd("~/Don's files/MC")
data("us_contagious_diseases")
write_csv(us_contagious_diseases,"contagious.csv", na="")

Minor changes made to what was in the tutorial:

I changed the disease, added axis labels, and changed the date and color of the vertical line. I switched the fill palette from “Reds” to “YlOrRd” because, after some trial and error, I could tell those colors apart better. I also changed the theme, though I couldn’t see any difference afterward.
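
To compare the two palettes without redrawing the whole heatmap, something like the following sketch may help. It assumes the tutorial's original palette is the RColorBrewer palette named "Reds" (palette names are case-sensitive), and it uses base graphics to draw each nine-color ramp as a strip.

```r
library(RColorBrewer)

reds   <- brewer.pal(9, "Reds")    # assumed to be the tutorial's palette
ylorrd <- brewer.pal(9, "YlOrRd")  # the palette used in the plot below

# Draw the two 9-color ramps one above the other
op <- par(mfrow = c(2, 1), mar = c(1, 1, 2, 1))
image(matrix(1:9), col = reds,   axes = FALSE, main = "Reds")
image(matrix(1:9), col = ylorrd, axes = FALSE, main = "YlOrRd")
par(op)
```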

the_disease <- "Polio"
us_contagious_diseases %>%
  # drop Alaska and Hawaii and keep only the chosen disease
  filter(!state %in% c("Hawaii", "Alaska") & disease == the_disease) %>%
  # annualized rate per 10,000 people, scaled up for partial-year reporting
  mutate(rate = count / population * 10000 * 52 / weeks_reporting) %>%
  mutate(state = reorder(state, rate)) %>%
  ggplot(aes(year, state, fill = rate)) +
  geom_tile(color = "grey50") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_fill_gradientn(colors = brewer.pal(9, "YlOrRd"), trans = "sqrt") +
  geom_vline(xintercept = 1955, col = "black") +
  theme_dark() +
  ggtitle("Reported Cases of Polio") +
  ylab("State") +
  xlab("Year") +                       # the x axis is the year, not the rate
  labs(fill = "Rate per 10,000")       # the rate is shown by the fill color

So, there are a lot of zeros (gray) in the heatmap. I looked through the entire CSV dataset and found that the apparently missing years are in fact in the dataset, with population data, but with zero counts for polio (and pertussis) over a 17-year period.
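
One possibility worth ruling out, offered as a guess rather than a diagnosis: ggplot2 draws missing fill values in gray (the default na.value of scale_fill_gradientn() is "grey50"), whereas a true zero rate would get the lightest color of the palette. If weeks_reporting is also 0 in those years, the rate works out to 0/0, which R reports as NaN, and that would show up as gray. The sketch below demonstrates this with a toy data frame (made-up values, same column names as the dataset) and then, if dslabs is installed, cross-tabulates the real polio rows.

```r
# Toy rows: same column names as us_contagious_diseases, made-up values
toy <- data.frame(
  count           = c(0, 0, 120),
  population      = c(1e6, 1e6, 1e6),
  weeks_reporting = c(0, 52, 52)
)
toy$rate <- toy$count / toy$population * 10000 * 52 / toy$weeks_reporting
print(toy)  # row 1 is NaN (0/0); row 2 is a true zero

# Same check on the real data, if dslabs is installed
if (requireNamespace("dslabs", quietly = TRUE)) {
  data("us_contagious_diseases", package = "dslabs")
  polio <- subset(us_contagious_diseases, disease == "Polio")
  # Cross-tabulate "no weeks reported" against "zero count"
  print(table(no_weeks   = polio$weeks_reporting == 0,
              zero_count = polio$count == 0))
}
```

If most of the zero-count rows also have no weeks reported, the gray tiles would be better read as "no reports" than as "no cases".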

I searched for a bit and found heatmaps that included polio counts for those years: much reduced from prior years as people used the vaccine, but definitely not zero, at least at first. The dslabs data appears to have been pulled directly from the Tycho Project website, which may have been amended since this version was downloaded.

This keeps failing. Here’s another attempt. I had it working in a separate markdown file, which vanished.

library(RColorBrewer)
library(tidyverse)
library(ggthemes)
library(ggrepel)
setwd("C:/Users/Don A/Documents/Don's files/MC")
poliodata <- read_csv('tychopolio.csv')
## Parsed with column specification:
## cols(
##   state = col_character(),
##   year = col_double(),
##   disease = col_character(),
##   rate = col_double()
## )
poliodata %>%
  ggplot(aes(year, state, fill = rate)) +
  geom_tile(color = "grey50") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_fill_gradientn(colors = brewer.pal(9, "YlOrRd"), trans = "sqrt") +
  geom_vline(xintercept = 1955, col = "black") +
  theme_dark() +
  ggtitle("Reported Cases of Polio, Direct from Tycho") +
  ylab("State") +
  xlab("Year") +                       # the x axis is the year, not the rate
  labs(fill = "Rate per 10,000")       # the rate is shown by the fill color
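
To see exactly which state-years differ between the two sources, the two tables could be joined on state and year. The following is only a sketch: dslabs_rates and tycho_rates are hypothetical stand-ins with made-up values, and it assumes both rate columns are in the same units (per 10,000 people).

```r
# Hypothetical stand-ins for the two sources (values are made up)
dslabs_rates <- data.frame(
  state = c("Ohio", "Ohio", "Utah"),
  year  = c(1954, 1960, 1960),
  rate  = c(2.5, NaN, NaN)
)
tycho_rates <- data.frame(
  state = c("Ohio", "Ohio", "Utah"),
  year  = c(1954, 1960, 1960),
  rate  = c(2.5, 0.3, 0.1)
)

# Join on state and year; the suffixes keep the two rate columns apart
both <- merge(dslabs_rates, tycho_rates,
              by = c("state", "year"),
              suffixes = c("_dslabs", "_tycho"))

# State-years where dslabs has no usable rate but the Tycho file does
gaps <- both[is.na(both$rate_dslabs) & !is.na(both$rate_tycho), ]
print(gaps)
```

With the real data, the same merge would take the filtered and mutated us_contagious_diseases rows on one side and poliodata on the other.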