Indiana Heat Map in R

This is a demonstration of how to make a heat map of Indiana. I’ll use the choroplethr package, along with some other packages that will make things easier:

library(readxl)
library(dplyr)
library(choroplethr)
library(choroplethrMaps)
library(ggplot2)

I’m going to start with data at the zip code level and then scale it up to county and larger regions. Here I grab data about Indiana zip codes and counties from www.stats.indiana.edu:

tmp = tempfile()
download.file(url = "http://www.stats.indiana.edu/maptools/county_zips.xls", 
              destfile = tmp)
countyZIP <- read_excel(tmp, sheet = 2, skip = 1)
names(countyZIP) <- c("FIPS", "county", "zip", "percent_zip_in_county")

# filter out a row of NAs at the end and drop the last column
countyZIP <- filter(countyZIP, !is.na(county))
countyZIP <- countyZIP[, -5]

# change class to data.frame
countyZIP <- as.data.frame(countyZIP)

# take a look see
head(countyZIP)

## Source: local data frame [6 x 4]
## 
##    FIPS county   zip percent_zip_in_county
##   (chr)  (chr) (chr)                 (dbl)
## 1   001  Adams 46711             99.999994
## 2   001  Adams 46714              4.910616
## 3   001  Adams 46733             98.199316
## 4   001  Adams 46740             97.094041
## 5   001  Adams 46772             99.999990
## 6   001  Adams 46773              3.629616

The countyZIP data frame has a record for every zip-county combination. If a zip code straddles two counties, then it will show up twice with the percentage of area in each county recorded in the percent_zip_in_county column. For simplicity I will assign one zip code to one county, based on the highest percent of zip code area.

countyZIP <- group_by(countyZIP, zip)
countyZIP <- filter(countyZIP, percent_zip_in_county == max(percent_zip_in_county))

Now I’ll grab some income data from the IRS so I have something to plot.

# this takes a while to read...
income <- read.csv("https://www.irs.gov/pub/irs-soi/13zpallagi.csv")
income <- filter(income, agi_stub == 1)
income <- select(income, zipcode, N1)

The income data frame now has the total number of tax returns that were below $25,000 for each zip code in 2013. I’ll merge this data with the countyZIP data frame.

income$zipcode <- as.character(income$zipcode)
countyZIP <- left_join(countyZIP, income, by = c("zip" = "zipcode"))

Now I can plot low income data in Indiana. First I’ll plot by county. (If you would like to plot zip code areas, check out this vignette for the choroplethrZip package). To do this I need to summarize the data at the county level.

county <- group_by(countyZIP, county)
county <- summarize(county, value = sum(N1, na.rm = TRUE))

Now I’ll plot a heat map of the low income data by county, using the choroplethr package.

county$county <- tolower(county$county)
data(county.regions)
county.regions <- filter(county.regions, state.name == "indiana")
county <- left_join(county, county.regions, by = c("county" = "county.name"))
county_map <- county_choropleth(county, legend = "Returns under $25,000", 
                                state_zoom = "indiana")
county_map

The choroplethr map uses ggplot2, so we can customize this map just like any other ggplot2 graph.

county_map + scale_fill_brewer(name = "Returns under $25,000", palette = "YlOrRd")

But let’s say I want to make a county map with the data summarized into larger regions. For this data, it would make sense to use the Indiana Economic Growth Regions. Amazingly, I can’t find a text table that has this information, so I made one myself.

growth_regions <- read.csv("https://gist.githubusercontent.com/NateByers/7553659b57a31ef82fea/raw/264c5776dde091f1d24b8710f8e983a4d717f024/economic_growth_regions", stringsAsFactors = FALSE)

Now I summarize the data by region and plot it.

regions <- left_join(county, growth_regions, by = "county") 
regions <- rename(regions, county_value = value)
regions <- group_by(regions, growth_region)
regions <- mutate(regions, value = sum(county_value))
region_map <- county_choropleth(regions, state_zoom = "indiana") +
  scale_fill_brewer(name = "Returns under $25,000", palette = "YlOrRd")
region_map

Indiana Heat Map in R

Nathan Byers

January 5, 2016