This is a demonstration of how to make a heat map of Indiana. I’ll use the choroplethr
package, along with some other packages that will make things easier:
library(readxl)
library(dplyr)
library(choroplethr)
library(choroplethrMaps)
library(ggplot2)
I’m going to start with data at the zip code level and then scale it up to county and larger regions. Here I grab data about Indiana zip codes and counties from www.stats.indiana.edu:
tmp = tempfile()
download.file(url = "http://www.stats.indiana.edu/maptools/county_zips.xls",
destfile = tmp)
countyZIP <- read_excel(tmp, sheet = 2, skip = 1)
names(countyZIP) <- c("FIPS", "county", "zip", "percent_zip_in_county")
# filter out a row of NAs at the end and drop the last column
countyZIP <- filter(countyZIP, !is.na(county))
countyZIP <- countyZIP[, -5]
# change class to data.frame
countyZIP <- as.data.frame(countyZIP)
# take a look see
head(countyZIP)
## Source: local data frame [6 x 4]
##
## FIPS county zip percent_zip_in_county
## (chr) (chr) (chr) (dbl)
## 1 001 Adams 46711 99.999994
## 2 001 Adams 46714 4.910616
## 3 001 Adams 46733 98.199316
## 4 001 Adams 46740 97.094041
## 5 001 Adams 46772 99.999990
## 6 001 Adams 46773 3.629616
The countyZIP
data frame has a record for every zip-county combination. If a zip code straddles two counties, then it will show up twice with the percentage of area in each county recorded in the percent_zip_in_county
column. For simplicity I will assign one zip code to one county, based on the highest percent of zip code area.
countyZIP <- group_by(countyZIP, zip)
countyZIP <- filter(countyZIP, percent_zip_in_county == max(percent_zip_in_county))
Now I’ll grab some income data from the IRS so I have something to plot.
# this takes a while to read...
income <- read.csv("https://www.irs.gov/pub/irs-soi/13zpallagi.csv")
income <- filter(income, agi_stub == 1)
income <- select(income, zipcode, N1)
The income
data frame now has the total number of tax returns that were below $25,000 for each zip code in 2013. I’ll merge this data with the countyZIP
data frame.
income$zipcode <- as.character(income$zipcode)
countyZIP <- left_join(countyZIP, income, by = c("zip" = "zipcode"))
Now I can plot low income data in Indiana. First I’ll plot by county. (If you would like to plot zip code areas, check out this vignette for the choroplethrZip
package). To do this I need to summarize the data at the county level.
county <- group_by(countyZIP, county)
county <- summarize(county, value = sum(N1, na.rm = TRUE))
Now I’ll plot a heat map of the low income data by county, using the choroplethr
package.
county$county <- tolower(county$county)
data(county.regions)
county.regions <- filter(county.regions, state.name == "indiana")
county <- left_join(county, county.regions, by = c("county" = "county.name"))
county_map <- county_choropleth(county, legend = "Returns under $25,000",
state_zoom = "indiana")
county_map
The choroplethr
map uses ggplot2
, so we can customize this map just like any other ggplot2
graph.
county_map + scale_fill_brewer(name = "Returns under $25,000", palette = "YlOrRd")
But let’s say I want to make a county map with the data summarized into larger regions. For this data, it would make sense to use the Indiana Economic Growth Regions. Amazingly, I can’t find a text table that has this information, so I made one myself.
growth_regions <- read.csv("https://gist.githubusercontent.com/NateByers/7553659b57a31ef82fea/raw/264c5776dde091f1d24b8710f8e983a4d717f024/economic_growth_regions", stringsAsFactors = FALSE)
Now I summarize the data by region and plot it.
regions <- left_join(county, growth_regions, by = "county")
regions <- rename(regions, county_value = value)
regions <- group_by(regions, growth_region)
regions <- mutate(regions, value = sum(county_value))
region_map <- county_choropleth(regions, state_zoom = "indiana") +
scale_fill_brewer(name = "Returns under $25,000", palette = "YlOrRd")
region_map