Density Maps

In this presentation, I am going to take a list of addresses in the United States and create two types of density maps. The first one is going to be a density map by zip code. The second one is going to be a density map by states.


Packages


Libraries

library(zipcode)
library(maps)
library(viridis)
library(albersusa)
require(ggplot2)
library(ggthemes)
library(tidyverse)
library(DT)
library(knitr)

Setup data

Attempts to detect and clean up suspected ZIP codes. Will strip “ZIP+4” suffixes to match format of zipcode data.frame. Restores leading zeros, converts invalid entries to NAs, and returns character vector. Note that this function does not attempt to find a matching ZIP code in the database, but rather examines formatting alone

#Load transaction billing addresses
transaction <- read.csv("./USA-BillingAddress.csv", header=TRUE)

#load zipcode data from zipcode package
data(zipcode)

#Clean zip codes in transaction dataframe
transaction$zip <- zipcode::clean.zipcodes(transaction$zip)

Preview of transaction

This contains 290,650 transactions.


Preview of zipcode

This is from the zipcode package. The zipcode data frame contains longitude and latitude information based on zip code.


Prepare transaction data for density mapping by zip code

The code below will group transactions by zip codes and count the number of transactions for each zip code group. The count for each group will be used for the density mapping.

The value of count is transformed so there are only a few density levels. Because a count of 1 is present, the count is first multiplied by 1.1 to avoid taking the log of 1 so that we don’t end up with count of zero. The ceiling of the log is then taken so that we only end up with whole numbers.

There are 7 different density levels of the 13,973 groups by zip codes.

#group transactions by zip code and get the count per group
group_by.zip <- 
  stats::aggregate(data.frame(count=transaction$id),by=list(zip=transaction$zip), length)

#min and max of count before transformation
c(min(group_by.zip$count),max(group_by.zip$count))
## [1]   1 572
#transform count 
group_by.zip$count <- ceiling(log(group_by.zip$count*1.1))

#min and max of count after transformation
c(min(group_by.zip$count),max(group_by.zip$count))
## [1] 1 7
#number of groups by zip codes
nrow(group_by.zip)
## [1] 13973
#rename count column to density
colnames(group_by.zip)[2] <- c("density")

This is a preview of the transformed count of each zip code.

zip density
2 00650 1
3 00796 1
4 01001 4
5 01002 4
6 01005 3
7 01007 3
8 01011 1
9 01012 1
10 01013 4
11 01020 4
12 01021 2

Prepare transaction data for density mapping by state

The code below will group transactions by states and count the number of transactions for each group. The count for each group will be used for the density mapping.

The value of count is transformed so there are only a few density levels. Because a count of 1 is present, the count is first multiplied by 1.1 to avoid taking the log of 1 so that we don’t end up with count of zero. The ceiling of the log is then taken so that we only end up with whole numbers.

#group transactions by state and get the count per group
group_by.state <- stats::aggregate(transaction$id,by=list(transaction$state),length)

#rename columns
colnames(group_by.state) <- c("iso_3166_2", "count")

#min and max of count before transformation
c(min(group_by.state$count),max(group_by.state$count))
## [1]   330 52253
#transform count 
group_by.state$count <- ceiling(log(group_by.state$count*1.1))

#min and max of count after transformation
c(min(group_by.state$count),max(group_by.state$count))
## [1]  6 11
#number of groups by state
nrow(group_by.state)
## [1] 51
#rename count column to density
colnames(group_by.state)[2] <- c("density")


Create zip code density map on the United States

The code below takes the density by zip code data that was derived from the transaction data frame and joins this with the zipcode data frame that contains longtitude and latitude by zip code.

#join density by zip code data to zipcode data to get longitude and latitude
map.group_by.zip <- merge(group_by.zip, zipcode, by="zip")

Preview of map.group_by.zip

As you can see, each zip code group now has a longtitude and latitude data.

datatable(map.group_by.zip[1:10,])

United States Boundaries Map

The code below uses ggplot2::map_data function, which easily turn data from the maps package in to a data frame suitable for plotting with ggplot2. The value "state" that was passed to the function is a name of a map provided by the maps package.

#United States boundaries map
usa <- ggplot2::map_data("state") 

Preview of usa map data

long lat group order region subregion
-87.46201 30.38968 1 1 alabama NA
-87.48493 30.37249 1 2 alabama NA
-87.52503 30.37249 1 3 alabama NA
-87.53076 30.33239 1 4 alabama NA
-87.57087 30.32665 1 5 alabama NA
-87.58806 30.32665 1 6 alabama NA

Density map by zip code on the continental United States

ggplot is used to plot the density points and to render the map of the United States. The geom_polygon function is used to draw the map of the United States. The object usa, which was generated by the function map_data("state") is passed to the geom_polygon function.

ggplot(map.group_by.zip,aes(longitude,latitude)) +
  geom_polygon(data=usa,aes(x=long,y=lat,group=group),color='gray',fill="white",alpha=.70)+
  geom_point(aes(color = density),size=.15,alpha=.25) +
  scale_colour_gradient(low = "yellow", high = "black") + 
  xlim(-125,-65)+ylim(20,50)
## Warning: Removed 118 rows containing missing values (geom_point).


Create state density map on United States

The albersusa package is used to generate a shape file for the United States. The function usa_sf retrieves a U.S. state composite map using the `“aeqd” projection parameter. This stands for Azimuthal Equidistant Projection.

USA_sf <- albersusa::usa_sf(proj="aeqd")

Preview of USA_sf data

The next step is to join group_by.state with USA_sf. We’re joining these two tables by the column name iso_3166_2, which is the 2 letter abbreviation for the states. It’s important that all rows in USA_sf are included in the output table map.group_by_state because this table is going to be used to render the map of the United States.

map.group_by.state <- left_join(USA_sf, group_by.state, by="iso_3166_2")

State density map on United States

ggplot is used to map the density and render shape file. The viridis color scale is used to render the color that represents the densities for each state.

  ggplot(data=map.group_by.state, aes(fill = density, color = density)) + 
  geom_sf() + 
  scale_fill_viridis(option = "E",direction=-1) + 
  scale_color_viridis(option = "E",direction=-1) + 
  theme_map(base_size=11)


This work is based on tutorial provided by @AWHSTIN. https://austinwehrwein.com/digital-humanities/creating-a-density-map-in-r-with-zipcodes/