In this presentation, I am going to take a list of addresses in the United States and create two types of density maps. The first one is going to be a density map by zip code. The second one is going to be a density map by states.
library(zipcode)
library(maps)
library(viridis)
library(albersusa)
require(ggplot2)
library(ggthemes)
library(tidyverse)
library(DT)
library(knitr)
load billing address data
clean billing zip code
Attempts to detect and clean up suspected ZIP codes. Will strip “ZIP+4” suffixes to match format of zipcode data.frame. Restores leading zeros, converts invalid entries to NAs, and returns character vector. Note that this function does not attempt to find a matching ZIP code in the database, but rather examines formatting alone
#Load transaction billing addresses
transaction <- read.csv("./USA-BillingAddress.csv", header=TRUE)
#load zipcode data from zipcode package
data(zipcode)
#Clean zip codes in transaction dataframe
transaction$zip <- zipcode::clean.zipcodes(transaction$zip)
transaction
This contains 290,650 transactions.
zipcode
This is from the zipcode package. The zipcode
data frame contains longitude and latitude information based on zip code.
transaction
data for density mapping by zip codeThe code below will group transactions by zip codes and count the number of transactions for each zip code group. The count for each group will be used for the density mapping.
The value of count is transformed so there are only a few density levels. Because a count of 1 is present, the count is first multiplied by 1.1 to avoid taking the log of 1 so that we don’t end up with count of zero. The ceiling of the log is then taken so that we only end up with whole numbers.
There are 7 different density levels of the 13,973 groups by zip codes.
#group transactions by zip code and get the count per group
group_by.zip <-
stats::aggregate(data.frame(count=transaction$id),by=list(zip=transaction$zip), length)
#min and max of count before transformation
c(min(group_by.zip$count),max(group_by.zip$count))
## [1] 1 572
#transform count
group_by.zip$count <- ceiling(log(group_by.zip$count*1.1))
#min and max of count after transformation
c(min(group_by.zip$count),max(group_by.zip$count))
## [1] 1 7
#number of groups by zip codes
nrow(group_by.zip)
## [1] 13973
#rename count column to density
colnames(group_by.zip)[2] <- c("density")
This is a preview of the transformed count of each zip code.
zip | density | |
---|---|---|
2 | 00650 | 1 |
3 | 00796 | 1 |
4 | 01001 | 4 |
5 | 01002 | 4 |
6 | 01005 | 3 |
7 | 01007 | 3 |
8 | 01011 | 1 |
9 | 01012 | 1 |
10 | 01013 | 4 |
11 | 01020 | 4 |
12 | 01021 | 2 |
transaction
data for density mapping by stateThe code below will group transactions by states and count the number of transactions for each group. The count for each group will be used for the density mapping.
The value of count is transformed so there are only a few density levels. Because a count of 1 is present, the count is first multiplied by 1.1 to avoid taking the log of 1 so that we don’t end up with count of zero. The ceiling of the log is then taken so that we only end up with whole numbers.
#group transactions by state and get the count per group
group_by.state <- stats::aggregate(transaction$id,by=list(transaction$state),length)
#rename columns
colnames(group_by.state) <- c("iso_3166_2", "count")
#min and max of count before transformation
c(min(group_by.state$count),max(group_by.state$count))
## [1] 330 52253
#transform count
group_by.state$count <- ceiling(log(group_by.state$count*1.1))
#min and max of count after transformation
c(min(group_by.state$count),max(group_by.state$count))
## [1] 6 11
#number of groups by state
nrow(group_by.state)
## [1] 51
#rename count column to density
colnames(group_by.state)[2] <- c("density")
The code below takes the density by zip code data that was derived from the transaction
data frame and joins this with the zipcode
data frame that contains longtitude and latitude by zip code.
#join density by zip code data to zipcode data to get longitude and latitude
map.group_by.zip <- merge(group_by.zip, zipcode, by="zip")
map.group_by.zip
As you can see, each zip code group now has a longtitude and latitude data.
datatable(map.group_by.zip[1:10,])
The code below uses ggplot2::map_data
function, which easily turn data from the maps package in to a data frame suitable for plotting with ggplot2. The value "state"
that was passed to the function is a name of a map provided by the maps
package.
#United States boundaries map
usa <- ggplot2::map_data("state")
Preview of usa
map data
long | lat | group | order | region | subregion |
---|---|---|---|---|---|
-87.46201 | 30.38968 | 1 | 1 | alabama | NA |
-87.48493 | 30.37249 | 1 | 2 | alabama | NA |
-87.52503 | 30.37249 | 1 | 3 | alabama | NA |
-87.53076 | 30.33239 | 1 | 4 | alabama | NA |
-87.57087 | 30.32665 | 1 | 5 | alabama | NA |
-87.58806 | 30.32665 | 1 | 6 | alabama | NA |
ggplot
is used to plot the density points and to render the map of the United States. The geom_polygon
function is used to draw the map of the United States. The object usa
, which was generated by the function map_data("state")
is passed to the geom_polygon
function.
ggplot(map.group_by.zip,aes(longitude,latitude)) +
geom_polygon(data=usa,aes(x=long,y=lat,group=group),color='gray',fill="white",alpha=.70)+
geom_point(aes(color = density),size=.15,alpha=.25) +
scale_colour_gradient(low = "yellow", high = "black") +
xlim(-125,-65)+ylim(20,50)
## Warning: Removed 118 rows containing missing values (geom_point).
The albersusa
package is used to generate a shape file for the United States. The function usa_sf
retrieves a U.S. state composite map using the `“aeqd” projection parameter. This stands for Azimuthal Equidistant Projection.
USA_sf <- albersusa::usa_sf(proj="aeqd")
USA_sf
dataThe next step is to join group_by.state
with USA_sf
. We’re joining these two tables by the column name iso_3166_2
, which is the 2 letter abbreviation for the states. It’s important that all rows in USA_sf
are included in the output table map.group_by_state
because this table is going to be used to render the map of the United States.
map.group_by.state <- left_join(USA_sf, group_by.state, by="iso_3166_2")
ggplot
is used to map the density and render shape file. The viridis
color scale is used to render the color that represents the densities for each state.
ggplot(data=map.group_by.state, aes(fill = density, color = density)) +
geom_sf() +
scale_fill_viridis(option = "E",direction=-1) +
scale_color_viridis(option = "E",direction=-1) +
theme_map(base_size=11)
This work is based on tutorial provided by @AWHSTIN. https://austinwehrwein.com/digital-humanities/creating-a-density-map-in-r-with-zipcodes/