Data 607 Presentation

Density Maps

In this presentation, I am going to take a list of addresses in the United States and create two types of density maps. The first one is going to be a density map by zip code. The second one is going to be a density map by states.

Packages

maps - creating polygons for rending maps
albersusa - shape files for the rendering of maps
zipcode - cleaning zip codes and provide longitude and latitude per zip code
viridis - color scale density map
ggplot2 - render shape files and polygons and density data on the map

Links

Libraries

library(zipcode)
library(maps)
library(viridis)
library(albersusa)
require(ggplot2)
library(ggthemes)
library(tidyverse)
library(DT)
library(knitr)

Setup data

load billing address data
clean billing zip code

Attempts to detect and clean up suspected ZIP codes. Will strip “ZIP+4” suffixes to match format of zipcode data.frame. Restores leading zeros, converts invalid entries to NAs, and returns character vector. Note that this function does not attempt to find a matching ZIP code in the database, but rather examines formatting alone

#Load transaction billing addresses
transaction <- read.csv("./USA-BillingAddress.csv", header=TRUE)

#load zipcode data from zipcode package
data(zipcode)

#Clean zip codes in transaction dataframe
transaction$zip <- zipcode::clean.zipcodes(transaction$zip)

Preview of `transaction`

This contains 290,650 transactions.

Preview of `zipcode`

This is from the zipcode package. The zipcode data frame contains longitude and latitude information based on zip code.

Prepare `transaction` data for density mapping by zip code

The code below will group transactions by zip codes and count the number of transactions for each zip code group. The count for each group will be used for the density mapping.

The value of count is transformed so there are only a few density levels. Because a count of 1 is present, the count is first multiplied by 1.1 to avoid taking the log of 1 so that we don’t end up with count of zero. The ceiling of the log is then taken so that we only end up with whole numbers.

There are 7 different density levels of the 13,973 groups by zip codes.

#group transactions by zip code and get the count per group
group_by.zip <- 
  stats::aggregate(data.frame(count=transaction$id),by=list(zip=transaction$zip), length)

#min and max of count before transformation
c(min(group_by.zip$count),max(group_by.zip$count))

## [1]   1 572

#transform count 
group_by.zip$count <- ceiling(log(group_by.zip$count*1.1))

#min and max of count after transformation
c(min(group_by.zip$count),max(group_by.zip$count))

## [1] 1 7

#number of groups by zip codes
nrow(group_by.zip)

## [1] 13973

#rename count column to density
colnames(group_by.zip)[2] <- c("density")

This is a preview of the transformed count of each zip code.

	zip	density
2	00650	1
3	00796	1
4	01001	4
5	01002	4
6	01005	3
7	01007	3
8	01011	1
9	01012	1
10	01013	4
11	01020	4
12	01021	2

Prepare `transaction` data for density mapping by state

The code below will group transactions by states and count the number of transactions for each group. The count for each group will be used for the density mapping.

The value of count is transformed so there are only a few density levels. Because a count of 1 is present, the count is first multiplied by 1.1 to avoid taking the log of 1 so that we don’t end up with count of zero. The ceiling of the log is then taken so that we only end up with whole numbers.

#group transactions by state and get the count per group
group_by.state <- stats::aggregate(transaction$id,by=list(transaction$state),length)

#rename columns
colnames(group_by.state) <- c("iso_3166_2", "count")

#min and max of count before transformation
c(min(group_by.state$count),max(group_by.state$count))

## [1]   330 52253

#transform count 
group_by.state$count <- ceiling(log(group_by.state$count*1.1))

#min and max of count after transformation
c(min(group_by.state$count),max(group_by.state$count))

## [1]  6 11

#number of groups by state
nrow(group_by.state)

## [1] 51

#rename count column to density
colnames(group_by.state)[2] <- c("density")

Create zip code density map on the United States

The code below takes the density by zip code data that was derived from the transaction data frame and joins this with the zipcode data frame that contains longtitude and latitude by zip code.

#join density by zip code data to zipcode data to get longitude and latitude
map.group_by.zip <- merge(group_by.zip, zipcode, by="zip")

Preview of `map.group_by.zip`

As you can see, each zip code group now has a longtitude and latitude data.

datatable(map.group_by.zip[1:10,])

United States Boundaries Map

The code below uses ggplot2::map_data function, which easily turn data from the maps package in to a data frame suitable for plotting with ggplot2. The value "state" that was passed to the function is a name of a map provided by the maps package.

#United States boundaries map
usa <- ggplot2::map_data("state")

Preview of usa map data

long	lat	group	order	region	subregion
-87.46201	30.38968	1	1	alabama	NA
-87.48493	30.37249	1	2	alabama	NA
-87.52503	30.37249	1	3	alabama	NA
-87.53076	30.33239	1	4	alabama	NA
-87.57087	30.32665	1	5	alabama	NA
-87.58806	30.32665	1	6	alabama	NA

Density map by zip code on the continental United States

ggplot is used to plot the density points and to render the map of the United States. The geom_polygon function is used to draw the map of the United States. The object usa, which was generated by the function map_data("state") is passed to the geom_polygon function.

ggplot(map.group_by.zip,aes(longitude,latitude)) +
  geom_polygon(data=usa,aes(x=long,y=lat,group=group),color='gray',fill="white",alpha=.70)+
  geom_point(aes(color = density),size=.15,alpha=.25) +
  scale_colour_gradient(low = "yellow", high = "black") + 
  xlim(-125,-65)+ylim(20,50)

## Warning: Removed 118 rows containing missing values (geom_point).

Create state density map on United States

The albersusa package is used to generate a shape file for the United States. The function usa_sf retrieves a U.S. state composite map using the `“aeqd” projection parameter. This stands for Azimuthal Equidistant Projection.

USA_sf <- albersusa::usa_sf(proj="aeqd")

Preview of `USA_sf` data

The next step is to join group_by.state with USA_sf. We’re joining these two tables by the column name iso_3166_2, which is the 2 letter abbreviation for the states. It’s important that all rows in USA_sf are included in the output table map.group_by_state because this table is going to be used to render the map of the United States.

map.group_by.state <- left_join(USA_sf, group_by.state, by="iso_3166_2")

State density map on United States

ggplot is used to map the density and render shape file. The viridis color scale is used to render the color that represents the densities for each state.

  ggplot(data=map.group_by.state, aes(fill = density, color = density)) + 
  geom_sf() + 
  scale_fill_viridis(option = "E",direction=-1) + 
  scale_color_viridis(option = "E",direction=-1) + 
  theme_map(base_size=11)

This work is based on tutorial provided by @AWHSTIN. https://austinwehrwein.com/digital-humanities/creating-a-density-map-in-r-with-zipcodes/

Data 607 Presentation

S. Tinapunan

April 17, 2018

Density Maps

Packages

Links

Libraries

Setup data

Preview of `transaction`

Preview of `zipcode`

Prepare `transaction` data for density mapping by zip code

Prepare `transaction` data for density mapping by state

Create zip code density map on the United States

Preview of `map.group_by.zip`

United States Boundaries Map

Density map by zip code on the continental United States

Create state density map on United States

Preview of `USA_sf` data

State density map on United States

Data 607 Presentation

S. Tinapunan

April 17, 2018

Density Maps

Packages

Links

Libraries

Setup data

Preview of transaction

Preview of zipcode

Prepare transaction data for density mapping by zip code

Prepare transaction data for density mapping by state

Create zip code density map on the United States

Preview of map.group_by.zip

United States Boundaries Map

Density map by zip code on the continental United States

Create state density map on United States

Preview of USA_sf data

State density map on United States

Preview of `transaction`

Preview of `zipcode`

Prepare `transaction` data for density mapping by zip code

Prepare `transaction` data for density mapping by state

Preview of `map.group_by.zip`

Preview of `USA_sf` data