In this lesson students will learn how to
Dutch Bros Coffee is a drive-through coffee chain headquartered in Grants Pass, Oregon, with company-owned and franchise locations throughout the United States.
library(tidyverse)
These data represent a list of store locations. There are variables for city
, state
, and zipcode
.
dutch<- read.csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/dutch_bros_locations.csv",
header=TRUE)
We will also want to wrangle our data so that we count the number of locations in a given state.
dutchSt<-dutch%>%
group_by(state)%>%
summarise(n=n())
head(dutchSt)
## # A tibble: 6 × 2
## state n
## <fct> <int>
## 1 AZ 61
## 2 CA 96
## 3 CO 30
## 4 ID 35
## 5 NM 7
## 6 NV 25
usmap
Package#install.packages("usmap")
library(usmap)
## Warning: package 'usmap' was built under R version 3.6.2
states <- usmap::us_map()
## What variables in the state object?
head(states)
## x y order hole piece group fips abbr full
## 1 1091779 -1380695 1 FALSE 1 01.1 01 AL Alabama
## 2 1091268 -1376372 2 FALSE 1 01.1 01 AL Alabama
## 3 1091140 -1362998 3 FALSE 1 01.1 01 AL Alabama
## 4 1090940 -1343517 4 FALSE 1 01.1 01 AL Alabama
## 5 1090913 -1341006 5 FALSE 1 01.1 01 AL Alabama
## 6 1090796 -1334480 6 FALSE 1 01.1 01 AL Alabama
Let’s investigate the data for Oregon.
oregon<-states%>%
filter(full=="Oregon")
ggplot(oregon, aes(x, y))+
geom_point()
These data allow us to play “connect the dots” to draw the shape of the state of Oregon.
Oh no, what happened?
ggplot(oregon, aes(x, y))+
geom_line()
We need to tell R what order to connect the dots.
geom_path()
connects the observations in the order in which they appear in the data.
geom_line()
connects them in order of the variable on the x axis.
ggplot(oregon, aes(x, y, group=group))+
geom_path()
We can actually think of geographies as generalized polygons!
ggplot(oregon, aes(x, y, group=group))+
geom_polygon(fill="forestgreen")
When joining the data to the map we need to have the same variable name in both. Let’s create a new column named state
in our states
data, that contains the state abbreviations, since the dutch
data set uses state abbreviations.
mapDutch<-states%>%
mutate(state=abbr)%>%
left_join(dutchSt)
## Joining, by = "state"
mapDutch%>%
ggplot(aes(x, y, group = group)) +
geom_polygon(aes(fill = n),color="black")+
theme_bw()+
coord_fixed()
Viridis is a colorblind friendly color palette that can be used to create accessible heatmaps.
#install.packages("viridis")
library(viridis)
mapDutch%>%
ggplot(aes(x, y, group = group)) +
geom_polygon(aes(fill = n),color="black")+
theme_bw()+
coord_fixed()+
scale_fill_viridis(option="viridis", direction = 1)+
ggtitle("Oregon - Home of Dutch Bros")
Using the package plotly
we can add some basic interactivity, like hover overs, zooms, and box selects.
#install.packages("plotly")
library(plotly)
p<-mapDutch%>%
ggplot(aes(x, y, group = group)) +
geom_polygon(aes(fill = n),color="black")+
theme_bw()+
coord_fixed()+
scale_fill_viridis(option="viridis", direction = 1)
ggplotly(p)
starbuck <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/starbucks_directory.csv")
## Rows: 25600 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): Brand, Store Number, Store Name, Ownership Type, Street Address, C...
## dbl (2): Longitude, Latitude
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#str(starbuck)
Let’s look at the format of the column Country
:
head(unique(starbuck$Country))
## [1] "AD" "AE" "AR" "AT" "AU" "AW"
length(unique(starbuck$Country))
## [1] 73
map_data
in ggplot2
# Load map data
world_map = map_data("world")
world_map%>%
ggplot(aes(map_id = region)) +
geom_map(map = world_map)+
expand_limits(x = world_map$long, y = world_map$lat)
Since the globe is a 3D object, we must project it onto a 2D surface to be able to visualize it on paper; however, this affects the presentation of land masses. There are five properties that can be distorted in projections: shape, area, angles, distance, and direction. Map makers are able to preserve two of these properties, but the other three are sacrificed.
The Mercator projection is one of the most common projections. It has equally spaced straight meridians, conformal, straight compass courses. However, this projection is known to distort areas. Thus, the further from the equator the larger a land mass may appear.
For instance, lets look at Greenland and Algeria:
world_map%>%
filter(region%in%c("Greenland", "Algeria"))%>%
ggplot(aes(map_id = region)) +
geom_map(map = world_map)+
expand_limits(x = c(-100, 100), y = c(0, 100))+
coord_map("mercator")
TRUTH: Algeria is 2.4 million \(km^2\) and Greenland is 2.2 million \(km^2\)
Another commonly used project is the Mollweide, which is considered to be homalographic (equal-area) where the hemisphere is a circle.
world_map%>%
filter(region%in%c("Greenland", "Algeria"))%>%
ggplot(aes(map_id = region)) +
geom_map(map = world_map)+
expand_limits(x = c(-100, 100), y = c(0, 100))+
coord_map("mollweide")
The longitude ranges from -180 to 180. You will notice that lines get strange when we plot the whole world. This is because there are parts of the US and Russia that extend past the 180th meridian.
world_map%>%
ggplot(aes(map_id = region)) +
geom_map(map = world_map)+
expand_limits(x = world_map$long, y = world_map$lat)+
coord_map("mollweide")
We can fix this by taking out longitudes greater than 180.
world_map%>%
ggplot(aes(map_id = region)) +
geom_map(map = world_map)+
expand_limits(x = world_map$long, y = world_map$lat)+
coord_map("mollweide", xlim=c(-180,180))
Read about different types of projections:
help(mapproject)
The following are the different types of projections available in R. Some of these projections require additional parameter(s) to be specified.
## PROJECTIONS
projlist <- c("aitoff", "albers", "azequalarea", "azequidist", "bicentric",
"bonne", "conic", "cylequalarea", "cylindrical", "eisenlohr", "elliptic",
"fisheye", "gall", "gilbert", "guyou", "harrison", "hex", "homing",
"lagrange", "lambert", "laue", "lune", "mercator", "mollweide", "newyorker",
"orthographic", "perspective", "polyconic", "rectangular", "simpleconic",
"sinusoidal", "tetra", "trapezoidal")
Task: Try three different projections and make observations for how the map of the Earth changes.
world_map%>%
ggplot(aes(map_id = region)) +
geom_map(map = world_map)+
expand_limits(x = world_map$long, y = world_map$lat)+
coord_map("mercator", xlim=c(-180,180))
Let’s look at the map data. Note that region is used to specify country, but it has the whole name written out. This might be a problem for us because the Starbucks data uses a two-letter code of each country.
head(world_map)
## long lat group order region subregion
## 1 -69.89912 12.45200 1 1 Aruba <NA>
## 2 -69.89571 12.42300 1 2 Aruba <NA>
## 3 -69.94219 12.43853 1 3 Aruba <NA>
## 4 -70.00415 12.50049 1 4 Aruba <NA>
## 5 -70.06612 12.54697 1 5 Aruba <NA>
## 6 -70.05088 12.59707 1 6 Aruba <NA>
countrycode
Package#install.packages("countrycode")
library(countrycode)
starStores<-starbuck%>%
group_by(Country)%>%
summarise(stores=n())%>%
mutate(region=countrycode(Country, origin = 'iso2c', destination = 'country.name'))%>%
arrange(desc(stores))
head(starStores)
## # A tibble: 6 × 3
## Country stores region
## <chr> <int> <chr>
## 1 US 13608 United States
## 2 CN 2734 China
## 3 CA 1468 Canada
## 4 JP 1237 Japan
## 5 KR 993 South Korea
## 6 GB 901 United Kingdom
worldCoffee<-world_map%>%
left_join(starStores, by="region")
#install.packages("ggthemes")
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.6.2
worldCoffee%>%
ggplot(aes(map_id = region, fill=stores)) +
geom_map(map = world_map) +
expand_limits(x = world_map$long, y = world_map$lat) +
coord_map("mollweide", xlim=c(-180,180))+
theme_map()
Something went wrong? Can you see it?
It turns out that only country name in maps
that doesn’t use the whole country name is the United States. In the map packages the name is USA. ##### Back to STEP 3
starStores<-starbuck%>%
group_by(Country)%>%
summarise(stores=n())%>%
mutate(region=as.character(lapply(Country, function(x){
if(x!="US"){
out=countrycode(x, origin = 'iso2c', destination = 'country.name')
}
if(x=="US"){
out="USA"
}
out})))
worldCoffee<-world_map%>%
left_join(starStores, by="region")
worldCoffee%>%
ggplot(aes(map_id = region, fill=stores)) +
geom_map(map = world_map) +
expand_limits(x = world_map$long, y = world_map$lat) +
coord_map("mollweide", xlim=c(-180,180))+
theme_map()
Since the United States has so many Starbucks locations, its hard to see how many stores are in other countries.
worldCoffee%>%
ggplot(aes(map_id = region, fill=stores)) +
geom_map(map = world_map) +
expand_limits(x = world_map$long, y = world_map$lat) +
coord_map("mollweide", xlim=c(-180,180))+
theme_map()+
scale_fill_gradient(trans = "log10")
worldCoffee%>%
ggplot(aes(map_id = region, fill=stores)) +
geom_map(map = world_map) +
expand_limits(x = world_map$long, y = world_map$lat) +
coord_map("mollweide", xlim=c(-180,180))+
theme_map()+
scale_fill_gradient(trans = "log10")+
scale_fill_viridis(option="viridis", direction = 1)
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
# SPLIT THE RANGE INTO 5 EQUAL SECTIONS
split5<-quantile(starStores$stores, prob=c(0, .2, .4, .6, .8, 1))
split5
## 0% 20% 40% 60% 80% 100%
## 1.0 7.4 18.0 54.2 139.2 13608.0
str(split5)
## Named num [1:6] 1 7.4 18 54.2 139.2 ...
## - attr(*, "names")= chr [1:6] "0%" "20%" "40%" "60%" ...
# CUT INTO FIVE SECTIONS
starStores<-starStores%>%
mutate(store_cut = cut(stores, breaks = split5))
#install.packages("viridisLite")
library(viridisLite)
# CREATE A VIRIDIS COLOR PALETTE WITH 5 COLORS
viridis5<-viridis(n=5, begin = 0, end = 1, direction = 1)
viridis5
## [1] "#440154FF" "#3B528BFF" "#21908CFF" "#5DC863FF" "#FDE725FF"
# JOIN
worldCoffee<-world_map%>%
left_join(starStores, by="region")
# MAP
worldCoffee%>%
ggplot(aes(map_id = region, fill=store_cut)) +
geom_map(map = world_map) +
expand_limits(x = world_map$long, y = world_map$lat) +
coord_map("mollweide", xlim=c(-180,180))+
theme_map()+
scale_fill_manual(values=viridis5)
Number of locations within the same city:
## CITIES
starCities<-starbuck%>%
group_by(Country, City)%>%
summarise(lat=mean(Latitude),
long=mean(Longitude),
cityStores=n())
## `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
Add points:
worldCoffee%>%
ggplot() +
geom_map(aes(map_id = region, fill=store_cut), map = world_map) +
geom_point(data=starCities, aes(x=long, y=lat, size=cityStores), alpha=.3)+
expand_limits(x = world_map$long, y = world_map$lat) +
coord_map("mollweide", xlim=c(-180,180))+
theme_map()+
scale_fill_manual(values=viridis5)+
guides(color = guide_legend(order=1),
size = guide_legend(order=2))+
theme(legend.position="bottom", legend.box = "vertical")
## Warning: Removed 1 rows containing missing values (geom_point).