Learning Objectives

In this lesson, students will learn how to create:

  • Time series plots
  • Choropleths (colored map plots)

Time Series Plots

Time series plots show how a variable (on the y-axis) changes over time (on the x-axis).

Example 1: Salem, Oregon AQI

Step 0: Load the Tidyverse
library(tidyverse)
Step 1: Load the Data
salem<- read.csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/salemOR_AQI.csv",
                 header=TRUE)

str(salem)
## 'data.frame':    2799 obs. of  4 variables:
##  $ date: Factor w/ 2799 levels "2014/1/1","2014/1/10",..: 2545 2551 2552 2553 2554 2555 2556 2557 2535 2536 ...
##  $ pm25: int  41 42 26 35 57 72 68 72 91 63 ...
##  $ pm10: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ o3  : int  33 27 12 NA NA NA NA NA NA NA ...
Step 2: geom_line()

Let’s just try using geom_line():

ggplot(salem, aes(date, pm25))+
  geom_line()
## Warning: Removed 20 row(s) containing missing values (geom_path).
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

What’s wrong with this?

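  • The x-axis needs to be a date-type variable. Here date was read in as a factor, so ggplot2 treats each date as its own group of one observation and cannot draw a line between them.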
salem$date<-as.Date(salem$date)

ggplot(salem, aes(date, pm25))+
  geom_line()
## Warning: Removed 1 row(s) containing missing values (geom_path).
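
The conversion above leaves as.Date() to work out the layout of the date strings. As a minimal sketch on made-up toy values (raw_dates and parsed are illustrative names, not part of the lesson data), the format argument can spell out the year/month/day layout explicitly:

```r
# Toy dates in the same "year/month/day" layout as the salem data
raw_dates <- c("2014/1/1", "2014/1/10", "2014/1/2")

# As plain text, "2014/1/10" sorts before "2014/1/2"
sort(raw_dates)

# Spell out the layout so the parse does not depend on as.Date() guessing it
parsed <- as.Date(raw_dates, format = "%Y/%m/%d")

class(parsed)  # a Date vector rather than text
sort(parsed)   # now sorted chronologically
```

The same format argument could be added to the as.Date(salem$date) call if the automatic parse ever fails.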

Step 3: Air Quality Ratings

We can do a little wrangling to add a column for air quality rating, as defined here:

https://aqicn.org/data-platform/register/

## AIR QUALITY
## a little wrangling: map each pm25 value to its rating category
salem<-salem%>%
  mutate(quality=as.character(lapply(pm25, function(x){
    out=NA  # stays NA when pm25 is missing
    if(!is.na(x)){
      if(x %in% c(0:50)){
        out="Good"
      }
      if(x %in% c(51:100)){
        out="Moderate"
      }
      if(x %in% c(101:150)){
        out="Unhealthy Sensitive" # Unhealthy for Sensitive Groups
      }
      if(x %in% c(151:200)){
        out="Unhealthy"
      }
      if(x %in% c(201:300)){
        out="Very Unhealthy"
      }
      if(x > 300){
        out="Hazardous"
      }
    }
    out
  })))
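
The same binning can also be sketched with base R's cut(), which assigns each value to an interval and returns a factor whose levels are already in severity order. This is an alternative under assumptions, not the lesson's method: the pm25 values below are made up, and aqi_breaks and aqi_labels are illustrative names. It agrees with the function above for whole-number readings like those in the salem data.

```r
# Made-up pm25 readings, including one missing value
pm25 <- c(41, 72, 101, 155, 250, 320, NA)

# Upper edge of each rating band; Inf catches everything above 300
aqi_breaks <- c(0, 50, 100, 150, 200, 300, Inf)
aqi_labels <- c("Good", "Moderate", "Unhealthy Sensitive",
                "Unhealthy", "Very Unhealthy", "Hazardous")

# Intervals are closed on the right, so 50 is "Good" and 51 is "Moderate";
# include.lowest = TRUE keeps a reading of exactly 0 in the first band
quality <- cut(pm25, breaks = aqi_breaks, labels = aqi_labels,
               include.lowest = TRUE)

quality
levels(quality)
```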

Order the rating levels by severity so that the legend does not fall back to alphabetical order.

salem$quality<-factor(salem$quality, 
                      levels=c("Good", "Moderate", 
                               "Unhealthy Sensitive", "Unhealthy", 
                               "Very Unhealthy","Hazardous" ))
Step 4: Create a Custom Color Palette
pal<-c("forestgreen", "gold", "darkorange", "firebrick3", "purple3", "darkred")

## ADD POINTS 
ggplot(salem, aes(date, pm25))+
  geom_point(aes(color=quality))+
  geom_line()+
  scale_color_manual(values=pal)+
  theme_minimal()
## Warning: Removed 20 rows containing missing values (geom_point).
## Warning: Removed 1 row(s) containing missing values (geom_path).
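
scale_color_manual() pairs an unnamed vector of colors with the legend levels by position. As a minimal sketch of an alternative, assuming the six rating labels used above, each color can be named after its rating so the pairing is explicit (rating_levels and pal_named are illustrative names):

```r
# The six rating labels, in order of severity
rating_levels <- c("Good", "Moderate", "Unhealthy Sensitive",
                   "Unhealthy", "Very Unhealthy", "Hazardous")

pal <- c("forestgreen", "gold", "darkorange",
         "firebrick3", "purple3", "darkred")

# Attach each rating as the name of its color
pal_named <- setNames(pal, rating_levels)

pal_named["Hazardous"]
```

Passing values=pal_named to scale_color_manual() should then match colors to ratings by name, even if one rating never occurs in the data.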

Example 2: Cryptocurrency

Step 1: Load the Data

These data are in three separate files:

coin_Bitcoin <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Bitcoin.csv")
coin_Dogecoin <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Dogecoin.csv")
coin_Ethereum <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Ethereum.csv")
Step 2: Combine the data
coinBind<-coin_Bitcoin %>%
  rbind(coin_Dogecoin)%>%
  rbind(coin_Ethereum)
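
As a hedged alternative to chaining rbind(), dplyr's bind_rows() stacks several data frames in a single call. The toy data frames below are made-up stand-ins for the three coin files; their values are invented for illustration only.

```r
library(dplyr)

# Made-up stand-ins for the three coin files (two days each)
days <- as.Date(c("2021-01-01", "2021-01-02"))
btc  <- data.frame(Name = "Bitcoin",  Date = days, Volume = c(100, 120))
doge <- data.frame(Name = "Dogecoin", Date = days, Volume = c(5, 9))
eth  <- data.frame(Name = "Ethereum", Date = days, Volume = c(40, 45))

# Stack all three; columns are matched by name
coins <- bind_rows(btc, doge, eth)

nrow(coins)
unique(coins$Name)
```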
Step 3: Time Series Plot

Since Date is already a date-type variable, we can go ahead and plot it. Mapping color=Name also works as a grouping variable, so each coin gets its own line.

#str(coinBind)

ggplot(coinBind, aes(x=Date, y=Volume, color=Name))+
  geom_line()

Choropleths (Map Plots)

Example 3: All Trails

Step 1: Load the Data
npark <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/AllTrails%20data%20-%20nationalpark.csv")
## Rows: 3313 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, area_name, city_name, state_name, country_name, _geoloc, rou...
## dbl  (8): trail_id, popularity, length, elevation_gain, difficulty_rating, v...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#str(npark)
Step 2: State Level Data

Group by state to create summaries for metrics within a state.

stateNP<-npark%>%
  group_by(state_name)%>%
  summarise(stateTrails=n(), 
            avgPop=mean(popularity, na.rm=TRUE), 
            avgElev=mean(elevation_gain, na.rm=TRUE))
Step 3: usmap Package
#install.packages("usmap")
library(usmap)
## Warning: package 'usmap' was built under R version 3.6.2
states <- usmap::us_map()

head(states)
##         x        y order  hole piece group fips abbr    full
## 1 1091779 -1380695     1 FALSE     1  01.1   01   AL Alabama
## 2 1091268 -1376372     2 FALSE     1  01.1   01   AL Alabama
## 3 1091140 -1362998     3 FALSE     1  01.1   01   AL Alabama
## 4 1090940 -1343517     4 FALSE     1  01.1   01   AL Alabama
## 5 1090913 -1341006     5 FALSE     1  01.1   01   AL Alabama
## 6 1090796 -1334480     6 FALSE     1  01.1   01   AL Alabama

Let’s investigate the data for Oregon.

Points
oregon<-states%>%
  filter(full=="Oregon")

ggplot(oregon, aes(x, y))+
  geom_point()

These data allow us to play “connect the dots” to draw the shape of the state of Oregon.

Connect the dots

Oh no, what happened?

ggplot(oregon, aes(x, y))+
  geom_line()

We need to tell R in what order to connect the dots.

  • geom_path() connects the observations in the order in which they appear in the data.

  • geom_line() connects them in order of the variable on the x axis.

ggplot(oregon, aes(x, y, group=group))+
  geom_path() 
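
The difference between the two geoms can be sketched on a made-up shape. The square data frame below is a toy example, not part of the map data; layer_data() is used here only to inspect the row order each geom will draw.

```r
library(ggplot2)

# Corners of a unit square, listed in drawing order (back to the start)
square <- data.frame(x = c(0, 1, 1, 0, 0),
                     y = c(0, 0, 1, 1, 0))

p_path <- ggplot(square, aes(x, y)) + geom_path()
p_line <- ggplot(square, aes(x, y)) + geom_line()

# geom_path() keeps the rows in data order, so the outline closes properly
layer_data(p_path)[, c("x", "y")]

# geom_line() reorders the rows by x before drawing, which scrambles the outline
layer_data(p_line)[, c("x", "y")]
```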

Filling in the space

We can actually think of geographies as generalized polygons!

ggplot(oregon, aes(x, y, group=group))+
  geom_polygon(fill="forestgreen") 

Step 4: Join the Map and Data

When joining the data to the map, both data frames need a key column with the same name. The map stores the state name in full, so let’s copy it into a new column named state_name.

stateNP_Map<-states%>%
  mutate(state_name=full)%>%
  left_join(stateNP)
## Joining, by = "state_name"
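
A join can also name the key columns on each side instead of copying one into a new column. The sketch below uses made-up toy data frames (map_states and trail_counts are illustrative names, and the trail counts are invented), and adds an anti_join() to list map rows that found no match.

```r
library(dplyr)

# Toy stand-ins: a "map" keyed by full, and a summary keyed by state_name
map_states   <- data.frame(full = c("Oregon", "California", "Kansas"),
                           fips = c("41", "06", "20"))
trail_counts <- data.frame(state_name  = c("Oregon", "California"),
                           stateTrails = c(10, 25))

# Match full (left) to state_name (right) without renaming either column
joined <- map_states %>%
  left_join(trail_counts, by = c("full" = "state_name"))

# Rows of the map with no matching summary row; these get NA after the join
unmatched <- map_states %>%
  anti_join(trail_counts, by = c("full" = "state_name"))

joined
unmatched
```

Checking the unmatched rows is a quick way to catch misspelled or missing state names before drawing the map.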
Step 5: Make a Map
stateNP_Map%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(fill = stateTrails),color="black")+
  theme_bw()+
  coord_equal()

Step 6: Changing Color Palette

Viridis is a colorblind-friendly color palette that can be used to create accessible heatmaps and choropleths.

#install.packages("viridis")
library(viridis)

stateNP_Map%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(fill = stateTrails),color="black")+
  theme_bw()+
  coord_equal()+
  ggtitle("California has the MOST trails, but...")+
  scale_fill_viridis(option="viridis", direction = 1)

stateNP_Map%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(fill = avgPop),color="black")+
  theme_bw()+
  coord_equal()+
  ggtitle("...Oregon trails are the MOST popular")+
  scale_fill_viridis(option="viridis", direction = 1)
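
Any state that is absent from stateNP gets a missing fill value after the left join. As a sketch under assumptions (the tiles data frame is a made-up stand-in for the map, and scale_fill_viridis_c() is ggplot2's built-in viridis scale rather than the viridis package function used above), the na.value argument sets the color drawn for those missing values:

```r
library(ggplot2)

# Made-up stand-in for a map: four tiles, one with no data
tiles <- data.frame(x = 1:4, y = 1, value = c(3, 10, NA, 25))

p <- ggplot(tiles, aes(x, y, fill = value)) +
  geom_tile(color = "black") +
  # Missing values are drawn in light grey instead of the default grey
  scale_fill_viridis_c(option = "viridis", direction = 1,
                       na.value = "grey90") +
  theme_bw() +
  coord_equal()

p
```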

Your turn!

Create maps to show the distribution of…

  • Average elevation by state
  • Average trail length by state