DATA151: Trends Over Time and Space

Learning Objectives

In this lesson students will learn how to create:

Time series plots
Choropleths (colored map plots)

Time Series Plots

Time series plots show how a variable (on the y-axis) changes over time (on the x-axis).

Example 1: Salem, Oregon AQI

Step 0: Library Tidyverse

library(tidyverse)

Step 1: Load the Data

salem<- read.csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/salemOR_AQI.csv",
                 header=TRUE)

str(salem)

## 'data.frame':    2799 obs. of  4 variables:
##  $ date: Factor w/ 2799 levels "2014/1/1","2014/1/10",..: 2545 2551 2552 2553 2554 2555 2556 2557 2535 2536 ...
##  $ pm25: int  41 42 26 35 57 72 68 72 91 63 ...
##  $ pm10: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ o3  : int  33 27 12 NA NA NA NA NA NA NA ...

Step 2: `geom_line()`

Let’s just try using geom_line():

ggplot(salem, aes(date, pm25))+
  geom_line()

## Warning: Removed 20 row(s) containing missing values (geom_path).

## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

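What’s wrong with this?

The date column was read in as a factor (see the str() output above), so ggplot treats every date as its own group and has nothing to connect. The x-axis needs to be a date type variable.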

salem$date<-as.Date(salem$date)

ggplot(salem, aes(date, pm25))+
  geom_line()

## Warning: Removed 1 row(s) containing missing values (geom_path).
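
To see why the conversion matters, here is a small sketch using a few made-up date strings in the same year/month/day layout as the file. The `format` argument is spelled out as an assumption about that layout; the lesson's own call leaves it to `as.Date()` to detect.

```r
# A few date strings in the same "year/month/day" layout as the file (made-up values)
date_txt <- c("2014/1/2", "2014/1/10", "2014/12/31")

# As text, the values sort alphabetically, so "2014/1/10" lands before "2014/1/2"
sort(date_txt)

# Convert with an explicit format so R knows these are calendar dates
date_val <- as.Date(date_txt, format = "%Y/%m/%d")
class(date_val)

# As dates, the values sort chronologically
sort(date_val)
```

Once the column is a Date, ggplot can place the observations on a continuous time axis instead of treating each one as a separate category.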

Step 3: Air Quality Ratings

We can do a little wrangling to add a column for air quality rating, as defined here:

https://aqicn.org/data-platform/register/

## AIR QUALITY
## a little wrangling
salem<-salem%>%
  mutate(quality=case_when(
    pm25 %in% 0:50    ~ "Good",
    pm25 %in% 51:100  ~ "Moderate",
    pm25 %in% 101:150 ~ "Unhealthy Sensitive", # Unhealthy for Sensitive Groups
    pm25 %in% 151:200 ~ "Unhealthy",
    pm25 %in% 201:300 ~ "Very Unhealthy",
    pm25 > 300        ~ "Hazardous"
    # anything else, including a missing pm25, stays NA
  ))
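
As an alternative sketch (not the lesson's method), base R's `cut()` can bin the values and set the level order in one call, which would also cover the ordering step that follows. The vector `aqi` holds made-up values chosen to sit on the bin edges.

```r
# Made-up AQI values on the bin boundaries, plus one missing value
aqi <- c(0, 50, 51, 100, 101, 150, 151, 200, 201, 300, 301, NA)

# Rating labels, listed from best to worst
ratings <- c("Good", "Moderate", "Unhealthy Sensitive",
             "Unhealthy", "Very Unhealthy", "Hazardous")

# Intervals are closed on the right: [0,50], (50,100], ..., (300,Inf]
quality <- cut(aqi,
               breaks = c(0, 50, 100, 150, 200, 300, Inf),
               labels = ratings,
               include.lowest = TRUE)

# Levels come out in the order of the labels, and missing values stay NA
levels(quality)
table(quality, useNA = "ifany")
```

Unlike the `%in%` checks, `cut()` also handles non-integer values such as 50.5, which would fall in the "Moderate" bin here.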

Order the ratings from best to worst so that legends and color scales follow the same order.

salem$quality<-factor(salem$quality, 
                      levels=c("Good", "Moderate", 
                               "Unhealthy Sensitive", "Unhealthy", 
                               "Very Unhealthy","Hazardous" ))

Step 4: Create a Custom Color Palette

pal<-c("forestgreen", "gold", "darkorange", "firebrick3", "purple3", "darkred")

## ADD POINTS 
ggplot(salem, aes(date, pm25))+
  geom_point(aes(color=quality))+
  geom_line()+
  scale_color_manual(values=pal)+
  theme_minimal()

## Warning: Removed 20 rows containing missing values (geom_point).

## Warning: Removed 1 row(s) containing missing values (geom_path).
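
An unnamed palette is matched to the ratings by position. A named vector ties each color to its rating explicitly, which may be safer if some rating never occurs in the data. A minimal sketch, assuming a made-up data frame `toy` (hypothetical values, not the Salem data):

```r
library(ggplot2)

# Rating labels from best to worst
ratings <- c("Good", "Moderate", "Unhealthy Sensitive",
             "Unhealthy", "Very Unhealthy", "Hazardous")

# Same colors as pal, but each one is named after its rating
pal_named <- c("Good" = "forestgreen",
               "Moderate" = "gold",
               "Unhealthy Sensitive" = "darkorange",
               "Unhealthy" = "firebrick3",
               "Very Unhealthy" = "purple3",
               "Hazardous" = "darkred")

# Made-up data in which two of the six ratings never appear
toy <- data.frame(date = as.Date("2020-09-01") + 0:3,
                  pm25 = c(30, 80, 160, 320))
toy$quality <- factor(c("Good", "Moderate", "Unhealthy", "Hazardous"),
                      levels = ratings)

# Named values are looked up by rating, not by position
p <- ggplot(toy, aes(date, pm25)) +
  geom_line() +
  geom_point(aes(color = quality)) +
  scale_color_manual(values = pal_named) +
  theme_minimal()
p
```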

Example 2: Cryptocurrency

Step 1: Load the Data

These data are in three separate files:

coin_Bitcoin <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Bitcoin.csv")
coin_Dogecoin <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Dogecoin.csv")
coin_Ethereum <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Ethereum.csv")

Step 2: Combine the data

coinBind<-coin_Bitcoin %>%
  rbind(coin_Dogecoin)%>%
  rbind(coin_Ethereum)
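
Stacking with `rbind()` works here because the three files share the same columns. A possible alternative is `dplyr::bind_rows()`, which takes any number of data frames in one call and matches columns by name. The two small data frames below are made up for illustration:

```r
library(dplyr)

# Two made-up coin tables with the same columns
coin_a <- data.frame(Name = "CoinA",
                     Date = as.Date("2021-01-01") + 0:1,
                     Volume = c(10, 12))
coin_b <- data.frame(Name = "CoinB",
                     Date = as.Date("2021-01-01") + 0:1,
                     Volume = c(3, 4))

# Stack the rows; the Name column records which coin each row came from
coin_toy <- bind_rows(coin_a, coin_b)

# Check the result: one row per coin per date
nrow(coin_toy)
count(coin_toy, Name)
```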

Step 3: Time Series Plot

Since Date is already a date type variable we can go ahead and plot it. Here color=Name works as a grouping variable.

#str(coinBind)

ggplot(coinBind, aes(x=Date, y=Volume, color=Name))+
  geom_line()

Choropleths (Map Plots)

Example 3: All Trails

Step 1: Load the Data

npark <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/AllTrails%20data%20-%20nationalpark.csv")

## Rows: 3313 Columns: 18

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, area_name, city_name, state_name, country_name, _geoloc, rou...
## dbl  (8): trail_id, popularity, length, elevation_gain, difficulty_rating, v...

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#str(npark)

Step 2: State Level Data

Group by state to create summaries for metrics within a state.

stateNP<-npark%>%
  group_by(state_name)%>%
  summarise(stateTrails=n(), 
            avgPop=mean(popularity, na.rm=TRUE), 
            avgElev=mean(elevation_gain, na.rm=TRUE))
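
To check what `group_by()` and `summarise()` produce, it can help to run the same pattern on a tiny made-up table where the answers are easy to verify by hand. The data frame `toy_trails` and its values are hypothetical.

```r
library(dplyr)

# Five made-up trails in two states, one with a missing popularity score
toy_trails <- data.frame(
  state_name = c("Oregon", "Oregon", "Utah", "Utah", "Utah"),
  popularity = c(10, 20, 5, NA, 7)
)

toy_summary <- toy_trails %>%
  group_by(state_name) %>%
  summarise(stateTrails = n(),                        # rows per state, NA rows included
            avgPop = mean(popularity, na.rm = TRUE))  # NA dropped before averaging

toy_summary
```

Note that `n()` counts every row in the group, while `mean(..., na.rm=TRUE)` averages only the non-missing values.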

Step 3: `usmap` Package

#install.packages("usmap")
library(usmap)

## Warning: package 'usmap' was built under R version 3.6.2

states <- usmap::us_map()

head(states)

##         x        y order  hole piece group fips abbr    full
## 1 1091779 -1380695     1 FALSE     1  01.1   01   AL Alabama
## 2 1091268 -1376372     2 FALSE     1  01.1   01   AL Alabama
## 3 1091140 -1362998     3 FALSE     1  01.1   01   AL Alabama
## 4 1090940 -1343517     4 FALSE     1  01.1   01   AL Alabama
## 5 1090913 -1341006     5 FALSE     1  01.1   01   AL Alabama
## 6 1090796 -1334480     6 FALSE     1  01.1   01   AL Alabama

This output reflects an older release of usmap. In newer releases us_map() returns an sf object with a geometry column rather than x and y columns, so the steps below assume the older data frame format.

Let’s investigate the data for Oregon.

Points

oregon<-states%>%
  filter(full=="Oregon")

ggplot(oregon, aes(x, y))+
  geom_point()

These data allow us to play “connect the dots” to draw the shape of the state of Oregon.

Connect the dots

Oh no, what happened?

ggplot(oregon, aes(x, y))+
  geom_line()

We need to tell R in what order to connect the dots.

geom_path() connects the observations in the order in which they appear in the data.
geom_line() connects them in order of the variable on the x axis.

Setting group=group keeps each piece of the outline as its own path.

ggplot(oregon, aes(x, y, group=group))+
  geom_path()
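
The difference is easy to see on a small made-up shape. The corners of the square below are stored in drawing order, which is what `geom_path()` follows, while `geom_line()` visits them in order of `x`. This sketch uses toy coordinates only, not the map data.

```r
library(ggplot2)

# Corners of a unit square, listed in the order we want to draw them
square <- data.frame(x = c(0, 1, 1, 0, 0),
                     y = c(0, 0, 1, 1, 0))

# geom_path(): follows row order, so the outline closes correctly
p_path <- ggplot(square, aes(x, y)) + geom_path()

# geom_line(): sorts by x first, so the outline gets scrambled
p_line <- ggplot(square, aes(x, y)) + geom_line()

# Row order after sorting by x, which differs from the drawing order 1, 2, 3, 4, 5
order(square$x)

p_path
p_line
```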

Filling in the space

We can actually think of geographies as generalized polygons!

ggplot(oregon, aes(x, y, group=group))+
  geom_polygon(fill="forestgreen")

Step 4: Join the Map and Data

When joining the data to the map we need the key to have the same variable name in both. The map data stores state names in full, so let’s create a new column named state_name to match the summary table.

stateNP_Map<-states%>%
  mutate(state_name=full)%>%
  left_join(stateNP)

## Joining, by = "state_name"
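
`left_join()` keeps every row of the map data and fills in `NA` for states that have no match in the summary table. A small sketch with made-up tables (the names `toy_map` and `toy_stats` are hypothetical) also shows how to name the key explicitly with `by`, which avoids the "Joining, by" message:

```r
library(dplyr)

# Made-up map rows: two outline points for Oregon, one for Idaho
toy_map <- data.frame(full = c("Oregon", "Oregon", "Idaho"),
                      x = c(1, 2, 3),
                      y = c(1, 2, 3))

# Made-up summary table that only has a row for Oregon
toy_stats <- data.frame(state_name = "Oregon", stateTrails = 2L)

toy_joined <- toy_map %>%
  mutate(state_name = full) %>%             # copy the key under the shared name
  left_join(toy_stats, by = "state_name")   # name the key explicitly

# All three map rows survive; Idaho has NA for stateTrails
toy_joined
```

On a map, states left with `NA` are typically drawn in a neutral grey rather than a color from the fill scale.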

Step 5: Make a Map

stateNP_Map%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(fill = stateTrails),color="black")+
  theme_bw()+
  coord_equal()

Step 6: Changing Color Palette

Viridis is a colorblind-friendly color palette that can be used to create accessible maps and heatmaps.

#install.packages("viridis")
library(viridis)

stateNP_Map%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(fill = stateTrails),color="black")+
  theme_bw()+
  coord_equal()+
  ggtitle("California has the MOST trails, but...")+
  scale_fill_viridis(option="viridis", direction = 1)

stateNP_Map%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(fill = avgPop),color="black")+
  theme_bw()+
  coord_equal()+
  ggtitle("...Oregon trails are the MOST popular")+
  scale_fill_viridis(option="viridis", direction = 1)
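
The same palette is also available inside ggplot2 itself as `scale_fill_viridis_c()` for continuous fills, so a quick experiment does not need the map data or the viridis package. A minimal sketch on two made-up square "regions" (the data frame `toy_regions` is hypothetical):

```r
library(ggplot2)

# Two made-up square regions, each with one value to map to fill
toy_regions <- data.frame(
  x = c(0, 1, 1, 0,  1, 2, 2, 1),
  y = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("A", "B"), each = 4),
  value = rep(c(10, 40), each = 4)
)

# One polygon per group, filled by value on the viridis scale
p_toy <- ggplot(toy_regions, aes(x, y, group = group)) +
  geom_polygon(aes(fill = value), color = "black") +
  coord_equal() +
  scale_fill_viridis_c(option = "viridis", direction = 1)
p_toy
```

Setting `direction = -1` reverses the scale, which can help when the largest values should be darkest.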

Your turn!

Create maps to show the distribution of…

Average elevation gain by state (avgElev)
Average trail length by state (add a summary of length in Step 2 first)

Kitada Smalley
