In this lesson students will learn how to create time series plots and maps.
Time series plots show how a variable (on the y-axis) changes over time (on the x-axis).
library(tidyverse)
salem<- read.csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/salemOR_AQI.csv",
header=TRUE)
str(salem)
## 'data.frame': 2799 obs. of 4 variables:
## $ date: Factor w/ 2799 levels "2014/1/1","2014/1/10",..: 2545 2551 2552 2553 2554 2555 2556 2557 2535 2536 ...
## $ pm25: int 41 42 26 35 57 72 68 72 91 63 ...
## $ pm10: int NA NA NA NA NA NA NA NA NA NA ...
## $ o3 : int 33 27 12 NA NA NA NA NA NA NA ...
geom_line()
Let’s just try using geom_line():
ggplot(salem, aes(date, pm25))+
geom_line()
## Warning: Removed 20 row(s) containing missing values (geom_path).
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
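What’s wrong with this? The str() output above shows that date was read in as a factor, so ggplot treats every date as its own group containing a single observation and has nothing to connect. Converting the column to a Date fixes this: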
salem$date<-as.Date(salem$date)
ggplot(salem, aes(date, pm25))+
geom_line()
## Warning: Removed 1 row(s) containing missing values (geom_path).
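The conversion above relies on as.Date() recognizing the year/month/day style on its own. As a minimal sketch, using a small made-up vector in the same style rather than the Salem data, the following shows why dates stored as text sort in the wrong order and how the format can be stated explicitly:

```r
# Made-up date strings in the same year/month/day style as the Salem file
date_text <- c("2014/1/10", "2014/1/2", "2014/1/1")

# As text (or factor levels), the strings sort alphabetically,
# so "2014/1/10" lands before "2014/1/2"
sort(date_text)

# Spelling out the format makes the conversion explicit
parsed <- as.Date(date_text, format = "%Y/%m/%d")

# As Date values they sort chronologically
sort(parsed)
```

Stating the format is optional here, but it guards against files that use a different layout such as month/day/year.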
We can do a little wrangling to add a column for air quality rating, as defined here:
https://aqicn.org/data-platform/register/
## AIR QUALITY
## a little wrangling
salem<-salem%>%
mutate(quality=as.character(lapply(pm25, function(x){
out=NA
if(is.na(x)==FALSE){
if(x %in% c(0:50)){
out="Good"
}
if(x %in% c(51:100)){
out="Moderate"
}
if(x %in% c(101:150)){
out="Unhealthy Sensitive" # Unhealthy for Sensitive Groups
}
if(x %in% c(151:200)){
out="Unhealthy"
}
if(x %in% c(201:300)){
out="Very Unhealthy"
}
if(x > 300){
out="Hazardous"
}
}
out
})))
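The nested if statements above can also be written more compactly. The sketch below is one possible alternative using base R's cut(); the helper name aqi_category is hypothetical and not part of the lesson. It assumes the same cut points as above (50, 100, 150, 200, 300). Unlike the original, it also classifies non-integer values such as 50.5, and it returns NA for negative values.

```r
# Sketch: assign AQI categories with cut() instead of nested if statements.
# aqi_category is a hypothetical helper name used only for illustration.
aqi_category <- function(pm25) {
  cut(
    pm25,
    breaks = c(0, 50, 100, 150, 200, 300, Inf),
    labels = c("Good", "Moderate", "Unhealthy Sensitive",
               "Unhealthy", "Very Unhealthy", "Hazardous"),
    include.lowest = TRUE,  # so that 0 falls in "Good"
    right = TRUE            # intervals are closed on the right, e.g. (50, 100]
  )
}

# Missing readings stay missing
aqi_category(c(41, 72, 101, 305, NA))
```

Inside a pipeline this could be used as mutate(quality = aqi_category(pm25)). The result is a factor whose levels follow the order of the labels.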
Order the rating levels from best to worst so that the legend and the color palette line up:
salem$quality<-factor(salem$quality,
levels=c("Good", "Moderate",
"Unhealthy Sensitive", "Unhealthy",
"Very Unhealthy","Hazardous" ))
pal<-c("forestgreen", "gold", "darkorange", "firebrick3", "purple3", "darkred")
## ADD POINTS
ggplot(salem, aes(date, pm25))+
geom_point(aes(color=quality))+
geom_line()+
scale_color_manual(values=pal)+
theme_minimal()
## Warning: Removed 20 rows containing missing values (geom_point).
## Warning: Removed 1 row(s) containing missing values (geom_path).
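The palette above is matched to the quality levels by position. If some ratings never occur in the data, the colors may shift to different ratings. One possible safeguard, sketched below, is to name each color after its rating, since scale_color_manual() matches a named values vector by name. The object names quality_levels and pal_named are made up for this illustration.

```r
# Rating labels in the same order as the factor levels used above
quality_levels <- c("Good", "Moderate", "Unhealthy Sensitive",
                    "Unhealthy", "Very Unhealthy", "Hazardous")

# Attach each color to its rating by name
pal_named <- setNames(
  c("forestgreen", "gold", "darkorange", "firebrick3", "purple3", "darkred"),
  quality_levels
)

# Look up a color by rating
pal_named[["Hazardous"]]
```

Passing pal_named to scale_color_manual(values = pal_named) would then keep each rating tied to its intended color.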
The cryptocurrency data are in three separate files, one per coin:
coin_Bitcoin <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Bitcoin.csv")
coin_Dogecoin <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Dogecoin.csv")
coin_Ethereum <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/coin_Ethereum.csv")
coinBind<-coin_Bitcoin %>%
rbind(coin_Dogecoin)%>%
rbind(coin_Ethereum)
Since Date is already a date type variable, we can go ahead and plot it. Here color=Name works as a grouping variable.
#str(coinBind)
ggplot(coinBind, aes(x=Date, y=Volume, color=Name))+
geom_line()
npark <- read_csv("https://raw.githubusercontent.com/kitadasmalley/DATA151/main/Data/AllTrails%20data%20-%20nationalpark.csv")
## Rows: 3313 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, area_name, city_name, state_name, country_name, _geoloc, rou...
## dbl (8): trail_id, popularity, length, elevation_gain, difficulty_rating, v...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#str(npark)
Group by state to create summaries for metrics within a state.
stateNP<-npark%>%
group_by(state_name)%>%
summarise(stateTrails=n(),
avgPop=mean(popularity, na.rm=TRUE),
avgElev=mean(elevation_gain, na.rm=TRUE))
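To see what this kind of summary produces, here is a small sketch on made-up trail records (the values are invented; only the column names mirror the ones used above). It shows that n() counts every row in a group, while mean(..., na.rm=TRUE) skips missing values.

```r
library(dplyr)

# Made-up trail records; column names mirror the ones used above
toy_trails <- data.frame(
  state_name     = c("Oregon", "Oregon", "Utah", "Utah", "Utah"),
  popularity     = c(10, 20, 5, NA, 15),
  elevation_gain = c(100, 300, 200, 400, 600)
)

# One row per state: trail count plus averages that ignore missing values
toy_summary <- toy_trails %>%
  group_by(state_name) %>%
  summarise(stateTrails = n(),
            avgPop  = mean(popularity, na.rm = TRUE),
            avgElev = mean(elevation_gain, na.rm = TRUE))

toy_summary
```

The Utah trail with a missing popularity still counts toward stateTrails, but it is left out of avgPop.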
usmap Package
#install.packages("usmap")
library(usmap)
## Warning: package 'usmap' was built under R version 3.6.2
states <- usmap::us_map()
head(states)
## x y order hole piece group fips abbr full
## 1 1091779 -1380695 1 FALSE 1 01.1 01 AL Alabama
## 2 1091268 -1376372 2 FALSE 1 01.1 01 AL Alabama
## 3 1091140 -1362998 3 FALSE 1 01.1 01 AL Alabama
## 4 1090940 -1343517 4 FALSE 1 01.1 01 AL Alabama
## 5 1090913 -1341006 5 FALSE 1 01.1 01 AL Alabama
## 6 1090796 -1334480 6 FALSE 1 01.1 01 AL Alabama
Let’s investigate the data for Oregon.
oregon<-states%>%
filter(full=="Oregon")
ggplot(oregon, aes(x, y))+
geom_point()
These data allow us to play “connect the dots” to draw the shape of the state of Oregon.
Oh no, what happened?
ggplot(oregon, aes(x, y))+
geom_line()
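We need to tell R what order to connect the dots. geom_path() connects the observations in the order in which they appear in the data, whereas geom_line() connects them in order of the variable on the x-axis. That is why geom_line() scrambled the outline: the border points were joined from left to right instead of around the boundary.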
ggplot(oregon, aes(x, y, group=group))+
geom_path()
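The difference between the two orderings can be seen without drawing anything. The sketch below uses the corners of a made-up square (not the Oregon data) and mimics the sort by x that geom_line() applies. Treat it as an approximation, since the handling of tied x values is an assumption here.

```r
# Corners of a made-up square, listed in the order you would trace its outline
square <- data.frame(
  x = c(0, 1, 1, 0, 0),
  y = c(0, 0, 1, 1, 0)
)

# geom_path() would connect the rows as listed
path_order <- square

# geom_line() first sorts the rows by x, roughly like this
line_order <- square[order(square$x), ]

path_order
line_order
```

In line_order all the points with x = 0 come first, so connecting the rows in that order no longer traces the square.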
We can actually think of geographies as generalized polygons!
ggplot(oregon, aes(x, y, group=group))+
geom_polygon(fill="forestgreen")
When joining the data to the map, we need the same variable name in both. Let’s create a new column named state_name in the map data.
stateNP_Map<-states%>%
mutate(state_name=full)%>%
left_join(stateNP)
## Joining, by = "state_name"
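As an alternative sketch, left_join() can also be told which column to use on each side, which avoids adding a copy of the column. The toy data frames below are made up for illustration and only mimic the full and state_name columns. The example also shows what happens to states with no matching summary row.

```r
library(dplyr)

# Made-up stand-ins for the map data and the state summaries
toy_states  <- data.frame(full = c("Oregon", "Utah", "Iowa"))
toy_stateNP <- data.frame(state_name  = c("Oregon", "Utah"),
                          stateTrails = c(2, 3))

# Name the key on each side instead of adding a matching column
toy_map <- toy_states %>%
  left_join(toy_stateNP, by = c("full" = "state_name"))

# States with no match keep their row but get NA in the summary columns
toy_map %>% filter(is.na(stateTrails))
```

States left with NA would have no fill value on the map, so it helps to check for them before plotting.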
stateNP_Map%>%
ggplot(aes(x, y, group = group)) +
geom_polygon(aes(fill = stateTrails),color="black")+
theme_bw()+
coord_equal()
Viridis is a colorblind friendly color palette that can be used to create accessible heatmaps.
#install.packages("viridis")
library(viridis)
stateNP_Map%>%
ggplot(aes(x, y, group = group)) +
geom_polygon(aes(fill = stateTrails),color="black")+
theme_bw()+
coord_equal()+
ggtitle("California has the MOST trails, but...")+
scale_fill_viridis(option="viridis", direction = 1)
stateNP_Map%>%
ggplot(aes(x, y, group = group)) +
geom_polygon(aes(fill = avgPop),color="black")+
theme_bw()+
coord_equal()+
ggtitle("...Oregon trails are the MOST popular")+
scale_fill_viridis(option="viridis", direction = 1)
Create maps to show the distribution of…