These exercises accompany the Reshaping Data tutorial: http://rpubs.com/NateByers/Reshaping. They use data frames from the region5air package. Run the following code to clear your global environment and load the packages and data you need:

rm(list = ls())
library(tidyr)
library(dplyr)
library(region5air)
data(airdata)
data(chicago_air)

Exercises

  1. The chicago_air data frame is in a wide format. Use gather() to make a long data frame named chicago_air_long.

Solution 1

  2. The airdata data frame is in a long format. Use the filter() function to create a data frame called site22. Filter down to site “840180890022” and a poc of 1 (remember to use ==). Use the select() function to select only the “datetime”, “parameter”, and “value” columns. Use spread() on site22 to make a wide data frame called site22_wide with separate columns for each parameter. Hint: you want to spread the “parameter” column, so identify that column as the key in the spread() function. The “value” column should be identified as the value in the function.

Solution 2

  3. Use the filter() function on airdata to create a data frame called pm25. Filter down to parameter “88101”. Use the select() function to select only the “datetime”, “site”, and “value” columns. Use spread() on pm25 to make a wide data frame called pm25_wide with separate columns for each site. Hint: you want to spread the “site” column, so identify that column as the key in the spread() function.

Solution 3


Advanced Exercises

  4. Use ggplot2 to plot the chicago_air_long data frame that was created in exercise 1. First make sure to convert the “date” column to the Date class using as.Date(). Use facet_grid() in the plot to make separate facets for each parameter, and be sure to set the scales to “free”.

Solution 4


Solutions

Solution 1

chicago_air_long <- gather(chicago_air, key = "parameter", value = "value", 
                           ozone:solar)
head(chicago_air_long)
##         date month weekday parameter value
## 1 2013-01-01     1       3     ozone 0.032
## 2 2013-01-02     1       4     ozone 0.020
## 3 2013-01-03     1       5     ozone 0.021
## 4 2013-01-04     1       6     ozone 0.028
## 5 2013-01-05     1       7     ozone 0.025
## 6 2013-01-06     1       1     ozone 0.026
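
In tidyr 1.0.0 and later, gather() still works but is superseded by pivot_longer(). The sketch below shows the equivalent call on a small made-up data frame; toy_air and its values are hypothetical stand-ins for chicago_air, not real measurements.

```r
library(tidyr)

# Hypothetical stand-in for chicago_air: one row per date,
# one column per parameter (wide format). Values are made up.
toy_air <- data.frame(
  date  = c("2013-01-01", "2013-01-02"),
  ozone = c(0.032, 0.020),
  temp  = c(30, 28),
  solar = c(0.5, 0.6),
  stringsAsFactors = FALSE
)

# pivot_longer() plays the role of gather():
#   cols      -> the columns to stack (ozone through solar)
#   names_to  -> new column holding the old column names
#   values_to -> new column holding the stacked values
toy_long <- pivot_longer(toy_air, cols = ozone:solar,
                         names_to = "parameter", values_to = "value")
toy_long
```

One visible difference: pivot_longer() returns a tibble ordered row by row (all parameters for the first date, then the second), whereas gather() stacks one whole column after another.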

Back to exercises

Solution 2

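# Keep only rows for site 840180890022 and POC 1
site22 <- filter(airdata, site == "840180890022", poc == 1)
# Keep only the columns needed for reshaping
site22 <- select(site22, datetime, parameter, value)
# One column per parameter code, one row per datetime
site22_wide <- spread(site22, key = "parameter", value = "value")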
head(site22_wide)
##             datetime 44201 62101
## 1 20130101T0100-0600    NA    24
## 2 20130101T0200-0600    NA    24
## 3 20130101T0300-0600    NA    24
## 4 20130101T0400-0600    NA    24
## 5 20130101T0500-0600    NA    23
## 6 20130101T0600-0600    NA    22
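
The NA values in the 44201 column appear because spread() fills in a missing value wherever a datetime has no row for that parameter. A minimal sketch on made-up data shows the same behavior; the toy data frame and the "t1"/"t2" labels are hypothetical.

```r
library(tidyr)

# Hypothetical long data: parameter 44201 has a reading at t1 only,
# parameter 62101 has readings at both t1 and t2.
toy <- data.frame(
  datetime  = c("t1", "t1", "t2"),
  parameter = c("44201", "62101", "62101"),
  value     = c(0.03, 24, 23),
  stringsAsFactors = FALSE
)

# Spreading "parameter" creates one column per parameter code.
# The (t2, 44201) cell has no source row, so it becomes NA.
toy_wide <- spread(toy, key = "parameter", value = "value")
toy_wide
```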

Back to exercises

Solution 3

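# Keep only rows for parameter 88101
pm25 <- filter(airdata, parameter == "88101")
# Keep only the columns needed for reshaping
pm25 <- select(pm25, datetime, site, value)
# One column per site, one row per datetime
pm25_wide <- spread(pm25, key = "site", value = "value")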
head(pm25_wide)
##             datetime 840180890022 840180892004 840181270024
## 1 20130101T0000-0600         16.2         18.4         10.6
## 2 20130101T0100-0600         15.7         15.2         11.2
## 3 20130101T0200-0600         22.5         15.1         11.0
## 4 20130101T0300-0600         16.5         13.1         10.0
## 5 20130101T0400-0600         22.9         10.6          7.6
## 6 20130101T0500-0600         28.8          9.5         11.4
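
For readers on tidyr 1.0.0 or later, pivot_wider() is the newer counterpart of spread(). The sketch below assumes a small made-up data frame; toy_pm, the site labels, and the values are hypothetical and only mimic the shape of pm25.

```r
library(tidyr)

# Hypothetical long data: two sites, two timestamps each.
toy_pm <- data.frame(
  datetime = c("t1", "t1", "t2", "t2"),
  site     = c("site_a", "site_b", "site_a", "site_b"),
  value    = c(16.2, 18.4, 15.7, 15.2),
  stringsAsFactors = FALSE
)

# pivot_wider() plays the role of spread():
#   names_from  -> column whose values become new column names (the key)
#   values_from -> column whose values fill the new columns
toy_pm_wide <- pivot_wider(toy_pm, names_from = site, values_from = value)
toy_pm_wide
```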

Back to exercises

Solution 4

library(ggplot2)
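# Convert the date column to the Date class so ggplot2 uses a date axis
chicago_air_long$date <- as.Date(chicago_air_long$date)
# One facet row per parameter; "free" lets each facet use its own axis range
ggplot(chicago_air_long, aes(date, value)) + 
  geom_point() + facet_grid(parameter ~ ., scales = "free")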
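
The same steps can be tried on a small made-up data frame to check the date conversion before plotting. This is only a sketch: toy_long and its values are hypothetical. It uses scales = "free_y", which frees only the y axis; with a single column of facets this should look the same as "free", because all panels share the date axis anyway.

```r
library(ggplot2)

# Hypothetical long data with dates stored as character strings.
toy_long <- data.frame(
  date      = rep(c("2013-01-01", "2013-01-02", "2013-01-03"), times = 2),
  parameter = rep(c("ozone", "temp"), each = 3),
  value     = c(0.032, 0.020, 0.021, 30, 28, 35),
  stringsAsFactors = FALSE
)

# Convert to Date and confirm the class changed.
toy_long$date <- as.Date(toy_long$date)
class(toy_long$date)

# Build the faceted plot; each parameter gets its own y-axis range,
# so small ozone values are not flattened by the larger temp values.
p <- ggplot(toy_long, aes(date, value)) +
  geom_point() +
  facet_grid(parameter ~ ., scales = "free_y")
p
```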

Back to exercises