These exercises accompany the Reshaping Data tutorial: http://rpubs.com/NateByers/Reshaping. The exercises use data frames from the region5air library. Run the following code to clean out your global environment and load the data you need:
rm(list = ls())
library(tidyr)
library(dplyr)
library(region5air)
data(airdata)
data(chicago_air)
Exercises:

1. The chicago_air data frame is in a wide format. Use gather() to make a long data frame named chicago_air_long.
2. The airdata data frame is in a long format. Use the filter() function to create a data frame called site22. Filter down to site “840180890022” and a poc of 1 (remember to use ==). Use the select() function to select only the “datetime”, “parameter”, and “value” columns. Use spread() on site22 to make a wide data frame called site22_wide with separate columns for each parameter. Hint: you want to spread the “parameter” column, so identify that column as the key in the spread() function. The “value” column should be identified as the value in the function.
3. Use the filter() function on airdata to create a data frame called pm25. Filter down to parameter “88101”. Use the select() function to select only the “datetime”, “site”, and “value” columns. Use spread() on pm25 to make a wide data frame called pm25_wide with separate columns for each site. Hint: you want to spread the “site” column, so identify that column as the key in the spread() function.
4. Use ggplot2 to plot the chicago_air_long data frame that was created in exercise 1. First make sure to convert the “date” column to a Date class using as.Date(). Use facet_grid() in the plot to make separate facets for each parameter, and be sure to set the scales to “free”.

Solutions, in the same order as the exercises:

chicago_air_long <- gather(chicago_air, key = "parameter", value = "value",
                           ozone:solar)
head(chicago_air_long)
##         date month weekday parameter value
## 1 2013-01-01     1       3     ozone 0.032
## 2 2013-01-02     1       4     ozone 0.020
## 3 2013-01-03     1       5     ozone 0.021
## 4 2013-01-04     1       6     ozone 0.028
## 5 2013-01-05     1       7     ozone 0.025
## 6 2013-01-06     1       1     ozone 0.026
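
One way to check a gather() call is to compare row counts and then spread the result back to wide form. The sketch below uses a small made-up data frame (toy_wide and its values are invented for illustration and are not part of region5air), so it runs without the package data.

```r
library(tidyr)

# Made-up wide data frame: one row per date, one column per parameter.
toy_wide <- data.frame(
  date  = c("2013-01-01", "2013-01-02"),
  ozone = c(0.032, 0.020),
  temp  = c(30, 28),
  stringsAsFactors = FALSE
)

# Stack the ozone and temp columns into parameter/value pairs.
toy_long <- gather(toy_wide, key = "parameter", value = "value", ozone:temp)

# Each original row contributes one long row per gathered column (2 here).
nrow(toy_long) == nrow(toy_wide) * 2

# Spreading the long data frame brings back one column per parameter.
toy_back <- spread(toy_long, key = "parameter", value = "value")
names(toy_back)
```

The same check should carry over to chicago_air_long, with the multiplier set to however many columns fall in the ozone:solar range.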
site22 <- filter(airdata, site == "840180890022", poc == 1)
site22 <- select(site22, datetime, parameter, value)
site22_wide <- spread(site22, key = "parameter", value = "value")
head(site22_wide)
##             datetime 44201 62101
## 1 20130101T0100-0600    NA    24
## 2 20130101T0200-0600    NA    24
## 3 20130101T0300-0600    NA    24
## 4 20130101T0400-0600    NA    24
## 5 20130101T0500-0600    NA    23
## 6 20130101T0600-0600    NA    22
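
The NA entries in the 44201 column are worth a note. spread() fills a cell with NA whenever the long data has no row for that key at that datetime, so these gaps most likely mean the parameter was not reported for those hours (a stored NA would look the same). The sketch below reproduces the behavior on made-up data. The names toy_long and toy_wide and the keys "A" and "B" are invented for illustration.

```r
library(tidyr)

# Made-up long data: parameter "A" has no row for hour "t1".
toy_long <- data.frame(
  datetime  = c("t1", "t2", "t2"),
  parameter = c("B", "A", "B"),
  value     = c(24, 0.03, 23),
  stringsAsFactors = FALSE
)

# The missing (t1, A) combination becomes NA in the wide result.
toy_wide <- spread(toy_long, key = "parameter", value = "value")
toy_wide

# Count the gaps in each column.
colSums(is.na(toy_wide))
```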
pm25 <- filter(airdata, parameter == "88101")
pm25 <- select(pm25, datetime, site, value)
pm25_wide <- spread(pm25, key = "site", value = "value")
head(pm25_wide)
##             datetime 840180890022 840180892004 840181270024
## 1 20130101T0000-0600         16.2         18.4         10.6
## 2 20130101T0100-0600         15.7         15.2         11.2
## 3 20130101T0200-0600         22.5         15.1         11.0
## 4 20130101T0300-0600         16.5         13.1         10.0
## 5 20130101T0400-0600         22.9         10.6          7.6
## 6 20130101T0500-0600         28.8          9.5         11.4
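
Readers on tidyr 1.0.0 or later may prefer pivot_wider() and pivot_longer(), which supersede spread() and gather(); the older functions still work. The sketch below shows the same kind of reshaping on made-up data (toy_long, the site labels "s1" and "s2", and the values are invented for illustration), assuming a tidyr version that provides the pivot functions.

```r
library(tidyr)

# Made-up long data: two sites measured at two hours.
toy_long <- data.frame(
  datetime = c("t1", "t1", "t2", "t2"),
  site     = c("s1", "s2", "s1", "s2"),
  value    = c(16.2, 18.4, 15.7, 15.2),
  stringsAsFactors = FALSE
)

# Counterpart of spread(toy_long, key = "site", value = "value").
toy_wide <- pivot_wider(toy_long, names_from = site, values_from = value)

# Counterpart of gather(toy_wide, key = "site", value = "value", s1:s2).
toy_long_again <- pivot_longer(toy_wide, cols = c(s1, s2),
                               names_to = "site", values_to = "value")
```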
library(ggplot2)
chicago_air_long$date <- as.Date(chicago_air_long$date)
ggplot(chicago_air_long, aes(date, value)) +
  geom_point() +
  facet_grid(parameter ~ ., scales = "free")