Setup

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

load("cdc.Rdata")
load("OAW.Rdata")

Task 1

Starting with the OAW dataframe, create a new dataframe TMAXStats. There is one row for every combination of mo and dy. Each row contains the mean, median, and sd of TMAX for that day. Do a glimpse() and head() to verify your results.

Solution

TMAXStats = OAW %>% 
  group_by(mo,dy) %>% 
  summarize(mean_stat = mean(TMAX),
            median_stat = median(TMAX),
            sd_stat = sd(TMAX),
            count_stat = n()) %>% 
  ungroup()

## `summarise()` has grouped output by 'mo'. You can override using the `.groups`
## argument.

head(TMAXStats)

## # A tibble: 6 × 6
##   mo    dy    mean_stat median_stat sd_stat count_stat
##   <fct> <fct>     <dbl>       <dbl>   <dbl>      <int>
## 1 1     1          43.6          44    5.77         81
## 2 1     2          43.2          44    6.42         81
## 3 1     3          43.4          44    6.61         81
## 4 1     4          43.5          44    6.76         81
## 5 1     5          43.6          44    5.93         81
## 6 1     6          43.4          44    6.25         81

glimpse(TMAXStats)

## Rows: 366
## Columns: 6
## $ mo          <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dy          <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ mean_stat   <dbl> 43.60494, 43.22222, 43.39506, 43.48148, 43.62963, 43.39506…
## $ median_stat <dbl> 44.0, 44.0, 44.0, 44.0, 44.0, 44.0, 44.0, 46.0, 44.0, 45.0…
## $ sd_stat     <dbl> 5.767753, 6.416775, 6.609991, 6.760383, 5.927572, 6.248358…
## $ count_stat  <int> 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81…

Task 2

Is there any relationship betweem the mean and the standard deviation? Do a scatterplot and see what you think.

Solution

TMAXStats %>% 
  ggplot(aes(x = mean_stat, y = sd_stat)) +
  geom_point() +
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Task 3

Convert TMAXStats to a longer form TMAXLong. Select all the calculated stats column for pivoting. Do this three ways.

Name the column containing the names of the statistic “statistic”. Name the column containing the values of the statistics “value”.

Solution 1

TMAXLong = TMAXStats %>% 
  pivot_longer(cols = ends_with("_stat"),
               names_to = "statistic",
               values_to = "value")

head(TMAXLong)

## # A tibble: 6 × 4
##   mo    dy    statistic   value
##   <fct> <fct> <chr>       <dbl>
## 1 1     1     mean_stat   43.6 
## 2 1     1     median_stat 44   
## 3 1     1     sd_stat      5.77
## 4 1     1     count_stat  81   
## 5 1     2     mean_stat   43.2 
## 6 1     2     median_stat 44

Solution 2

TMAXLong = TMAXStats %>% 
  pivot_longer(cols = -c(mo,dy),
               names_to = "statistic",
               values_to = "value")

head(TMAXLong)

## # A tibble: 6 × 4
##   mo    dy    statistic   value
##   <fct> <fct> <chr>       <dbl>
## 1 1     1     mean_stat   43.6 
## 2 1     1     median_stat 44   
## 3 1     1     sd_stat      5.77
## 4 1     1     count_stat  81   
## 5 1     2     mean_stat   43.2 
## 6 1     2     median_stat 44

Solution 3

TMAXLong = TMAXStats %>% 
  pivot_longer(cols = mean_stat:count_stat,
               names_to = "statistic",
               values_to = "value")

head(TMAXLong)

## # A tibble: 6 × 4
##   mo    dy    statistic   value
##   <fct> <fct> <chr>       <dbl>
## 1 1     1     mean_stat   43.6 
## 2 1     1     median_stat 44   
## 3 1     1     sd_stat      5.77
## 4 1     1     count_stat  81   
## 5 1     2     mean_stat   43.2 
## 6 1     2     median_stat 44

Task 4

Convert TMAXLong to TMAXWide

TMAXWide = TMAXLong %>% 
  pivot_wider(names_from = "statistic",
              values_from = "value")

head(TMAXWide)

## # A tibble: 6 × 6
##   mo    dy    mean_stat median_stat sd_stat count_stat
##   <fct> <fct>     <dbl>       <dbl>   <dbl>      <dbl>
## 1 1     1          43.6          44    5.77         81
## 2 1     2          43.2          44    6.42         81
## 3 1     3          43.4          44    6.61         81
## 4 1     4          43.5          44    6.76         81
## 5 1     5          43.6          44    5.93         81
## 6 1     6          43.4          44    6.25         81

Pivot

Setup

Task 1

Solution

Task 2

Solution

Task 3

Solution 1

Solution 2

Solution 3

Task 4