Harold Nelson
2023-02-06
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Starting with the OAW dataframe, create a new dataframe TMAXStats. There is one row for every combination of mo and dy. Each row contains the mean, median, and sd of TMAX for that day. Do a glimpse() and head() to verify your results.
TMAXStats = OAW %>%
group_by(mo,dy) %>%
summarize(mean_stat = mean(TMAX),
median_stat = median(TMAX),
sd_stat = sd(TMAX),
count_stat = n()) %>%
ungroup()
## `summarise()` has grouped output by 'mo'. You can override using the `.groups`
## argument.
## # A tibble: 6 × 6
## mo dy mean_stat median_stat sd_stat count_stat
## <fct> <fct> <dbl> <dbl> <dbl> <int>
## 1 1 1 43.6 44 5.77 81
## 2 1 2 43.2 44 6.42 81
## 3 1 3 43.4 44 6.61 81
## 4 1 4 43.5 44 6.76 81
## 5 1 5 43.6 44 5.93 81
## 6 1 6 43.4 44 6.25 81
## Rows: 366
## Columns: 6
## $ mo <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dy <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ mean_stat <dbl> 43.60494, 43.22222, 43.39506, 43.48148, 43.62963, 43.39506…
## $ median_stat <dbl> 44.0, 44.0, 44.0, 44.0, 44.0, 44.0, 44.0, 46.0, 44.0, 45.0…
## $ sd_stat <dbl> 5.767753, 6.416775, 6.609991, 6.760383, 5.927572, 6.248358…
## $ count_stat <int> 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81…
Is there any relationship betweem the mean and the standard deviation? Do a scatterplot and see what you think.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Convert TMAXStats to a longer form TMAXLong. Select all the calculated stats column for pivoting. Do this three ways.
Name the column containing the names of the statistic “statistic”. Name the column containing the values of the statistics “value”.
TMAXLong = TMAXStats %>%
pivot_longer(cols = ends_with("_stat"),
names_to = "statistic",
values_to = "value")
head(TMAXLong)
## # A tibble: 6 × 4
## mo dy statistic value
## <fct> <fct> <chr> <dbl>
## 1 1 1 mean_stat 43.6
## 2 1 1 median_stat 44
## 3 1 1 sd_stat 5.77
## 4 1 1 count_stat 81
## 5 1 2 mean_stat 43.2
## 6 1 2 median_stat 44
TMAXLong = TMAXStats %>%
pivot_longer(cols = -c(mo,dy),
names_to = "statistic",
values_to = "value")
head(TMAXLong)
## # A tibble: 6 × 4
## mo dy statistic value
## <fct> <fct> <chr> <dbl>
## 1 1 1 mean_stat 43.6
## 2 1 1 median_stat 44
## 3 1 1 sd_stat 5.77
## 4 1 1 count_stat 81
## 5 1 2 mean_stat 43.2
## 6 1 2 median_stat 44
TMAXLong = TMAXStats %>%
pivot_longer(cols = mean_stat:count_stat,
names_to = "statistic",
values_to = "value")
head(TMAXLong)
## # A tibble: 6 × 4
## mo dy statistic value
## <fct> <fct> <chr> <dbl>
## 1 1 1 mean_stat 43.6
## 2 1 1 median_stat 44
## 3 1 1 sd_stat 5.77
## 4 1 1 count_stat 81
## 5 1 2 mean_stat 43.2
## 6 1 2 median_stat 44
Convert TMAXLong to TMAXWide
## # A tibble: 6 × 6
## mo dy mean_stat median_stat sd_stat count_stat
## <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 43.6 44 5.77 81
## 2 1 2 43.2 44 6.42 81
## 3 1 3 43.4 44 6.61 81
## 4 1 4 43.5 44 6.76 81
## 5 1 5 43.6 44 5.93 81
## 6 1 6 43.4 44 6.25 81