Harold Nelson
2026-06-16
Load the data OAW2309 and make the tidyverse available.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.3 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Rows: 30,075
## Columns: 7
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1941…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59, 6…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46, 4…
## $ mo <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
## $ dy <int> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 2…
## $ yr <dbl> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941,…
We want to create a new variable perm_date. It has the same month and day as DATE, but the year must be 2028.
There is a function make_date() in lubridate (part of the tidyverse). Look it up and use it to create perm_date.
## # A tibble: 6 × 8
## DATE PRCP TMAX TMIN mo dy yr perm_date
## <date> <dbl> <dbl> <dbl> <fct> <int> <dbl> <date>
## 1 1941-05-13 0 66 50 5 13 1941 2028-05-13
## 2 1941-05-14 0 63 47 5 14 1941 2028-05-14
## 3 1941-05-15 0.3 58 44 5 15 1941 2028-05-15
## 4 1941-05-16 1.08 55 45 5 16 1941 2028-05-16
## 5 1941-05-17 0.06 57 46 5 17 1941 2028-05-17
## 6 1941-05-18 0 59 39 5 18 1941 2028-05-18
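One way to build perm_date is sketched below. It runs on a small made-up tibble, `toy`, standing in for OAW2309 (the values are invented for illustration), and it takes the month and day from DATE with lubridate's month() and day() rather than from the mo and dy columns. That sidesteps the fact that mo is stored as a factor.

```r
library(tidyverse)

# Toy stand-in for OAW2309: invented values, only the columns needed here.
toy <- tibble(
  DATE = as.Date(c("1941-05-13", "1975-05-13", "2000-02-29")),
  TMAX = c(66, 70, 48)
)

# Keep the month and day of DATE, but force the year to 2028.
toy <- toy %>%
  mutate(perm_date = make_date(2028, month(DATE), day(DATE)))

toy
```

Choosing 2028, a leap year, means that a February 29 in the original data still maps to a valid date.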
Use group_by() and summarize() to compute the median value of TMAX, med_TMAX, for each value of perm_date. Store the result in the dataframe perm_summary.
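A minimal sketch of this grouping step, again on invented data. The `na.rm = TRUE` argument is a precaution added here, not something the exercise states is required.

```r
library(tidyverse)

# Toy data: three readings for January 1 and two for January 2 (invented).
toy <- tibble(
  perm_date = make_date(2028, 1, c(1, 1, 1, 2, 2)),
  TMAX      = c(40, 44, 50, 43, 47)
)

# One row per perm_date, holding the median of TMAX across all years.
perm_summary <- toy %>%
  group_by(perm_date) %>%
  summarize(med_TMAX = median(TMAX, na.rm = TRUE))

perm_summary
```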
Plot the values of med_TMAX against perm_date using geom_point().
Use plotly to make the plot interactive. This involves two steps:
1. Create a ggplot object by putting the call to ggplot() on the right side of an assignment statement.
2. Call ggplotly() on the object from step 1.
Before you do this, you need to install the plotly package and load it with library().
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
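The static plot and its interactive version can be sketched together as follows. The object names `g` and `p` and the ten-row toy perm_summary are assumptions made for illustration, and the sketch assumes plotly has already been installed.

```r
library(tidyverse)
library(plotly)  # install once beforehand with install.packages("plotly")

# Toy perm_summary: ten invented days of median TMAX.
perm_summary <- tibble(
  perm_date = make_date(2028, 1, 1) + 0:9,
  med_TMAX  = c(44, 44, 44, 44, 44, 44, 45, 44, 45, 46)
)

# Step 1: assign the ggplot to an object instead of printing it.
g <- ggplot(perm_summary, aes(x = perm_date, y = med_TMAX)) +
  geom_point()

# Step 2: hand that object to ggplotly() to get an interactive widget.
p <- ggplotly(g)
p
```

Printing `g` instead of `p` gives the ordinary static scatterplot.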
Create a simple summary of med_TMAX.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 43.00 49.62 59.00 59.98 69.00 81.00
Use slice_max to identify the date when this variable reaches its maximum value.
## # A tibble: 1 × 2
## perm_date med_TMAX
## <date> <dbl>
## 1 2028-08-14 81
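The summary and the slice_max() step might look like the sketch below, shown on five invented rows around the peak. The name `hottest` is hypothetical.

```r
library(tidyverse)

# Toy perm_summary: invented values with a single maximum on August 14.
perm_summary <- tibble(
  perm_date = make_date(2028, 8, 12:16),
  med_TMAX  = c(79, 80, 81, 80, 79)
)

# Six-number summary of the medians.
summary(perm_summary$med_TMAX)

# Row (or rows, if there are ties) where med_TMAX is largest.
hottest <- perm_summary %>%
  slice_max(med_TMAX, n = 1)

hottest
```

By default slice_max() keeps ties, so more than one row can come back when several dates share the maximum.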
Find the number of days between January 1 and August 14, 2028 by subtracting the date values. Use make_date() to create start and end.
## Time difference of 226 days
Repeat for the difference between Aug 14 and Dec 31.
## Time difference of 139 days
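Both differences can be computed by subtracting Date values, roughly as sketched here. The name `peak` is introduced for illustration. Because 2028 is a leap year, February 29 counts toward the first interval.

```r
library(lubridate)

start <- make_date(2028, 1, 1)
peak  <- make_date(2028, 8, 14)
end   <- make_date(2028, 12, 31)

# Subtracting two Date values returns a difftime measured in days.
rise_days <- peak - start
fall_days <- end - peak

rise_days
fall_days

# as.numeric() strips the difftime class when a plain number is needed.
as.numeric(rise_days)
```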
Create a new variable delta_TMAX as the difference between a day’s value of med_TMAX and the value on the previous day. Use lag().
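A sketch of the lag() step on five invented rows. The arrange() call is an added safeguard so that "previous day" really means the previous row. The output of summarize() is already sorted by the grouping variable, so it may be redundant in practice.

```r
library(tidyverse)

# Toy perm_summary: five invented days.
perm_summary <- tibble(
  perm_date = make_date(2028, 1, 1:5),
  med_TMAX  = c(44, 44, 45, 43, 44)
)

# lag() shifts the column down one row, so the first difference is NA.
perm_summary <- perm_summary %>%
  arrange(perm_date) %>%
  mutate(delta_TMAX = med_TMAX - lag(med_TMAX))

perm_summary
```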
Create a new variable direction. Its value is “Up” if the date is before August 14, otherwise it is “Down”. Use ifelse().
perm_summary = perm_summary %>%
  mutate(direction = ifelse(perm_date < make_date(2028, 8, 14), "Up", "Down"))
head(perm_summary)
## # A tibble: 6 × 4
## perm_date med_TMAX delta_TMAX direction
## <date> <dbl> <dbl> <chr>
## 1 2028-01-01 44 NA Up
## 2 2028-01-02 44 0 Up
## 3 2028-01-03 44 0 Up
## 4 2028-01-04 44 0 Up
## 5 2028-01-05 44 0 Up
## 6 2028-01-06 44 0 Up
## # A tibble: 6 × 4
## perm_date med_TMAX delta_TMAX direction
## <date> <dbl> <dbl> <chr>
## 1 2028-12-26 43 0 Down
## 2 2028-12-27 44 1 Down
## 3 2028-12-28 44 0 Down
## 4 2028-12-29 43 -1 Down
## 5 2028-12-30 44 1 Down
## 6 2028-12-31 44 0 Down
Display the mean values of delta_TMAX for these two directions. Be careful. The first value of delta_TMAX is missing.
## # A tibble: 2 × 2
## direction mean_delta
## <chr> <dbl>
## 1 Down -0.236
## 2 Up 0.147
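The missing first value matters because mean() returns NA as soon as any input is NA. A small sketch with invented numbers shows one way to handle it, using `na.rm = TRUE`. The name `direction_means` is hypothetical.

```r
library(tidyverse)

# Toy data: the first delta is NA, as it is after using lag().
perm_summary <- tibble(
  direction  = c("Up", "Up", "Up", "Down", "Down"),
  delta_TMAX = c(NA, 1, 0, -1, -2)
)

# na.rm = TRUE drops the NA; without it the "Up" mean would be NA.
direction_means <- perm_summary %>%
  group_by(direction) %>%
  summarize(mean_delta = mean(delta_TMAX, na.rm = TRUE))

direction_means
```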
Use facet_wrap() and geom_density() to compare the values of delta_TMAX based on direction.
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_density()`).
Repeat the last graph with geom_histogram().
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
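Both comparison plots can be sketched as below. The data are randomly generated stand-ins, not the real deltas. The choices of `ncol = 1` and `binwidth = 1` are assumptions made here, and the object names `dens_plot` and `hist_plot` are hypothetical. Filtering out the NA first avoids the "Removed 1 row" warning, and setting binwidth explicitly avoids the `stat_bin()` message.

```r
library(tidyverse)

# Toy deltas: invented whole-degree changes, with the leading NA from lag().
set.seed(1)
perm_summary <- tibble(
  direction  = rep(c("Up", "Down"), each = 50),
  delta_TMAX = c(NA, round(rnorm(49, mean = 0.15)), round(rnorm(50, mean = -0.24)))
)

# Drop the missing first value once, then reuse the cleaned data.
plot_data <- perm_summary %>%
  filter(!is.na(delta_TMAX))

# Stacking the panels (ncol = 1) lines up the x axes for easy comparison.
dens_plot <- ggplot(plot_data, aes(x = delta_TMAX)) +
  geom_density() +
  facet_wrap(~direction, ncol = 1)

hist_plot <- ggplot(plot_data, aes(x = delta_TMAX)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~direction, ncol = 1)

dens_plot
hist_plot
```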
Complete the following:
The median value of TMAX typically rises from the beginning of the year until August 14. This is a period of 226 days.
It falls for the remainder of the year, a period of 139 days.
The rise is slower than the fall. During the rising period, the median TMAX increases by about 0.15 degrees per day. During the falling period, it decreases by about 0.24 degrees per day.
The median value of TMAX is typically rising from the beginning of the year until August 14. This is a period of ___ days.
It falls for the remainder of the year, a period of ___ days.
The rise is slower than the fall. During the rising period, the median TMAX increases by about ___ degrees per day. During the falling period, the temperature decreases by about ___ degrees per day.