The Seasonality of Olympia Weather

Harold Nelson

2026-06-16

Task

Load the data set OAW2309 and make the tidyverse available.

Solution

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
load("OAW2309.Rdata")
glimpse(OAW2309)
## Rows: 30,075
## Columns: 7
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1941…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59, 6…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46, 4…
## $ mo   <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
## $ dy   <int> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 2…
## $ yr   <dbl> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941,…

Task

We want to create a new variable perm_date. It has the same month and day as DATE, but the year must be 2028.

There is a function make_date() in lubridate (part of the tidyverse). Look it up and use it to create perm_date.

Solution

OAW2309 = OAW2309 %>% 
  mutate(perm_date = make_date(2028,mo,dy))
head(OAW2309)
## # A tibble: 6 × 8
##   DATE        PRCP  TMAX  TMIN mo       dy    yr perm_date 
##   <date>     <dbl> <dbl> <dbl> <fct> <int> <dbl> <date>    
## 1 1941-05-13  0       66    50 5        13  1941 2028-05-13
## 2 1941-05-14  0       63    47 5        14  1941 2028-05-14
## 3 1941-05-15  0.3     58    44 5        15  1941 2028-05-15
## 4 1941-05-16  1.08    55    45 5        16  1941 2028-05-16
## 5 1941-05-17  0.06    57    46 5        17  1941 2028-05-17
## 6 1941-05-18  0       59    39 5        18  1941 2028-05-18
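
A likely reason for choosing 2028 (an assumption; the task does not say) is that it is a leap year, so observations from February 29 still map to a valid perm_date. The sketch below, which needs only lubridate, compares what make_date() returns for February 29 in a leap year and in a non-leap year.

```r
library(lubridate)

# 2028 is a leap year, so February 29 is a valid date.
leap_feb29 = make_date(2028, 2, 29)

# 2027 is not a leap year; make_date() yields NA rather than an error.
nonleap_feb29 = make_date(2027, 2, 29)

leap_year(2028)
is.na(nonleap_feb29)
```

With a non-leap year, every February 29 row would get a missing perm_date and end up in its own NA group in any later group_by().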

Task

Use group_by() and summarize() to compute the median of TMAX, named med_TMAX, for each value of perm_date. Store the result in the dataframe perm_summary.

Solution

perm_summary = OAW2309 %>% 
  group_by(perm_date) %>% 
  summarize(med_TMAX = median(TMAX))

Task

Plot the values of med_TMAX against perm_date using geom_point().

Solution

perm_summary %>% 
  ggplot(aes(perm_date,med_TMAX)) +
  geom_point() 

Task

Use plotly to make the plot interactive. This involves 2 steps.

  1. Create a ggplot object by putting the call to ggplot on the right side of an assignment statement.

  2. Call ggplotly() on the object from step 1.

Before you do this, you need to install the plotly package and load it with library().

Solution

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
graph = perm_summary %>% 
  ggplot(aes(perm_date,med_TMAX)) +
  geom_point() 

ggplotly(graph)

Task

Create a simple summary of med_TMAX.

Solution

summary(perm_summary$med_TMAX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   43.00   49.62   59.00   59.98   69.00   81.00

Task

Use slice_max() to identify the date on which med_TMAX reaches its maximum value.

Solution

max_val = perm_summary %>% 
  slice_max(med_TMAX,n = 1)

max_val
## # A tibble: 1 × 2
##   perm_date  med_TMAX
##   <date>        <dbl>
## 1 2028-08-14       81

Task

Find the number of days between January 1 and August 14, 2028 by subtracting the date values. Use make_date() to create start and end.

Solution

# The task asks for January 1 as the starting date.
start = make_date(2028,1,1)
end = make_date(2028,8,14)
end - start
## Time difference of 226 days

Task

Repeat for the number of days between August 14 and December 31, 2028.

Solution

start = make_date(2028,8,14)
end = make_date(2028,12,31)
end - start
## Time difference of 139 days
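
Subtracting dates counts the steps between them, not the dates themselves. As a quick cross-check, sketched here in base R with the three dates named in the tasks (January 1, August 14 and December 31, 2028), the two spans should add up to the span of the whole year, which is one less than the 366 days of a leap year.

```r
# Dates named in the two tasks above (2028 is a leap year with 366 days).
jan1  = as.Date("2028-01-01")
aug14 = as.Date("2028-08-14")
dec31 = as.Date("2028-12-31")

# as.numeric() drops the difftime class and keeps the day count.
rise_days = as.numeric(aug14 - jan1, units = "days")
fall_days = as.numeric(dec31 - aug14, units = "days")

# The two spans together cover January 1 through December 31:
# 365 steps between the 366 days of the year.
rise_days + fall_days == as.numeric(dec31 - jan1, units = "days")
```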

Task

Create a new variable delta_TMAX as the difference between a day’s value of med_TMAX and the value on the previous day. Use lag().

Solution

perm_summary = perm_summary %>% 
  mutate(delta_TMAX = med_TMAX - lag(med_TMAX))
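
lag() returns the previous row's value, so the first row has nothing to subtract. A minimal sketch on a made-up four-value column (the numbers are illustrative, not taken from OAW2309) shows the effect:

```r
library(dplyr)

# Hypothetical daily medians, for illustration only.
toy = tibble(med_TMAX = c(44, 44, 45, 43))

toy = toy %>%
  # lag() shifts the column down one row, so row 1 has no predecessor
  # and its difference is NA.
  mutate(delta_TMAX = med_TMAX - lag(med_TMAX))

toy$delta_TMAX
## [1] NA  0  1 -2
```

Any summary of delta_TMAX therefore has to decide what to do with that one NA, for example by passing na.rm = TRUE.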

Task

Create a new variable direction. Its value is “Up” if the date is before August 14, otherwise it is “Down”. Use ifelse().

Solution

perm_summary = perm_summary %>% 
  mutate(direction = ifelse(perm_date < make_date(2028,8,14),"Up","Down"))

head(perm_summary)
## # A tibble: 6 × 4
##   perm_date  med_TMAX delta_TMAX direction
##   <date>        <dbl>      <dbl> <chr>    
## 1 2028-01-01       44         NA Up       
## 2 2028-01-02       44          0 Up       
## 3 2028-01-03       44          0 Up       
## 4 2028-01-04       44          0 Up       
## 5 2028-01-05       44          0 Up       
## 6 2028-01-06       44          0 Up
tail(perm_summary)
## # A tibble: 6 × 4
##   perm_date  med_TMAX delta_TMAX direction
##   <date>        <dbl>      <dbl> <chr>    
## 1 2028-12-26       43          0 Down     
## 2 2028-12-27       44          1 Down     
## 3 2028-12-28       44          0 Down     
## 4 2028-12-29       43         -1 Down     
## 5 2028-12-30       44          1 Down     
## 6 2028-12-31       44          0 Down

Task

Display the mean values of delta_TMAX for these two directions. Be careful. The first value of delta_TMAX is missing.

Solution

perm_summary %>% 
  group_by(direction) %>% 
  summarize(mean_delta = mean(delta_TMAX,na.rm = TRUE))
## # A tibble: 2 × 2
##   direction mean_delta
##   <chr>          <dbl>
## 1 Down          -0.236
## 2 Up             0.147

Task

Use facet_wrap() and geom_density() to compare the values of delta_TMAX based on direction.

Solution

perm_summary %>% 
  ggplot(aes(x = delta_TMAX)) +
  geom_density() +
  facet_wrap(~direction,ncol = 1)
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_density()`).

Task

Repeat the last graph with geom_histogram().

Solution

perm_summary %>% 
  ggplot(aes(x = delta_TMAX)) +
  geom_histogram() +
  facet_wrap(~direction,ncol = 1)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
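
The `stat_bin()` message asks for an explicit bin width. The sketch below uses a made-up data frame in place of perm_summary and assumes that a width of 0.5 suits the data: if TMAX is recorded in whole degrees, its medians fall on multiples of 0.5, and so do their differences. Both the toy values and the chosen width are assumptions, not part of the original solution.

```r
library(ggplot2)

# Hypothetical stand-in for perm_summary; values are illustrative only.
toy = data.frame(
  delta_TMAX = c(NA, 0, 0.5, 1, 0, -0.5, 0.5, -1, 0, -0.5),
  direction  = rep(c("Up", "Down"), each = 5)
)

hist_plot = ggplot(toy, aes(x = delta_TMAX)) +
  # An explicit binwidth avoids the default of 30 bins;
  # na.rm = TRUE drops the missing first difference without a warning.
  geom_histogram(binwidth = 0.5, na.rm = TRUE) +
  facet_wrap(~direction, ncol = 1)

hist_plot
```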

Wrapping Up

Complete the following:

The median value of TMAX is typically rising from the beginning of the year until August 14. This is a period of ___ days.

It falls for the remainder of the year, a period of ___ days.

The rise is slower than the fall. During the rising period, the median TMAX increases by about ___ degrees per day. During the falling period, it decreases by about ___ degrees per day.

Solution

The median value of TMAX is typically rising from the beginning of the year until August 14. This is a period of 226 days.

It falls for the remainder of the year, a period of 139 days.

The rise is slower than the fall. During the rising period, the median TMAX increases by about 0.15 degrees per day. During the falling period, it decreases by about 0.24 degrees per day.