Weather 3

Harold Nelson

2025-10-16

Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(suncalc)
library(plotly)
load("OAW2509.Rdata")

Pleasant Days

I’ll define a day as pleasant if there is no rain and the maximum temperatur is between 60 and 80.

Add a boolean variable pleasant to OAW2509.

Compute the proportion of pleasant days in the entire dataframe.

Solution

OAW2509 = OAW2509 %>% 
  mutate(pleasant = PRCP == 0 &
         TMAX >= 60 & 
         TMAX <= 80) 
  
OAW2509 %>% 
  summarize(prop_pleasant = mean(pleasant))
## # A tibble: 1 × 1
##   prop_pleasant
##           <dbl>
## 1         0.294

Seasonality of pleasant

Create a summary file showing the proportion of pleasant days in each calendar month.

Use this to create an interactive scatterplot showing the values.

Solution

pleasant_by_month = OAW2509 %>% 
  group_by(mo) %>% 
  summarize(prop_pleasant = mean(pleasant))

g = pleasant_by_month %>% 
  ggplot(aes(x = mo, y = prop_pleasant)) +
  geom_point()

ggplotly(g)

Comment

Solution

Pleasantness rises from January to June.

It drops slightly for July and August.

The most pleasant month of the entire year is September with June a close second.

Misery

I’ll define misery as a cold, wet, dark day.

We can measure coldness and wetness from the data in OAW2509.

For darkness, we need the number of daylight hours. I posed the following question to ChatGPT.

You are an R expert. I want a function to give the hours of daylight for a given date in Olympia, WA.

There was an error in the code supplied by ChatGPT. It included an extra parameter, keep, in the call to getSunlightTimes.

# Need to install suncalc
library(suncalc)

daylight_hours <- function(date) {
  # Ensure input is a Date object
  date <- as.Date(date)
  
  # Coordinates for Olympia, WA
  lat <- 47.04
  lon <- -122.90
  
  # Get sunrise and sunset times in UTC and convert to local time
  sun_times <- getSunlightTimes(date = date, lat = lat, lon = lon,  tz = "America/Los_Angeles")
  
  # Calculate daylight duration in hours
  daylight_duration <- as.numeric(difftime(sun_times$sunset, sun_times$sunrise, units = "hours"))
  
  return(daylight_duration)
}

Calculate Mean Measures by Day

Construct a dataframe means as follows:

Create the variable pdate using make_date to set all dates to 2028.

group by pdate.

Calculate the means of TMAX, TMIN, and PRCP as m_…

Add the variable sun for each pdate using the function getSunlightTimes.

Solution

means = OAW2509 %>% 
  mutate(pdate = make_date(year = 2028,
                           month = mo,
                           day = dy)) %>% 
  group_by(pdate) %>% 
  summarize(m_PRCP = mean(PRCP),
            m_TMAX = mean(TMAX),
            m_TMIN = mean(TMIN)) %>% 
  ungroup() %>% 
  mutate(sun = daylight_hours(pdate))

head(means)
## # A tibble: 6 × 5
##   pdate      m_PRCP m_TMAX m_TMIN   sun
##   <date>      <dbl>  <dbl>  <dbl> <dbl>
## 1 2028-01-01  0.207   43.6   30.7  8.58
## 2 2028-01-02  0.282   43.2   30.8  8.59
## 3 2028-01-03  0.224   43.5   30.4  8.61
## 4 2028-01-04  0.275   43.6   30.9  8.63
## 5 2028-01-05  0.289   43.9   31.1  8.65
## 6 2028-01-06  0.311   43.6   31.0  8.67

Graphs and summaries

Create scatterplots of each of our four variables agains pdate. Also Provide summmaries of each.

Solution

means %>% 
  ggplot(aes(x = pdate, y = m_TMAX)) + geom_point() +
  ggtitle("Plot of TMAX")

summary(means$m_TMAX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   43.21   49.36   59.55   60.58   70.91   80.36
means %>% 
  ggplot(aes(x = pdate, y = m_TMIN)) + geom_point(size = .2) +
  ggtitle("Plot of m_TMIN")

summary(means$m_TMIN)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30.19   33.24   38.24   39.84   46.36   51.14
means %>% 
  ggplot(aes(x = pdate, y = m_PRCP)) + geom_point(size = .2) + 
  ggtitle("Plot of m_prcp")

summary(means$m_PRCP)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.005059 0.054765 0.123155 0.136536 0.212024 0.404762
means %>% 
  ggplot(aes(x = pdate, y = sun)) + geom_point(size = .2) +
  ggtitle("Plot of sun")

summary(means$sun)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   8.499   9.789  12.259  12.231  14.681  15.902

Compute Standardized Values

The plots and summaries show that the variables we are using have very different values. In order to compute a misery index, we need to convert the variables into comparable values. One way to do this is to construct z scores. Subtract the mean and divide by the standard deviation.

Construct the dataframe z_scores from the dataframe means.

Solution

z_scores = means %>% 
  mutate(z_TMAX = (m_TMAX - mean(m_TMAX))/sd(m_TMAX),
         z_TMIN = (m_TMIN - mean(m_TMIN))/sd(m_TMIN),
         z_PRCP = (m_PRCP - mean(m_PRCP))/sd(m_PRCP),
         z_sun = (sun - mean(sun))/sd(sun)
         )

summary(z_scores)
##      pdate                m_PRCP             m_TMAX          m_TMIN     
##  Min.   :2028-01-01   Min.   :0.005059   Min.   :43.21   Min.   :30.19  
##  1st Qu.:2028-04-01   1st Qu.:0.054765   1st Qu.:49.36   1st Qu.:33.24  
##  Median :2028-07-01   Median :0.123155   Median :59.55   Median :38.24  
##  Mean   :2028-07-01   Mean   :0.136536   Mean   :60.58   Mean   :39.84  
##  3rd Qu.:2028-09-30   3rd Qu.:0.212024   3rd Qu.:70.91   3rd Qu.:46.36  
##  Max.   :2028-12-31   Max.   :0.404762   Max.   :80.36   Max.   :51.14  
##       sun             z_TMAX             z_TMIN            z_PRCP       
##  Min.   : 8.499   Min.   :-1.46781   Min.   :-1.4350   Min.   :-1.4088  
##  1st Qu.: 9.789   1st Qu.:-0.94850   1st Qu.:-0.9813   1st Qu.:-0.8762  
##  Median :12.259   Median :-0.08702   Median :-0.2373   Median :-0.1434  
##  Mean   :12.231   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.:14.681   3rd Qu.: 0.87346   3rd Qu.: 0.9699   3rd Qu.: 0.8089  
##  Max.   :15.902   Max.   : 1.67282   Max.   : 1.6807   Max.   : 2.8741  
##      z_sun         
##  Min.   :-1.46973  
##  1st Qu.:-0.96193  
##  Median : 0.01076  
##  Mean   : 0.00000  
##  3rd Qu.: 0.96491  
##  Max.   : 1.44553

Compute the Index

Now that we have comparable values we can construct the misery using signed addition.

Use TMAX for temperature.

Create the dataframe index conataining the variable misery.

Construct a scatterplot showing misery by day.

Solution

index = z_scores %>% 
  mutate(misery = z_PRCP - (z_TMAX + z_sun))

index %>% ggplot(aes(x = pdate,y=misery)) +
  geom_point(size = .2) +
  geom_smooth(color = "red") +
  ggtitle("Misery by Day")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The most miserable day.

Identify the single value of pdate where the misery index is > 5.

Solution

index %>% 
  filter(misery > 5)
## # A tibble: 1 × 10
##   pdate      m_PRCP m_TMAX m_TMIN   sun z_TMAX z_TMIN z_PRCP z_sun misery
##   <date>      <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>
## 1 2028-11-23  0.405   48.4   34.9  9.04  -1.03 -0.732   2.87 -1.26   5.16

Least Miserable Day

Use arrange() and head().

Solution

least = index %>% 
  arrange(misery) 

head(least)
## # A tibble: 6 × 10
##   pdate       m_PRCP m_TMAX m_TMIN   sun z_TMAX z_TMIN z_PRCP z_sun misery
##   <date>       <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>
## 1 2028-07-29 0.006     80.1   50.2  15.0   1.65   1.55  -1.40  1.09  -4.15
## 2 2028-07-27 0.0118    80.4   50.3  15.1   1.67   1.55  -1.34  1.13  -4.14
## 3 2028-07-25 0.00859   79.6   50    15.2   1.61   1.51  -1.37  1.16  -4.14
## 4 2028-07-19 0.00965   78.6   50.6  15.4   1.52   1.61  -1.36  1.24  -4.12
## 5 2028-07-30 0.00541   80.0   50.3  15.0   1.64   1.56  -1.41  1.08  -4.12
## 6 2028-07-21 0.014     79.4   50.6  15.3   1.59   1.61  -1.31  1.22  -4.12

Comment

There are numerous days in July with misery indices below -4. Of course, they might not be pleasant.

Misery by Month

Creat mo_misery holding the mean and standard deviation of misery for each month.

Look for the highest average misery and the highest standard deviation.

Do a scatterplot showing the mean value of misery for each month.

Solution

mo_misery = index %>% 
  mutate(mo = month(pdate)) %>% 
  group_by(mo) %>% 
  summarize(mean_mo = mean(misery),
            sd_mo = sd(misery))

mo_misery %>% 
  arrange(mean_mo)
## # A tibble: 12 × 3
##       mo mean_mo sd_mo
##    <dbl>   <dbl> <dbl>
##  1     7  -3.95  0.166
##  2     8  -3.29  0.488
##  3     6  -3.21  0.321
##  4     5  -2.34  0.380
##  5     9  -1.77  0.596
##  6     4  -0.701 0.552
##  7    10   0.727 1.06 
##  8     3   1.02  0.590
##  9     2   2.41  0.417
## 10    11   3.37  0.836
## 11     1   3.78  0.493
## 12    12   4.04  0.489
mo_misery %>% 
  arrange(desc(sd_mo))
## # A tibble: 12 × 3
##       mo mean_mo sd_mo
##    <dbl>   <dbl> <dbl>
##  1    10   0.727 1.06 
##  2    11   3.37  0.836
##  3     9  -1.77  0.596
##  4     3   1.02  0.590
##  5     4  -0.701 0.552
##  6     1   3.78  0.493
##  7    12   4.04  0.489
##  8     8  -3.29  0.488
##  9     2   2.41  0.417
## 10     5  -2.34  0.380
## 11     6  -3.21  0.321
## 12     7  -3.95  0.166
mo_misery %>% ggplot(aes(x = mo, y = mean_mo)) + geom_point()

Timing

Look at the pdates of maximal sun, m_TMAX, and m_TMIN. Also get the pdate for the minimum of m_PRCP. Comment.

Solution

index %>% 
  filter(sun == max(sun)) %>% 
  select(pdate,sun)
## # A tibble: 2 × 2
##   pdate        sun
##   <date>     <dbl>
## 1 2028-06-20  15.9
## 2 2028-06-21  15.9
index %>% 
  filter(m_TMAX == max(m_TMAX)) %>% 
  select(pdate,m_TMAX)
## # A tibble: 1 × 2
##   pdate      m_TMAX
##   <date>      <dbl>
## 1 2028-07-27   80.4
index %>% 
  filter(m_TMIN == max(m_TMIN)) %>% 
  select(pdate,m_TMIN)
## # A tibble: 1 × 2
##   pdate      m_TMIN
##   <date>      <dbl>
## 1 2028-07-20   51.1
index %>% 
  filter(m_PRCP == min(m_PRCP)) %>% 
  select(pdate,m_PRCP)
## # A tibble: 1 × 2
##   pdate       m_PRCP
##   <date>       <dbl>
## 1 2028-08-10 0.00506

Comment

The longest day is about a month before the peak temperature values and about six weeks before the lowest rain value.