Harold Nelson
2025-10-16
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
I’ll define a day as pleasant if there is no rain and the maximum temperatur is between 60 and 80.
Add a boolean variable pleasant to OAW2509.
Compute the proportion of pleasant days in the entire dataframe.
OAW2509 = OAW2509 %>%
mutate(pleasant = PRCP == 0 &
TMAX >= 60 &
TMAX <= 80)
OAW2509 %>%
summarize(prop_pleasant = mean(pleasant))## # A tibble: 1 × 1
## prop_pleasant
## <dbl>
## 1 0.294
Create a summary file showing the proportion of pleasant days in each calendar month.
Use this to create an interactive scatterplot showing the values.
Pleasantness rises from January to June.
It drops slightly for July and August.
The most pleasant month of the entire year is September with June a close second.
I’ll define misery as a cold, wet, dark day.
We can measure coldness and wetness from the data in OAW2509.
For darkness, we need the number of daylight hours. I posed the following question to ChatGPT.
You are an R expert. I want a function to give the hours of daylight for a given date in Olympia, WA.
There was an error in the code supplied by ChatGPT. It included an extra parameter, keep, in the call to getSunlightTimes.
# Need to install suncalc
library(suncalc)
daylight_hours <- function(date) {
# Ensure input is a Date object
date <- as.Date(date)
# Coordinates for Olympia, WA
lat <- 47.04
lon <- -122.90
# Get sunrise and sunset times in UTC and convert to local time
sun_times <- getSunlightTimes(date = date, lat = lat, lon = lon, tz = "America/Los_Angeles")
# Calculate daylight duration in hours
daylight_duration <- as.numeric(difftime(sun_times$sunset, sun_times$sunrise, units = "hours"))
return(daylight_duration)
}Construct a dataframe means as follows:
Create the variable pdate using make_date to set all dates to 2028.
group by pdate.
Calculate the means of TMAX, TMIN, and PRCP as m_…
Add the variable sun for each pdate using the function getSunlightTimes.
means = OAW2509 %>%
mutate(pdate = make_date(year = 2028,
month = mo,
day = dy)) %>%
group_by(pdate) %>%
summarize(m_PRCP = mean(PRCP),
m_TMAX = mean(TMAX),
m_TMIN = mean(TMIN)) %>%
ungroup() %>%
mutate(sun = daylight_hours(pdate))
head(means)## # A tibble: 6 × 5
## pdate m_PRCP m_TMAX m_TMIN sun
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2028-01-01 0.207 43.6 30.7 8.58
## 2 2028-01-02 0.282 43.2 30.8 8.59
## 3 2028-01-03 0.224 43.5 30.4 8.61
## 4 2028-01-04 0.275 43.6 30.9 8.63
## 5 2028-01-05 0.289 43.9 31.1 8.65
## 6 2028-01-06 0.311 43.6 31.0 8.67
Create scatterplots of each of our four variables agains pdate. Also Provide summmaries of each.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 43.21 49.36 59.55 60.58 70.91 80.36
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30.19 33.24 38.24 39.84 46.36 51.14
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.005059 0.054765 0.123155 0.136536 0.212024 0.404762
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.499 9.789 12.259 12.231 14.681 15.902
The plots and summaries show that the variables we are using have very different values. In order to compute a misery index, we need to convert the variables into comparable values. One way to do this is to construct z scores. Subtract the mean and divide by the standard deviation.
Construct the dataframe z_scores from the dataframe means.
z_scores = means %>%
mutate(z_TMAX = (m_TMAX - mean(m_TMAX))/sd(m_TMAX),
z_TMIN = (m_TMIN - mean(m_TMIN))/sd(m_TMIN),
z_PRCP = (m_PRCP - mean(m_PRCP))/sd(m_PRCP),
z_sun = (sun - mean(sun))/sd(sun)
)
summary(z_scores)## pdate m_PRCP m_TMAX m_TMIN
## Min. :2028-01-01 Min. :0.005059 Min. :43.21 Min. :30.19
## 1st Qu.:2028-04-01 1st Qu.:0.054765 1st Qu.:49.36 1st Qu.:33.24
## Median :2028-07-01 Median :0.123155 Median :59.55 Median :38.24
## Mean :2028-07-01 Mean :0.136536 Mean :60.58 Mean :39.84
## 3rd Qu.:2028-09-30 3rd Qu.:0.212024 3rd Qu.:70.91 3rd Qu.:46.36
## Max. :2028-12-31 Max. :0.404762 Max. :80.36 Max. :51.14
## sun z_TMAX z_TMIN z_PRCP
## Min. : 8.499 Min. :-1.46781 Min. :-1.4350 Min. :-1.4088
## 1st Qu.: 9.789 1st Qu.:-0.94850 1st Qu.:-0.9813 1st Qu.:-0.8762
## Median :12.259 Median :-0.08702 Median :-0.2373 Median :-0.1434
## Mean :12.231 Mean : 0.00000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.:14.681 3rd Qu.: 0.87346 3rd Qu.: 0.9699 3rd Qu.: 0.8089
## Max. :15.902 Max. : 1.67282 Max. : 1.6807 Max. : 2.8741
## z_sun
## Min. :-1.46973
## 1st Qu.:-0.96193
## Median : 0.01076
## Mean : 0.00000
## 3rd Qu.: 0.96491
## Max. : 1.44553
Now that we have comparable values we can construct the misery using signed addition.
Use TMAX for temperature.
Create the dataframe index conataining the variable misery.
Construct a scatterplot showing misery by day.
index = z_scores %>%
mutate(misery = z_PRCP - (z_TMAX + z_sun))
index %>% ggplot(aes(x = pdate,y=misery)) +
geom_point(size = .2) +
geom_smooth(color = "red") +
ggtitle("Misery by Day")## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Identify the single value of pdate where the misery index is > 5.
## # A tibble: 1 × 10
## pdate m_PRCP m_TMAX m_TMIN sun z_TMAX z_TMIN z_PRCP z_sun misery
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2028-11-23 0.405 48.4 34.9 9.04 -1.03 -0.732 2.87 -1.26 5.16
Use arrange() and head().
## # A tibble: 6 × 10
## pdate m_PRCP m_TMAX m_TMIN sun z_TMAX z_TMIN z_PRCP z_sun misery
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2028-07-29 0.006 80.1 50.2 15.0 1.65 1.55 -1.40 1.09 -4.15
## 2 2028-07-27 0.0118 80.4 50.3 15.1 1.67 1.55 -1.34 1.13 -4.14
## 3 2028-07-25 0.00859 79.6 50 15.2 1.61 1.51 -1.37 1.16 -4.14
## 4 2028-07-19 0.00965 78.6 50.6 15.4 1.52 1.61 -1.36 1.24 -4.12
## 5 2028-07-30 0.00541 80.0 50.3 15.0 1.64 1.56 -1.41 1.08 -4.12
## 6 2028-07-21 0.014 79.4 50.6 15.3 1.59 1.61 -1.31 1.22 -4.12
Creat mo_misery holding the mean and standard deviation of misery for each month.
Look for the highest average misery and the highest standard deviation.
Do a scatterplot showing the mean value of misery for each month.
mo_misery = index %>%
mutate(mo = month(pdate)) %>%
group_by(mo) %>%
summarize(mean_mo = mean(misery),
sd_mo = sd(misery))
mo_misery %>%
arrange(mean_mo)## # A tibble: 12 × 3
## mo mean_mo sd_mo
## <dbl> <dbl> <dbl>
## 1 7 -3.95 0.166
## 2 8 -3.29 0.488
## 3 6 -3.21 0.321
## 4 5 -2.34 0.380
## 5 9 -1.77 0.596
## 6 4 -0.701 0.552
## 7 10 0.727 1.06
## 8 3 1.02 0.590
## 9 2 2.41 0.417
## 10 11 3.37 0.836
## 11 1 3.78 0.493
## 12 12 4.04 0.489
## # A tibble: 12 × 3
## mo mean_mo sd_mo
## <dbl> <dbl> <dbl>
## 1 10 0.727 1.06
## 2 11 3.37 0.836
## 3 9 -1.77 0.596
## 4 3 1.02 0.590
## 5 4 -0.701 0.552
## 6 1 3.78 0.493
## 7 12 4.04 0.489
## 8 8 -3.29 0.488
## 9 2 2.41 0.417
## 10 5 -2.34 0.380
## 11 6 -3.21 0.321
## 12 7 -3.95 0.166
Look at the pdates of maximal sun, m_TMAX, and m_TMIN. Also get the pdate for the minimum of m_PRCP. Comment.
## # A tibble: 2 × 2
## pdate sun
## <date> <dbl>
## 1 2028-06-20 15.9
## 2 2028-06-21 15.9
## # A tibble: 1 × 2
## pdate m_TMAX
## <date> <dbl>
## 1 2028-07-27 80.4
## # A tibble: 1 × 2
## pdate m_TMIN
## <date> <dbl>
## 1 2028-07-20 51.1
## # A tibble: 1 × 2
## pdate m_PRCP
## <date> <dbl>
## 1 2028-08-10 0.00506
The longest day is about a month before the peak temperature values and about six weeks before the lowest rain value.
Comment
There are numerous days in July with misery indices below -4. Of course, they might not be pleasant.