Sep 30 Notes

Harold Nelson

2025-10-05

Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Get Data

Download the GHCN data from Moodle and import it into a dataframe OAW2509. Use str() to verify.

Solution

OAW2509 <- read_csv("4132838.csv")
## Rows: 30809 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): STATION, NAME
## dbl  (3): PRCP, TMAX, TMIN
## date (1): DATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(OAW2509)
## spc_tbl_ [30,809 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ STATION: chr [1:30809] "USW00024227" "USW00024227" "USW00024227" "USW00024227" ...
##  $ NAME   : chr [1:30809] "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" ...
##  $ DATE   : Date[1:30809], format: "1941-05-13" "1941-05-14" ...
##  $ PRCP   : num [1:30809] 0 0 0.3 1.08 0.06 0 0 0 0 0 ...
##  $ TMAX   : num [1:30809] 66 63 58 55 57 59 58 65 68 85 ...
##  $ TMIN   : num [1:30809] 50 47 44 45 46 39 40 50 42 46 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   STATION = col_character(),
##   ..   NAME = col_character(),
##   ..   DATE = col_date(format = ""),
##   ..   PRCP = col_double(),
##   ..   TMAX = col_double(),
##   ..   TMIN = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Skim

Install skimr and use it to examine the data.

Solution

library(skimr)
skimmed = skim(OAW2509)
# View(skimmed)
# View() only works in interactive mode.

When

did the missing values occur?

Solution

OAW2509 %>% 
  filter(is.na(PRCP) |
        is.na(TMAX) | 
        is.na(TMIN) 
        ) %>% 
  select(DATE)
## # A tibble: 18 × 1
##    DATE      
##    <date>    
##  1 1996-01-24
##  2 1996-07-03
##  3 1996-12-26
##  4 1996-12-27
##  5 1997-04-06
##  6 1997-04-07
##  7 1997-04-12
##  8 1997-04-13
##  9 1997-04-14
## 10 1997-05-07
## 11 1997-05-08
## 12 1997-05-09
## 13 1997-05-13
## 14 1997-05-14
## 15 2023-03-07
## 16 2023-03-08
## 17 2023-03-09
## 18 2025-02-25

Eliminate

days with missing data. Also drop STATION and NAME.

Solution

OAW2509 = OAW2509 %>% 
  drop_na() %>% 
  select(-c(STATION,NAME))

str(OAW2509)
## tibble [30,791 × 4] (S3: tbl_df/tbl/data.frame)
##  $ DATE: Date[1:30791], format: "1941-05-13" "1941-05-14" ...
##  $ PRCP: num [1:30791] 0 0 0.3 1.08 0.06 0 0 0 0 0 ...
##  $ TMAX: num [1:30791] 66 63 58 55 57 59 58 65 68 85 ...
##  $ TMIN: num [1:30791] 50 47 44 45 46 39 40 50 42 46 ...

Add

calendar variables mo, dy, and yr using the lubridate functions month, day, and year.

Solution

OAW2509 = OAW2509 %>% 
  mutate(mo = month(DATE),
         dy = day(DATE),
         yr = year(DATE))

It

rains every day in Olympia. What’s the truth.

Solution

OAW2509 %>% 
  summarize(mean(PRCP > 0))
## # A tibble: 1 × 1
##   `mean(PRCP > 0)`
##              <dbl>
## 1            0.445

Compare Months

Compute the fraction of rainy days in each month. Arrange according to this value.

OAW2509 %>% 
  group_by(mo) %>% 
  summarize(rainy = mean(PRCP > 0)) %>%
  ungroup() %>% 
  arrange(rainy)
## # A tibble: 12 × 2
##       mo rainy
##    <dbl> <dbl>
##  1     7 0.139
##  2     8 0.173
##  3     9 0.279
##  4     6 0.294
##  5     5 0.362
##  6    10 0.461
##  7     4 0.509
##  8     3 0.590
##  9     2 0.604
## 10     1 0.638
## 11    11 0.644
## 12    12 0.666

Trend

Is it getting less rainy recently?

Solution - Visual

year_rainy = OAW2509 %>% 
  filter(yr < 2025 & yr > 1941) %>% 
  group_by(yr) %>% 
  mutate(rainy = mean(PRCP >0)) %>% 
  ungroup()

year_rainy %>% 
  ggplot(aes(x = yr, y = rainy)) +
  geom_point()

Solution - Numerical

cor(year_rainy$yr,year_rainy$rainy)
## [1] 0.02554041

Some

months getting dryer? Recent Seattle Times article.

Do a facetted graph showing an lm smoother for each month.

Solution

mo_yr_rainy = OAW2509 %>% 
  group_by(mo,yr) %>% 
  summarize(rainy = mean(PRCP > 0)) %>% 
  ungroup()
## `summarise()` has grouped output by 'mo'. You can override using the `.groups`
## argument.
mo_yr_rainy %>% 
  ggplot(aes(x = yr, y = rainy)) +
  geom_smooth(method = 'lm') +
  facet_wrap(~mo,scales = "free_y") 
## `geom_smooth()` using formula = 'y ~ x'

Look at June and July Separately.

Solution

mo_yr_rainy %>% 
  filter(mo == 6) %>% 
  ggplot(aes(x = yr, y= rainy)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("June")
## `geom_smooth()` using formula = 'y ~ x'

mo_yr_rainy %>% 
  filter(mo == 7) %>% 
  ggplot(aes(x = yr, y= rainy)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("July")
## `geom_smooth()` using formula = 'y ~ x'