Harold Nelson
2025-10-05
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Download the GHCN data from Moodle and import it into a dataframe OAW2509. Use str() to verify.
## Rows: 30809 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): STATION, NAME
## dbl (3): PRCP, TMAX, TMIN
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## spc_tbl_ [30,809 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ STATION: chr [1:30809] "USW00024227" "USW00024227" "USW00024227" "USW00024227" ...
## $ NAME : chr [1:30809] "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" ...
## $ DATE : Date[1:30809], format: "1941-05-13" "1941-05-14" ...
## $ PRCP : num [1:30809] 0 0 0.3 1.08 0.06 0 0 0 0 0 ...
## $ TMAX : num [1:30809] 66 63 58 55 57 59 58 65 68 85 ...
## $ TMIN : num [1:30809] 50 47 44 45 46 39 40 50 42 46 ...
## - attr(*, "spec")=
## .. cols(
## .. STATION = col_character(),
## .. NAME = col_character(),
## .. DATE = col_date(format = ""),
## .. PRCP = col_double(),
## .. TMAX = col_double(),
## .. TMIN = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
Install skimr and use it to examine the data.
did the missing values occur?
## # A tibble: 18 × 1
## DATE
## <date>
## 1 1996-01-24
## 2 1996-07-03
## 3 1996-12-26
## 4 1996-12-27
## 5 1997-04-06
## 6 1997-04-07
## 7 1997-04-12
## 8 1997-04-13
## 9 1997-04-14
## 10 1997-05-07
## 11 1997-05-08
## 12 1997-05-09
## 13 1997-05-13
## 14 1997-05-14
## 15 2023-03-07
## 16 2023-03-08
## 17 2023-03-09
## 18 2025-02-25
days with missing data. Also drop STATION and NAME.
## tibble [30,791 × 4] (S3: tbl_df/tbl/data.frame)
## $ DATE: Date[1:30791], format: "1941-05-13" "1941-05-14" ...
## $ PRCP: num [1:30791] 0 0 0.3 1.08 0.06 0 0 0 0 0 ...
## $ TMAX: num [1:30791] 66 63 58 55 57 59 58 65 68 85 ...
## $ TMIN: num [1:30791] 50 47 44 45 46 39 40 50 42 46 ...
calendar variables mo, dy, and yr using the lubridate functions month, day, and year.
rains every day in Olympia. What’s the truth.
## # A tibble: 1 × 1
## `mean(PRCP > 0)`
## <dbl>
## 1 0.445
Compute the fraction of rainy days in each month. Arrange according to this value.
## # A tibble: 12 × 2
## mo rainy
## <dbl> <dbl>
## 1 7 0.139
## 2 8 0.173
## 3 9 0.279
## 4 6 0.294
## 5 5 0.362
## 6 10 0.461
## 7 4 0.509
## 8 3 0.590
## 9 2 0.604
## 10 1 0.638
## 11 11 0.644
## 12 12 0.666
Is it getting less rainy recently?
year_rainy = OAW2509 %>%
filter(yr < 2025 & yr > 1941) %>%
group_by(yr) %>%
mutate(rainy = mean(PRCP >0)) %>%
ungroup()
year_rainy %>%
ggplot(aes(x = yr, y = rainy)) +
geom_point()months getting dryer? Recent Seattle Times article.
Do a facetted graph showing an lm smoother for each month.
## `summarise()` has grouped output by 'mo'. You can override using the `.groups`
## argument.
mo_yr_rainy %>%
ggplot(aes(x = yr, y = rainy)) +
geom_smooth(method = 'lm') +
facet_wrap(~mo,scales = "free_y") ## `geom_smooth()` using formula = 'y ~ x'
mo_yr_rainy %>%
filter(mo == 6) %>%
ggplot(aes(x = yr, y= rainy)) +
geom_point() +
geom_smooth(method = "lm") +
ggtitle("June")## `geom_smooth()` using formula = 'y ~ x'
mo_yr_rainy %>%
filter(mo == 7) %>%
ggplot(aes(x = yr, y= rainy)) +
geom_point() +
geom_smooth(method = "lm") +
ggtitle("July")## `geom_smooth()` using formula = 'y ~ x'