Patterns in Olympia Weather 1

Harold Nelson

2/23/2021

Setup

Get the necessary packages and data.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(readr)
load("oly_airport.Rdata")

How’s the Weather?

This little presentation looks at patterns in the weather for the Olympia area. It is based on data from the Olympia Airport beginning in 1941. About a dozen observations were removed because one or more variables had missing values.

Look at the data and see what we have.

str(oly_airport)
## tibble [29,472 × 9] (S3: tbl_df/tbl/data.frame)
##  $ STATION: chr [1:29472] "USW00024227" "USW00024227" "USW00024227" "USW00024227" ...
##  $ NAME   : chr [1:29472] "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" ...
##  $ DATE   : Date[1:29472], format: "1941-05-13" "1941-05-14" ...
##  $ PRCP   : num [1:29472] 0 0 0.3 1.08 0.06 0 0 0 0 0 ...
##  $ TMAX   : num [1:29472] 66 63 58 55 57 59 58 65 68 85 ...
##  $ TMIN   : num [1:29472] 50 47 44 45 46 39 40 50 42 46 ...
##  $ yr     : Factor w/ 82 levels "1941","1942",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ mo     : Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ dy     : Factor w/ 31 levels "1","2","3","4",..: 13 14 15 16 17 18 19 20 21 22 ...
summary(oly_airport)
##    STATION              NAME                DATE                 PRCP       
##  Length:29472       Length:29472       Min.   :1941-05-13   Min.   :0.0000  
##  Class :character   Class :character   1st Qu.:1961-07-14   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Median :1981-09-15   Median :0.0000  
##                                        Mean   :1981-09-19   Mean   :0.1369  
##                                        3rd Qu.:2001-12-01   3rd Qu.:0.1400  
##                                        Max.   :2022-02-11   Max.   :4.8200  
##                                                                             
##       TMAX             TMIN             yr              mo       
##  Min.   : 18.00   Min.   :-8.00   1944   :  366   8      : 2511  
##  1st Qu.: 50.00   1st Qu.:33.00   1948   :  366   10     : 2511  
##  Median : 59.00   Median :40.00   1952   :  366   7      : 2510  
##  Mean   : 60.55   Mean   :39.83   1956   :  366   12     : 2506  
##  3rd Qu.: 71.00   3rd Qu.:47.00   1960   :  366   1      : 2504  
##  Max.   :110.00   Max.   :69.00   1964   :  366   5      : 2494  
##                                   (Other):27276   (Other):14436  
##        dy       
##  1      :  969  
##  2      :  969  
##  4      :  969  
##  5      :  969  
##  10     :  969  
##  11     :  969  
##  (Other):23658
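
Note from the str() output that yr, mo and dy are stored as factors, not numbers. The small sketch below uses a made-up factor (yr_demo is a hypothetical name, not part of the data) to show why that is worth keeping in mind: as.integer() on a factor returns the internal level codes, which match the labels only when the levels happen to be 1, 2, 3, ... in order, as they are for mo and dy.

```r
# A made-up factor standing in for a column like yr (illustrative only)
yr_demo <- factor(c("1941", "1942", "2021"))

# as.integer() returns the level codes, not the numbers in the labels
codes <- as.integer(yr_demo)

# Going through as.character() recovers the numbers the labels spell out
years <- as.integer(as.character(yr_demo))

print(codes)
print(years)
```

If you ever need yr as a number, the second form is the safer habit.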

Prediction

To summarize the annual pattern, we need to put all of the data into a single year, so that the date variable no longer distinguishes between years. You could think of this as making a prediction for an arbitrary future year. We’ll use 2024 because it’s a leap year, so observations from February 29 keep a valid date. Call the new dataframe cal24 and build its DATE with make_date() from the lubridate package. Run summary() on cal24 to verify that you succeeded.

Solution

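# Move every observation into calendar year 2024.
# mo and dy are factors whose levels run 1..12 and 1..31 in order,
# so their integer codes equal the month and day numbers.
cal24 = oly_airport %>% 
  mutate(DATE = make_date(2024,mo,dy),
         yr = year(DATE))   # yr is now the number 2024 in every row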

summary(cal24)
##    STATION              NAME                DATE                 PRCP       
##  Length:29472       Length:29472       Min.   :2024-01-01   Min.   :0.0000  
##  Class :character   Class :character   1st Qu.:2024-04-02   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Median :2024-07-02   Median :0.0000  
##                                        Mean   :2024-07-02   Mean   :0.1369  
##                                        3rd Qu.:2024-10-01   3rd Qu.:0.1400  
##                                        Max.   :2024-12-31   Max.   :4.8200  
##                                                                             
##       TMAX             TMIN             yr             mo       
##  Min.   : 18.00   Min.   :-8.00   Min.   :2024   8      : 2511  
##  1st Qu.: 50.00   1st Qu.:33.00   1st Qu.:2024   10     : 2511  
##  Median : 59.00   Median :40.00   Median :2024   7      : 2510  
##  Mean   : 60.55   Mean   :39.83   Mean   :2024   12     : 2506  
##  3rd Qu.: 71.00   3rd Qu.:47.00   3rd Qu.:2024   1      : 2504  
##  Max.   :110.00   Max.   :69.00   Max.   :2024   5      : 2494  
##                                                  (Other):14436  
##        dy       
##  1      :  969  
##  2      :  969  
##  4      :  969  
##  5      :  969  
##  10     :  969  
##  11     :  969  
##  (Other):23658
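
Here is a quick check, using nothing from the weather data, of why a leap year is the safe target. The sketch assumes that make_date() returns NA for a date that does not exist, which is how lubridate behaves as far as I know; the object names are made up for the illustration.

```r
library(lubridate)

# February 29 exists only in a leap year
feb29_2023 <- make_date(2023, 2, 29)  # not a real date
feb29_2024 <- make_date(2024, 2, 29)  # a real date

is.na(feb29_2023)
feb29_2024
leap_year(2024)
```

With a non-leap target year, every February 29 observation would end up with a missing DATE.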

Graphs of TMAX

Create a scatterplot showing all of the values of TMAX for each value of DATE. Set size and alpha to .2 in the call to geom_point() to reduce the overplotting.

Solution

cal24 %>% 
  ggplot(aes(x = DATE, y = TMAX)) +
  geom_point(size = .2, alpha = .2)

Smoothing

Add geom_smooth() to the previous graph.

Solution

cal24 %>% 
  ggplot(aes(x = DATE, y = TMAX)) +
  geom_point(size = .2, alpha = .2) +
  geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
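
The message above says that geom_smooth() chose a GAM on its own, which ggplot2 does for larger data sets. The sketch below spells out the same method and formula explicitly, which should make the message go away. It runs on simulated data (sim is a made-up stand-in for cal24, not real weather) and assumes the mgcv package that normally ships with R is available.

```r
library(ggplot2)

# Simulated stand-in for cal24: one noisy seasonal curve (not real data)
set.seed(1)
sim <- data.frame(
  DATE = seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
)
day_of_year <- as.numeric(format(sim$DATE, "%j"))
sim$TMAX <- 60 - 17 * cos(2 * pi * (day_of_year - 20) / 366) +
  rnorm(nrow(sim), sd = 5)

# Same smoother the message reports, requested explicitly
p <- ggplot(sim, aes(x = DATE, y = TMAX)) +
  geom_point(size = .2, alpha = .2) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"), color = "red")

p
```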

Question

Which takes longer, the rise or the fall?

Answer

The rise seems to take longer than the fall.
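
One way to put numbers on this impression is to count the days from the coldest date to the warmest date and call the rest of the year the fall. The sketch below is only an illustration: rise_and_fall() is a hypothetical helper, and demo is a synthetic, piecewise-linear year rather than the Olympia data.

```r
# Hypothetical helper: days from the coldest to the warmest date, and back
rise_and_fall <- function(df) {
  trough <- df$DATE[which.min(df$MTMAX)]
  peak   <- df$DATE[which.max(df$MTMAX)]
  n_days <- nrow(df)
  # %% wraps around the year end when the trough falls after the peak
  rise <- as.integer(peak - trough) %% n_days
  c(rise = rise, fall = n_days - rise)
}

# Synthetic year (not real data): lowest on Jan 1, highest on Aug 1
demo <- data.frame(
  DATE = seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
)
demo$MTMAX <- approx(x = c(1, 214, 367), y = c(45, 78, 45), xout = 1:366)$y

result <- rise_and_fall(demo)
result
```

For the synthetic year this gives a rise of 213 days and a fall of 153 days. To try it on the real data, pass in a dataframe with one row per DATE and a column MTMAX holding the daily means; the raw daily means are noisy, so the exact dates of the extremes should be read with some caution.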

2021

Add a geom_point() with red points to show calendar 2021. Create a dataframe cal21 by filtering for yr == 2021, then use make_date() to move its dates into 2024 so that they line up with cal24. Add a second geom_point() with this dataframe as the data argument, and set its size parameter to .5.

Solution

cal21 = oly_airport %>% 
  filter(yr == 2021) %>% 
  mutate(DATE = make_date(2024,mo,dy))

cal24 %>% 
  ggplot(aes(x = DATE, y = TMAX)) +
  geom_point(size = .2, alpha = .2) +
  geom_point(data = cal21,aes(x = DATE, y = TMAX),
             color = "red",size=.5)

Mean TMAX

For each value of DATE compute a mean of TMAX and display the results in a graph.

Solution

cal24 %>% 
  group_by(DATE) %>% 
  summarize(MTMAX = mean(TMAX)) %>% 
  ggplot(aes(x = DATE,y = MTMAX)) +
  geom_point(size = .5)
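
To see what group_by() and summarize() are doing here, it may help to run the same steps on a handful of made-up readings where the means can be checked by hand. The toy dataframe and its values are invented for the illustration.

```r
library(dplyr)
library(lubridate)

# Made-up readings: two calendar days observed in three different years
toy <- data.frame(
  DATE = as.Date(c("2019-07-01", "2020-07-01", "2021-07-01",
                   "2019-07-02", "2020-07-02", "2021-07-02")),
  TMAX = c(70, 80, 90, 71, 75, 79)
)

toy_means <- toy %>%
  # Put every reading into 2024 so the same calendar day shares one DATE
  mutate(DATE = make_date(2024, month(DATE), day(DATE))) %>%
  group_by(DATE) %>%
  summarize(MTMAX = mean(TMAX))

toy_means
```

Six rows collapse to two, one per calendar day, with means of 80 and 75.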

Density of Mean TMAX

Do a density plot of the mean values of TMAX.

Solution

cal24 %>% 
  group_by(DATE) %>% 
  summarize(MTMAX = mean(TMAX)) %>% 
  ggplot(aes(x = MTMAX)) +
  geom_density(adjust = .2)
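
The adjust argument is a multiplier on the default bandwidth, so adjust = .2 gives a much less smooth curve than the default. The sketch below shows the effect with base R's density() on simulated values (x is made up); as far as I know, geom_density() relies on the same kind of kernel density estimate.

```r
# Simulated values standing in for the 366 daily means (not real data)
set.seed(42)
x <- rnorm(366, mean = 60, sd = 10)

d_default <- density(x)               # adjust = 1
d_narrow  <- density(x, adjust = .2)  # one fifth of the default bandwidth

c(default = d_default$bw, narrow = d_narrow$bw)
```

A smaller bandwidth lets the curve follow small bumps in the data, which is useful when looking for more than one mode.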

Graphs of TMIN

Create a scatterplot showing all of the values of TMIN for each value of DATE. Set size and alpha to .2 in the call to geom_point() to reduce the overplotting.

Solution

cal24 %>% 
  ggplot(aes(x = DATE, y = TMIN)) +
  geom_point(size = .2, alpha = .2)

Smoothing

Add geom_smooth() to the previous graph.

Solution

cal24 %>% 
  ggplot(aes(x = DATE, y = TMIN)) +
  geom_point(size = .2, alpha = .2) +
  geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

2021

Add a geom_point() with red points to show calendar 2021. Create a dataframe cal21 by filtering for yr == 2021, then use make_date() to move its dates into 2024 so that they line up with cal24. Add a second geom_point() with this dataframe as the data argument, and set its size parameter to .5.

Solution

cal21 = oly_airport %>% 
  filter(yr == 2021) %>% 
  mutate(DATE = make_date(2024,mo,dy))

cal24 %>% 
  ggplot(aes(x = DATE, y = TMIN)) +
  geom_point(size = .2, alpha = .2) +
  geom_point(data = cal21,aes(x = DATE, y = TMIN),
             color = "red",size=.5)

Mean TMIN

For each value of DATE compute a mean of TMIN and display the results in a graph.

Solution

cal24 %>% 
  group_by(DATE) %>% 
  summarize(MTMIN = mean(TMIN)) %>% 
  ggplot(aes(x = DATE,y = MTMIN)) +
  geom_point(size = .5)

Density of Mean TMIN

Do a density plot of the mean values of TMIN.

Solution

cal24 %>% 
  group_by(DATE) %>% 
  summarize(MTMIN = mean(TMIN)) %>% 
  ggplot(aes(x = MTMIN)) +
  geom_density(adjust = .2)