Patterns 2

Harold Nelson

2/24/2022

Setup

Get the necessary packages and data.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(readr)
load("oly_airport.Rdata")

Recreate cal24

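# Place every observation on the 2024 calendar so that the same month and day
# line up across years (2024 is a leap year, so Feb 29 is a valid date).
cal24 = oly_airport %>% 
  mutate(DATE = make_date(2024,mo,dy),
         yr = year(DATE))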

summary(cal24)
##    STATION              NAME                DATE                 PRCP       
##  Length:29472       Length:29472       Min.   :2024-01-01   Min.   :0.0000  
##  Class :character   Class :character   1st Qu.:2024-04-02   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Median :2024-07-02   Median :0.0000  
##                                        Mean   :2024-07-02   Mean   :0.1369  
##                                        3rd Qu.:2024-10-01   3rd Qu.:0.1400  
##                                        Max.   :2024-12-31   Max.   :4.8200  
##                                                                             
##       TMAX             TMIN             yr             mo       
##  Min.   : 18.00   Min.   :-8.00   Min.   :2024   8      : 2511  
##  1st Qu.: 50.00   1st Qu.:33.00   1st Qu.:2024   10     : 2511  
##  Median : 59.00   Median :40.00   Median :2024   7      : 2510  
##  Mean   : 60.55   Mean   :39.83   Mean   :2024   12     : 2506  
##  3rd Qu.: 71.00   3rd Qu.:47.00   3rd Qu.:2024   1      : 2504  
##  Max.   :110.00   Max.   :69.00   Max.   :2024   5      : 2494  
##                                                  (Other):14436  
##        dy       
##  1      :  969  
##  2      :  969  
##  4      :  969  
##  5      :  969  
##  10     :  969  
##  11     :  969  
##  (Other):23658
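
As a minimal sketch of what make_date(2024, mo, dy) does, the example below uses a small made-up tibble. The names toy, toy24, and obs_year and all values are illustrative assumptions, not part of oly_airport. Rows from different years land on the same 2024 calendar date.

```r
library(dplyr)
library(lubridate)

# Hypothetical observations from three different years (illustrative values only)
toy <- tibble(
  obs_year = c(1952, 1984, 2000),
  mo = c(2, 2, 7),
  dy = c(29, 29, 4),
  TMAX = c(48, 52, 75)
)

# Put every row on the 2024 calendar so the same month/day lines up across years
toy24 <- toy %>%
  mutate(DATE = make_date(2024, mo, dy))

toy24
```

The two February 29 rows share one DATE value, so a later group_by(DATE) would treat them as the same calendar day.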

Variation in TMAX

Plot the standard deviation of TMAX for each calendar date in cal24.

Solution

cal24 %>% 
  group_by(DATE) %>% 
  summarize(sd_TMAX = sd(TMAX)) %>% 
  ungroup() %>% 
  ggplot(aes(x = DATE, y = sd_TMAX)) +
  geom_point() +
  ggtitle("Standard Deviation of TMAX")
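
To make the group_by() and summarize() step concrete, here is a small sketch on made-up readings. The tibble toy and its values are assumptions for illustration only. The na.rm = TRUE argument is an addition that is not in the solution above; the summary of cal24 shows no missing TMAX values, so it is only a safeguard.

```r
library(dplyr)

# Made-up TMAX readings for two calendar dates across three years (illustrative only)
toy <- tibble(
  DATE = as.Date(c("2024-01-01", "2024-01-01", "2024-01-01",
                   "2024-07-01", "2024-07-01", "2024-07-01")),
  TMAX = c(40, 45, 50, 70, 80, 90)
)

# One row per date: the spread of TMAX across years for that date
toy_sd <- toy %>%
  group_by(DATE) %>%
  summarize(sd_TMAX = sd(TMAX, na.rm = TRUE))

toy_sd
```

With a single grouping variable, summarize() already returns an ungrouped tibble, so the ungroup() in the solution is a harmless extra step.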

Difference between TMAX and TMIN

Let’s look at the annual pattern for the difference between TMAX and TMIN using cal24.

Solution

cal24 %>% 
  mutate(diff = TMAX - TMIN) %>% 
  group_by(DATE) %>% 
  summarize(diff = mean(diff)) %>% 
  ungroup() %>% 
  ggplot(aes(x = DATE, y = diff)) +
  geom_point() +
  ggtitle("Difference between TMAX and TMIN")

The mean difference is much larger during the warm months of the year.

Plotly

We can use plotly to make a plot interactive in two steps.

  1. Assign the ggplot to a named object instead of just displaying it. You can still display the static plot by referencing the named object.

  2. Use the named object in a call to ggplotly() to get an interactive graph.
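
The two steps can be sketched as follows. This example uses the built-in mtcars data purely for illustration; the object names g and p are arbitrary.

```r
library(ggplot2)
library(plotly)

# Step 1: store the plot in a named object
# (mtcars ships with R and is used here only for illustration)
g <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Step 2: convert the stored ggplot into an interactive plotly widget
p <- ggplotly(g)

# g   # displays the static plot
# p   # displays the interactive plot (HTML output only)
```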

Precipitation

There are two possible ways to look at precipitation. We could use either the mean value of precipitation for a date, or the probability of precipitation on that date.

Start with the mean value of precipitation, using plotly to make the plot interactive.

Solution

g1 = cal24 %>% 
  group_by(DATE) %>% 
  summarize(mean_precip = mean(PRCP)) %>% 
  ungroup() %>% 
  ggplot(aes(x = DATE, y = mean_precip)) +
  geom_point()

ggplotly(g1) # Interactive version, for RPubs or other HTML output
# g1         # Static version, for Word output

Now do the probability of precipitation.

Solution

cal24 %>% 
  group_by(DATE) %>% 
  summarize(prob_precip = mean(PRCP > 0)) %>% 
  ungroup() %>% 
  ggplot(aes(x = DATE, y = prob_precip)) +
  geom_point()

Observation: The two graphs are broadly similar, with one notable difference. The heavy rainfall of November and December does not carry over to the following January and February.
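
The two measures can diverge because mean(PRCP) is sensitive to a few very wet days, while mean(PRCP > 0) only counts whether any precipitation fell. A small sketch with made-up values (the tibble toy is an assumption for illustration) shows the difference:

```r
library(dplyr)

# Made-up daily precipitation for one calendar date across five years
toy <- tibble(
  DATE = as.Date(rep("2024-11-15", 5)),
  PRCP = c(0, 0, 0.10, 0, 2.40)
)

toy_summary <- toy %>%
  group_by(DATE) %>%
  summarize(
    mean_precip = mean(PRCP),      # average amount, pulled up by the one heavy day
    prob_precip = mean(PRCP > 0)   # share of years with any precipitation
  )

toy_summary
```

PRCP > 0 produces a logical vector, and taking the mean of a logical vector gives the proportion of TRUE values, which is why it can be read as an empirical probability.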

Precipitation and TMAX

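Create a graph showing smoothed curves for precipitation and TMAX. Since these two variables have such different values, you will have to create z-scores to make them visually compatible. Call the z-score variables n_TMAX and n_PRCP. Note that geom_smooth() uses loess by default only for fewer than 1,000 observations; with a data set this large it switches to a GAM, as the messages under the solution show.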

Solution

cal24 %>% 
  mutate(n_TMAX = (TMAX - mean(TMAX))/sd(TMAX),
         n_PRCP = (PRCP - mean(PRCP))/sd(PRCP)) %>% 
  ggplot(aes(x = DATE)) +
  geom_smooth(aes(y = n_TMAX), color = "red") +
  geom_smooth(aes(y = n_PRCP), color = "blue")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

The turning points in the summer are essentially the same. The peak in the precipitation curve to the right matches what we noted earlier in the graph of mean precipitation.
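
As a check on the z-score step, the sketch below builds a synthetic one-year series (all values are simulated assumptions, not the airport data) and confirms that each standardized variable has mean 0 and standard deviation 1. Because the full data set has more than 1,000 rows, geom_smooth() reported method = 'gam' above; this sketch names method = "loess" explicitly, which is practical here only because the toy data has 366 rows.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
day <- 1:366

# Synthetic daily series on very different scales (illustrative only)
toy <- tibble(
  DATE = as.Date("2024-01-01") + day - 1,
  TMAX = 60 + 20 * sin(2 * pi * (day - 105) / 366) + rnorm(366, sd = 5),
  PRCP = pmax(0, 0.15 - 0.12 * sin(2 * pi * (day - 105) / 366) + rnorm(366, sd = 0.1))
)

# Standardize each variable: subtract its mean, divide by its standard deviation
toy_z <- toy %>%
  mutate(n_TMAX = (TMAX - mean(TMAX)) / sd(TMAX),
         n_PRCP = (PRCP - mean(PRCP)) / sd(PRCP))

# Request loess explicitly so the smoothing method does not depend on row count
g <- ggplot(toy_z, aes(x = DATE)) +
  geom_smooth(aes(y = n_TMAX), method = "loess", formula = y ~ x, color = "red") +
  geom_smooth(aes(y = n_PRCP), method = "loess", formula = y ~ x, color = "blue")

g
```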