Weather 2

Harold Nelson

2025-10-07

Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggridges)

Load the Data

Also run glimpse()

load("OAW2509.Rdata")
glimpse(OAW2509)
## Rows: 30,791
## Columns: 7
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1941…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59, 6…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46, 4…
## $ mo   <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
## $ dy   <int> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 2…
## $ yr   <dbl> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941,…

Structure

Identifiers

Measures

Analysis Plan

Examine each measure for distribution, trend, and seasonality.

TMAX

Examine the density of TMAX.

Solution

OAW2509 %>% 
  ggplot(aes(x = TMAX)) +
  geom_density()

Use ggplotly

Identify the primary and secondary peaks

Solution

g = OAW2509 %>% 
  ggplot(aes(x = TMAX)) +
  geom_density()

ggplotly(g)

Summary of TMAX

Solution

summary(OAW2509$TMAX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   50.00   59.00   60.66   71.00  110.00

Trend

Create a linear model using the value of yr to predict TMAX. Is the coefficient of date significant.

Solution

lm_tmax = lm(TMAX~yr,data = OAW2509)
summary(lm_tmax)
## 
## Call:
## lm(formula = TMAX ~ yr, data = OAW2509)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.149 -10.625  -1.364  10.114  48.761 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.211589   6.373700   4.740 2.15e-06 ***
## yr           0.015353   0.003214   4.777 1.79e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.73 on 30789 degrees of freedom
## Multiple R-squared:  0.0007406,  Adjusted R-squared:  0.0007082 
## F-statistic: 22.82 on 1 and 30789 DF,  p-value: 1.787e-06

Interpretation?

What is the rate of change of TMAX per century?

Solution

Over 100 years, TMAX increases by 1.5 degrees.

Graphics

Look at the model as a smoother in a scaterplot of yr and TMAX.

Solution

OAW2509 %>% 
  ggplot(aes(x = yr, y = TMAX)) +
  geom_point(color = "green",
             size = .1,
             alpha = .1) +
  geom_smooth(color = "red",method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

It’s hard to see the slope.

Seasonality of TMAX

Categorical Regression

Build a regression model using the factor version of mo. Why the factor version?

Solution

TMAX_mo = lm(TMAX~factor(mo),data = OAW2509)
summary(TMAX_mo)
## 
## Call:
## lm(formula = TMAX ~ factor(mo), data = OAW2509)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.932  -5.019  -0.506   4.433  38.922 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  44.955436   0.142666 315.109   <2e-16 ***
## factor(mo)2   3.976267   0.206614  19.245   <2e-16 ***
## factor(mo)3   8.434798   0.201799  41.798   <2e-16 ***
## factor(mo)4  14.064047   0.203518  69.105   <2e-16 ***
## factor(mo)5  21.103769   0.201471 104.748   <2e-16 ***
## factor(mo)6  26.122211   0.202806 128.804   <2e-16 ***
## factor(mo)7  32.678201   0.201166 162.444   <2e-16 ***
## factor(mo)8  32.550826   0.201147 161.826   <2e-16 ***
## factor(mo)9  26.781261   0.203068 131.883   <2e-16 ***
## factor(mo)10 15.705470   0.201741  77.850   <2e-16 ***
## factor(mo)11  5.612024   0.203415  27.589   <2e-16 ***
## factor(mo)12  0.006901   0.201780   0.034    0.973    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.279 on 30779 degrees of freedom
## Multiple R-squared:  0.7194, Adjusted R-squared:  0.7193 
## F-statistic:  7175 on 11 and 30779 DF,  p-value: < 2.2e-16

Interpretation: The omitted value of mo is 1.

The mean value of TMAX in January is given by the intercept term.

The coefficients of the other months are the differences of the mean values for the other months from the January value.

Graphic 1

Use facet_wrap to see density curves of TMAX for each month.

Solution

OAW2509 %>% 
  ggplot(aes(x = TMAX)) +
  geom_density() +
  facet_wrap(~mo) +
  ggtitle("Demsity of TMAX by mo")

Graphic 2

Look at the documentation for ggridgeline then produce a ridgeline Plot. Put TMAX on the horizontal axis and mo on the vertical.

Solution

ggplot(OAW2509, aes(x = TMAX, y = factor(mo))) + 
  geom_density_ridges()
## Picking joint bandwidth of 1.3

Graphic 3

Create a scatterplot.

Use calendar date (cdate) on the horizontal axis and TMAX on the vertical.

Create the variable cdate using the lubridate function make_date(). Use the actual values of mo and dy, but replace the value of yr with 2028.

There is a lot of data, so much that a default geom_point() will produce an inkblot. Use small values of size and alpha. Provide a smoother with color = “red”.

Solution

OAW2509 %>% 
  mutate(cdate = make_date(year = 2028,month = mo, day = dy)) %>% 
  ggplot(aes(x = cdate, y = TMAX)) +
  geom_point(size = .1, alpha = .1,color = "gray") +
  geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'