Harold Nelson
2025-10-07
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
Also run glimpse()
## Rows: 30,791
## Columns: 7
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1941…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59, 6…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46, 4…
## $ mo <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
## $ dy <int> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 2…
## $ yr <dbl> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941,…
Identifiers
Measures
Examine each measure for distribution, trend, and seasonality.
Identify the primary and secondary peaks
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 50.00 59.00 60.66 71.00 110.00
Create a linear model using the value of yr to predict TMAX. Is the coefficient of date significant.
##
## Call:
## lm(formula = TMAX ~ yr, data = OAW2509)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.149 -10.625 -1.364 10.114 48.761
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.211589 6.373700 4.740 2.15e-06 ***
## yr 0.015353 0.003214 4.777 1.79e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.73 on 30789 degrees of freedom
## Multiple R-squared: 0.0007406, Adjusted R-squared: 0.0007082
## F-statistic: 22.82 on 1 and 30789 DF, p-value: 1.787e-06
What is the rate of change of TMAX per century?
Over 100 years, TMAX increases by 1.5 degrees.
Look at the model as a smoother in a scaterplot of yr and TMAX.
OAW2509 %>%
ggplot(aes(x = yr, y = TMAX)) +
geom_point(color = "green",
size = .1,
alpha = .1) +
geom_smooth(color = "red",method = "lm")## `geom_smooth()` using formula = 'y ~ x'
It’s hard to see the slope.
Build a regression model using the factor version of mo. Why the factor version?
##
## Call:
## lm(formula = TMAX ~ factor(mo), data = OAW2509)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.932 -5.019 -0.506 4.433 38.922
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.955436 0.142666 315.109 <2e-16 ***
## factor(mo)2 3.976267 0.206614 19.245 <2e-16 ***
## factor(mo)3 8.434798 0.201799 41.798 <2e-16 ***
## factor(mo)4 14.064047 0.203518 69.105 <2e-16 ***
## factor(mo)5 21.103769 0.201471 104.748 <2e-16 ***
## factor(mo)6 26.122211 0.202806 128.804 <2e-16 ***
## factor(mo)7 32.678201 0.201166 162.444 <2e-16 ***
## factor(mo)8 32.550826 0.201147 161.826 <2e-16 ***
## factor(mo)9 26.781261 0.203068 131.883 <2e-16 ***
## factor(mo)10 15.705470 0.201741 77.850 <2e-16 ***
## factor(mo)11 5.612024 0.203415 27.589 <2e-16 ***
## factor(mo)12 0.006901 0.201780 0.034 0.973
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.279 on 30779 degrees of freedom
## Multiple R-squared: 0.7194, Adjusted R-squared: 0.7193
## F-statistic: 7175 on 11 and 30779 DF, p-value: < 2.2e-16
Interpretation: The omitted value of mo is 1.
The mean value of TMAX in January is given by the intercept term.
The coefficients of the other months are the differences of the mean values for the other months from the January value.
Use facet_wrap to see density curves of TMAX for each month.
OAW2509 %>%
ggplot(aes(x = TMAX)) +
geom_density() +
facet_wrap(~mo) +
ggtitle("Demsity of TMAX by mo")Look at the documentation for ggridgeline then produce a ridgeline Plot. Put TMAX on the horizontal axis and mo on the vertical.
## Picking joint bandwidth of 1.3
Create a scatterplot.
Use calendar date (cdate) on the horizontal axis and TMAX on the vertical.
Create the variable cdate using the lubridate function make_date(). Use the actual values of mo and dy, but replace the value of yr with 2028.
There is a lot of data, so much that a default geom_point() will produce an inkblot. Use small values of size and alpha. Provide a smoother with color = “red”.