Harold Nelson
2025-12-02
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Use the OAW2309 dataframe Create a linear regression model to predict the mean value of TMAX from the calendar month. Call this TMAX_mo_T.
##
## Call:
## lm(formula = TMAX ~ mo, data = OAW2309)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.940 -5.033 -0.530 4.470 38.967
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.90791 0.14427 311.284 <2e-16 ***
## mo2 4.03164 0.20892 19.297 <2e-16 ***
## mo3 8.44918 0.20404 41.409 <2e-16 ***
## mo4 14.05624 0.20580 68.300 <2e-16 ***
## mo5 21.14569 0.20372 103.796 <2e-16 ***
## mo6 26.12542 0.20507 127.400 <2e-16 ***
## mo7 32.62203 0.20341 160.377 <2e-16 ***
## mo8 32.56430 0.20339 160.109 <2e-16 ***
## mo9 26.78366 0.20507 130.610 <2e-16 ***
## mo10 15.71011 0.20400 77.009 <2e-16 ***
## mo11 5.62339 0.20570 27.338 <2e-16 ***
## mo12 -0.06303 0.20404 -0.309 0.757
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.272 on 30063 degrees of freedom
## Multiple R-squared: 0.7198, Adjusted R-squared: 0.7197
## F-statistic: 7019 on 11 and 30063 DF, p-value: < 2.2e-16
Use get_regression_table(). Compare with summary().
## # A tibble: 12 × 7
## term estimate std_error statistic p_value lower_ci upper_ci
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 intercept 44.9 0.144 311. 0 44.6 45.2
## 2 mo: 2 4.03 0.209 19.3 0 3.62 4.44
## 3 mo: 3 8.45 0.204 41.4 0 8.05 8.85
## 4 mo: 4 14.1 0.206 68.3 0 13.7 14.5
## 5 mo: 5 21.1 0.204 104. 0 20.7 21.5
## 6 mo: 6 26.1 0.205 127. 0 25.7 26.5
## 7 mo: 7 32.6 0.203 160. 0 32.2 33.0
## 8 mo: 8 32.6 0.203 160. 0 32.2 33.0
## 9 mo: 9 26.8 0.205 131. 0 26.4 27.2
## 10 mo: 10 15.7 0.204 77.0 0 15.3 16.1
## 11 mo: 11 5.62 0.206 27.3 0 5.22 6.03
## 12 mo: 12 -0.063 0.204 -0.309 0.757 -0.463 0.337
## Rows: 30,075
## Columns: 5
## $ ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 5…
## $ mo <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6…
## $ TMAX_hat <dbl> 66.054, 66.054, 66.054, 66.054, 66.054, 66.054, 66.054, 66.05…
## $ residual <dbl> -0.054, -3.054, -8.054, -11.054, -9.054, -7.054, -8.054, -1.0…
Get a density plot of the residuals. Does it look like a normal distribution?
Compare the fitted values with the actuals. Put mo on the vertical axis. Put the fitted value and the actual values on the horizontal axis. Make the fitted value large and red. Use geom_jitter() with a size of .05 for the actuals.
fitted_and_residuals_T %>%
ggplot(aes(x = TMAX_hat,y = mo)) +
geom_point(color = "red",size = 2) +
geom_jitter(aes(x = TMAX),size = .05,alpha = .1) Use facet_wrap(~mo) to separate the months. Plot a density for the actuals and a vline for the fitted.
fitted_and_residuals_T %>%
ggplot() +
geom_density(aes(x = TMAX)) +
geom_vline(aes(xintercept = TMAX_hat),color = "red") +
facet_wrap(~mo)First get observed_fit.
## # A tibble: 12 × 2
## term estimate
## <chr> <dbl>
## 1 intercept 44.9
## 2 mo2 4.03
## 3 mo3 8.45
## 4 mo4 14.1
## 5 mo5 21.1
## 6 mo6 26.1
## 7 mo7 32.6
## 8 mo8 32.6
## 9 mo9 26.8
## 10 mo10 15.7
## 11 mo11 5.62
## 12 mo12 -0.0630
Note that the coeffcients are identical to the ones from the theoretical model.
Get the bootstrap distribution of the model coefficients,
boot_distribution_model <- OAW2309 |>
specify(
TMAX ~ mo
) |>
generate(reps = 1000, type = "bootstrap") |>
fit()
boot_distribution_model## # A tibble: 12,000 × 3
## # Groups: replicate [1,000]
## replicate term estimate
## <int> <chr> <dbl>
## 1 1 intercept 44.8
## 2 1 mo2 3.91
## 3 1 mo3 8.50
## 4 1 mo4 14.2
## 5 1 mo5 21.5
## 6 1 mo6 26.0
## 7 1 mo7 32.8
## 8 1 mo8 32.9
## 9 1 mo9 27.1
## 10 1 mo10 16.0
## # ℹ 11,990 more rows
`
confidence_intervals_model <- boot_distribution_model |>
get_confidence_interval(
level = 0.95,
type = "percentile",
point_estimate = observed_fit)
confidence_intervals_model## # A tibble: 12 × 3
## term lower_ci upper_ci
## <chr> <dbl> <dbl>
## 1 intercept 44.7 45.2
## 2 mo10 15.4 16.1
## 3 mo11 5.29 5.99
## 4 mo12 -0.403 0.279
## 5 mo2 3.68 4.40
## 6 mo3 8.09 8.81
## 7 mo4 13.7 14.4
## 8 mo5 20.7 21.6
## 9 mo6 25.7 26.5
## 10 mo7 32.2 33.0
## 11 mo8 32.1 32.9
## 12 mo9 26.4 27.1
Repeat the display of the regression table from the theoretical model and compare.
## # A tibble: 12 × 3
## term lower_ci upper_ci
## <chr> <dbl> <dbl>
## 1 intercept 44.6 45.2
## 2 mo: 2 3.62 4.44
## 3 mo: 3 8.05 8.85
## 4 mo: 4 13.7 14.5
## 5 mo: 5 20.7 21.5
## 6 mo: 6 25.7 26.5
## 7 mo: 7 32.2 33.0
## 8 mo: 8 32.2 33.0
## 9 mo: 9 26.4 27.2
## 10 mo: 10 15.3 16.1
## 11 mo: 11 5.22 6.03
## 12 mo: 12 -0.463 0.337