---
title: "BUA 345 - HW 10"
author: "Penelope Pooler Eisenbies"
date: last-modified
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
html:
code-line-numbers: true
code-fold: true
code-tools: true
execute:
echo: fenced
editor: visual
---
## Setup
Run the following chunk of R code to install and load the packages needed for this assignment.
Click green triangle in upper right corner of the setup chunk to run the setup code.
Note that `setup` code will not appear in the rendered HTML file.
```{r include=F}
#|label: Setup
# suppress scientific notation
options(scipen=100)
# install helper package that loads and installs other packages, if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
pacman::p_load(pacman,tidyverse, magrittr, knitr, gridExtra,
forecast, tidyquant, lubridate, maps, usdata,
mapproj, ggthemes, RColorBrewer, dygraphs, kableExtra)
# verify packages
p_loaded()
```
## NOTES:
- Data used in HW 10 include vlaues through December of 2024.
- These datasets were last updated in the Spring of 2025 when the HW 10 demo videos were created.
- Forecast HW Demo videos are updated every other year.
The Alaska data used in class has been updated to December 2025 so the HW data differs slightly.
## Median Marriage Data
```{r}
#|label: Import Marriage Data
# import and examine data
med_mar <- read_csv("data/med_marriage.csv",
show_col_types = F) |>
mutate(Date = mdy(Date)) |>
glimpse(width=60)
```
### Marriage Age Time Series Plot
```{r echo=F}
#|label: Create dygraph
# convert to xts (extensible time series)
med_mar_xts <- xts(x = med_mar[,3:4],
order.by = med_mar$Date)
# create interactive plot
(mar_dg <- dygraph(med_mar_xts, main="Median Age of First Marriage") |>
dySeries("Males", label="Males", color= "blue") |>
dySeries("Females", label="Females", color= "darkmagenta") |>
dyAxis("x", label = "", drawGrid = FALSE) |>
dyShading(from="2020-3-12", to="2021-6-14", color = "lightgrey") |>
dyRangeSelector())
```
### Female Time Series
- Create time series using female data
- Specify `freq = 1` - one observation per year
- Specify `start = 1947` - first year in dataset
- Model data using `auto.arima` function
- Specify `ic = aic` - `aic` is the information criterion used to determine model.
- Specify `seasonality = F` - no seasonal (repeating) pattern in the data.
### Create Time Series (`ts`) and Model
```{r}
#|label: Create female time series
# create time series for forecast
female_ts <- ts(med_mar$Females, freq=1, start=1947)
# model data using auto.arima function
female_model <- auto.arima(female_ts, ic="aic", seasonal=F)
```
### Female Model Forecast
- Create forecasts (until 2040)
- `h = 16` indicates we want to forecast 16 years
- Most recent year in data is 2024
- 2040 - 2024 - 16
- Forecasts become less accurate the further into the future you specify.
```{r}
#|label: female model forecast and plot
# create forecasts (until 2040)
female_forecast <- forecast(female_model, h=16)
# plot forecasts with 2 prediction interval bounds shown
autoplot(female_forecast) +
labs(y = "Median Age of First Marriage (Females)") +
theme_classic()
```
### Female Model Fit
```{r}
#|label: numeric female model forecasts
# examine numerical forecasts
# and prediction intervals
female_forecast
```
```{r}
#|label: formatted table of female model forecasts
female_forecast |> kable() |> kable_styling(full_width = F)
```
```{r}
#|label: accuracy of female model
# fit statistics only
accuracy(female_forecast)
```
### Female Model Residuals
- Top Plot: No spikes should be too large
- ACF stands for auto-correlation function.
- Ideally all or most values are with dashed lines
- Histogram: Distribution of residuals should be approx. normal
```{r}
#|label: female forecast model residuals
checkresiduals(female_forecast)
```
## AK Elec. Revenue Data
- Data Source: [U.S Energy Information Adminsitration](https://www.eia.gov/electricity/data/browser/#/topic/6?agg=0,1&geo=000000000000g&endsec=8&freq=Q&start=200101&end=202104&ctype=linechart<ype=pin&rtype=s&pin=&rse=0&maptype=0)
- Import and Clean Raw AK Data
```{r}
#|label: Import AK data
# a little data mgmt (if you are interested)
ak_res <- read_csv("data/AK_Residential_Electricity_Revenue_2025.csv",
show_col_types = F,
skip=5,
col_names = c("quarter","Revenue")) |>
separate(quarter, c("Quarter", "Year")) |>
mutate(Date=yq(paste(Year, Quarter))) |>
select(Date, Year, Quarter, Revenue) |>
arrange(Date) |>
glimpse()
```
### AK Resid Time Series Plot
```{r}
#|label: Create AK dygraph
# convert to xts (extensible time series)
ak_res_xts <- xts(x = ak_res[,4],
order.by = ak_res$Date)
# ak-res time_series
(ak_dg <- dygraph(ak_res_xts, "Alaska Residential Electric Utility Revenue") |>
dySeries("Revenue", label="Revenue ($M)", color= "darkmagenta") |>
dyAxis("y", label = "", drawGrid = FALSE) |>
dyAxis("x", label = "", drawGrid = FALSE) |>
dyShading(from="2020-3-12", to="2021-6-14", color = "lightgrey") |>
dyRangeSelector())
```
### AK Resid. Rev. Time Series
- Creat time series using ak_res data
- Specify `freq = 4` - four observations per year
- Specify `start = c(2001,1)` - first year/Q in dataset
- Model data using `auto.arima` function
- Specify `ic = aic` - `aic` is the information criterion used to determine model.
- Specify `seasonality = T` - seasonal (repeating) pattern is present in these data.
- This chunk will create and save the model to be used below.
### Create Time Series (`ts`) and Model
```{r}
#|label: Create AK time series
# create time series for forecast
ak_res_ts <- ts(ak_res$Revenue,
freq=4,
start=c(2001, 1))
# model data using auto.arima function
ak_res_model <- auto.arima(ak_res_ts, ic="aic", seasonal=T)
```
### AK Revenue Model Forecast
- Create forecasts (until 2027)
- `h = 12` indicates we want to forecast 12 quarters
- Most recent year in data is 2024
- $2027-2024 = 3 \times 4Qtrs. = 12Qtrs.$
- Note that forecasts become less accurate the further into the future you specify.
```{r}
#|label: AK model forecast and plot
# create forecasts (until end of 2027)
ak_res_forecast <- forecast(ak_res_model, h=12)
# plot forecasts with 2 prediction interval bounds shown
autoplot(ak_res_forecast) +
labs(y = "AK Resid. Elec. Revenue ($ Mill)") +
theme_classic()
```
### AK Revenue Model Fit
```{r}
#|label: numeric AK model forecasts
# examine numerical forecasts
# and prediction intervals
# examine numerical forecasts
# and prediction intervals
ak_res_forecast
```
```{r}
#|label: formatted table of AK model forecasts
female_forecast |> kable() |> kable_styling(full_width = F)
```
```{r}
#|label: accuracy of AK model
# fit statistics only
accuracy(ak_res_forecast)
```
### AK Revenue Model Residuals
- Top Plot: No spikes should be too large
- ACF stands for auto-correlation function.
- Ideally all or most values are with dashed lines
- Histogram: Distribution of residuals should be approx. normal
```{r}
#|label: AK forecast model residuals
checkresiduals(ak_res_forecast)
```
## When you are done...
1. Save your changes to this file.(Ctrl + S or Cmd + S)
2. OPTIONAL: Click `Render` button to update html file with your changes.
3. Close R/RStudio on your laptop or close Posit Cloud Browser.