BUA 345 - HW 10

Author

Penelope Pooler Eisenbies

Published

April 22, 2024

Setup

Run the following chunk of R code to install and load the packages needed for this assignment.

Click green triangle in upper right corner of the setup chunk to run the setup code.

Note that setup code will not appear in the rendered HTML file.

Median Marriage Data

Show the code
# import and examine data
med_mar <- read_csv("data/med_marriage.csv", 
                    show_col_types = F) |>
  mutate(Date = mdy(Date)) |>
  glimpse(width=60)
Rows: 77
Columns: 4
$ Date    <date> 1947-12-31, 1948-12-31, 1949-12-31, 1950-…
$ Year    <dbl> 1947, 1948, 1949, 1950, 1951, 1952, 1953, …
$ Males   <dbl> 23.7, 23.3, 22.7, 22.8, 22.9, 23.0, 22.8, …
$ Females <dbl> 20.5, 20.4, 20.3, 20.3, 20.4, 20.2, 20.2, …

Marriage Age Time Series Plot

Female Time Series

  • Create time series using female data

    • Specify freq = 1 - one observation per year

    • Specify start = 1947 - first year in dataset

  • Model data using auto.arima function

    • Specify ic = aic - aic is the information criterion used to determine model.

    • Specify seasonality = F - no seasonal (repeating) pattern in the data.

Create Time Series (ts) and Model

Show the code
# create time series for forecast
female_ts <- ts(med_mar$Females, freq=1, start=1947)

# model data using auto.arima function
female_model <- auto.arima(female_ts, ic="aic", seasonal=F)

Female Model Forecast

  • Create forecasts (until 2040)

    • h = 19 indicates we want to forecast 17 years

    • Most recent year in data is 2023

    • 2040 - 2023 - 17

  • Forecasts become less accurate the further into the future you specify.

Show the code
# create forecasts (until 2040)
female_forecast <- forecast(female_model, h=17)

# plot forecasts with 2 prediction interval bounds shown
autoplot(female_forecast) + 
  labs(y = "Median Age of First Marriage (Females)") +
  theme_classic()

Female Model Fit

Show the code
# examine numerical forecasts 
# and prediction intervals
female_forecast
     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2024       28.55906 28.34613 28.77199 28.23341 28.88471
2025       28.60651 28.31683 28.89619 28.16348 29.04954
2026       28.65396 28.27502 29.03290 28.07442 29.23350
2027       28.70141 28.22291 29.17991 27.96961 29.43321
2028       28.74886 28.16183 29.33588 27.85108 29.64663
2029       28.79631 28.09268 29.49993 27.72020 29.87241
2030       28.84376 28.01609 29.67142 27.57795 30.10956
2031       28.89120 27.93256 29.84985 27.42509 30.35732
2032       28.93865 27.84249 30.03482 27.26222 30.61509
2033       28.98610 27.74620 30.22600 27.08984 30.88237
2034       29.03355 27.64398 30.42313 26.90838 31.15872
2035       29.08100 27.53606 30.62594 26.71822 31.44379
2036       29.12845 27.42266 30.83424 26.51966 31.73724
2037       29.17590 27.30396 31.04784 26.31302 32.03878
2038       29.22335 27.18014 31.26656 26.09853 32.34817
2039       29.27080 27.05135 31.49025 25.87644 32.66516
2040       29.31825 26.91772 31.71877 25.64696 32.98953
Show the code
female_forecast |> kable()
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2024 28.55906 28.34613 28.77199 28.23341 28.88471
2025 28.60651 28.31683 28.89619 28.16348 29.04954
2026 28.65396 28.27502 29.03290 28.07442 29.23350
2027 28.70141 28.22291 29.17991 27.96961 29.43321
2028 28.74886 28.16183 29.33588 27.85108 29.64663
2029 28.79631 28.09268 29.49993 27.72020 29.87241
2030 28.84376 28.01609 29.67142 27.57795 30.10956
2031 28.89120 27.93256 29.84985 27.42509 30.35732
2032 28.93865 27.84249 30.03482 27.26222 30.61509
2033 28.98610 27.74620 30.22600 27.08984 30.88237
2034 29.03355 27.64398 30.42313 26.90838 31.15872
2035 29.08100 27.53606 30.62594 26.71822 31.44379
2036 29.12845 27.42266 30.83424 26.51966 31.73724
2037 29.17590 27.30396 31.04784 26.31302 32.03878
2038 29.22335 27.18014 31.26656 26.09853 32.34817
2039 29.27080 27.05135 31.49025 25.87644 32.66516
2040 29.31825 26.91772 31.71877 25.64696 32.98953
Show the code
# fit statistics only
accuracy(female_forecast)
                      ME      RMSE       MAE        MPE      MAPE      MASE
Training set 0.006943277 0.1617777 0.1195648 0.04090515 0.4981806 0.8336627
                    ACF1
Training set -0.01551647

Female Model Residuals

  • Top Plot: No spikes should be too large

  • ACF stands for auto-correlation function.

    • Ideally all or most values are with dashed lines
  • Histogram: Distribution of residuals should be approx. normal

Show the code
checkresiduals(female_forecast)


    Ljung-Box test

data:  Residuals from ARIMA(0,2,2)
Q* = 7.3792, df = 8, p-value = 0.4963

Model df: 2.   Total lags used: 10

AK Elec. Revenue Data

Show the code
# a little data mgmt (if you are interested)
ak_res <- read_csv("data/AK_Residential_Electricity_Revenue_Updated.csv", 
                   show_col_types = F,
                   skip=5, 
                   col_names = c("quarter","Revenue")) |>
  separate(quarter, c("Quarter", "Year")) |>
  mutate(Date=yq(paste(Year, Quarter))) |>
  select(Date, Year, Quarter, Revenue) |>
  arrange(Date) |>
  glimpse()
Rows: 92
Columns: 4
$ Date    <date> 2001-01-01, 2001-04-01, 2001-07-01, 2001-10-01, 2002-01-01, 2…
$ Year    <chr> "2001", "2001", "2001", "2001", "2002", "2002", "2002", "2002"…
$ Quarter <chr> "Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q…
$ Revenue <dbl> 542.9275, 424.4111, 394.4869, 529.6425, 570.5655, 439.6120, 40…

AK Resid Time Series Plot

AK Resid. Rev. Time Series

  • Creat time series using ak_res data

    • Specify freq = 4 - four observations per year

    • Specify start = c(2001,1) - first year/Q in dataset

  • Model data using auto.arima function

    • Specify ic = aic - aic is the information criterion used to determine model.

    • Specify seasonality = T - seasonal (repeating) pattern is present in these data.

  • This chunk will create and save the model to be used below.

Create Time Series (ts) and Model

Show the code
# create time series for forecast
ak_res_ts <- ts(ak_res$Revenue, 
                freq=4, 
                start=c(2001, 1))

# model data using auto.arima function
ak_res_model <- auto.arima(ak_res_ts, ic="aic", seasonal=T)

AK Revenue Model Forecast

  • Create forecasts (until 2026)

    • h = 12 indicates we want to forecast 12 quarters

    • Most recent year in data is 2023

    • \(2026-2023=3\times4Qtrs. = 12Qtrs.\)

  • Note that forecasts become less accurate the further into the future you specify.

Show the code
# create forecasts (until end of 2024)
ak_res_forecast <- forecast(ak_res_model, h=12)

# plot forecasts with 2 prediction interval bounds shown
autoplot(ak_res_forecast) + 
  labs(y = "AK Resid. Elec. Revenue ($ Mill)") +
  theme_classic()

AK Revenue Model Fit

Show the code
# examine numerical forecasts 
# and prediction intervals

# examine numerical forecasts 
# and prediction intervals

ak_res_forecast
        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2024 Q1       582.3095 563.2840 601.3350 553.2126 611.4064
2024 Q2       441.4524 419.8389 463.0659 408.3974 474.5074
2024 Q3       425.7673 402.8662 448.6684 390.7431 460.7916
2024 Q4       556.4087 532.9286 579.8889 520.4990 592.3185
2025 Q1       589.1742 560.8841 617.4643 545.9082 632.4402
2025 Q2       445.1166 415.4104 474.8228 399.6849 490.5483
2025 Q3       429.7660 399.3399 460.1921 383.2333 476.2988
2025 Q4       560.8353 530.0753 591.5952 513.7920 607.8786
2026 Q1       594.8464 561.8355 627.8574 544.3605 645.3323
2026 Q2       451.4903 417.7570 485.2236 399.8996 503.0809
2026 Q3       432.2769 398.1707 466.3831 380.1160 484.4378
2026 Q4       557.4329 523.1520 591.7138 505.0048 609.8609
Show the code
female_forecast |> kable()
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2024 28.55906 28.34613 28.77199 28.23341 28.88471
2025 28.60651 28.31683 28.89619 28.16348 29.04954
2026 28.65396 28.27502 29.03290 28.07442 29.23350
2027 28.70141 28.22291 29.17991 27.96961 29.43321
2028 28.74886 28.16183 29.33588 27.85108 29.64663
2029 28.79631 28.09268 29.49993 27.72020 29.87241
2030 28.84376 28.01609 29.67142 27.57795 30.10956
2031 28.89120 27.93256 29.84985 27.42509 30.35732
2032 28.93865 27.84249 30.03482 27.26222 30.61509
2033 28.98610 27.74620 30.22600 27.08984 30.88237
2034 29.03355 27.64398 30.42313 26.90838 31.15872
2035 29.08100 27.53606 30.62594 26.71822 31.44379
2036 29.12845 27.42266 30.83424 26.51966 31.73724
2037 29.17590 27.30396 31.04784 26.31302 32.03878
2038 29.22335 27.18014 31.26656 26.09853 32.34817
2039 29.27080 27.05135 31.49025 25.87644 32.66516
2040 29.31825 26.91772 31.71877 25.64696 32.98953
Show the code
# fit statistics only
accuracy(ak_res_forecast)
                     ME     RMSE      MAE         MPE     MAPE      MASE
Training set -0.2292964 14.01561 10.46793 -0.07894875 1.962027 0.7845902
                     ACF1
Training set -0.004452267

AK Revenue Model Residuals

  • Top Plot: No spikes should be too large

  • ACF stands for auto-correlation function.

    • Ideally all or most values are with dashed lines
  • Histogram: Distribution of residuals should be approx. normal

Show the code
checkresiduals(ak_res_forecast)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,0)(2,1,1)[4] with drift
Q* = 3.7459, df = 3, p-value = 0.2902

Model df: 5.   Total lags used: 8

When you are done…

  1. Save your changes to this file.(Ctrl + S or Cmd + S)
  2. OPTIONAL: Click Render button to update html file with your changes.
  3. Close R/RStudio on your laptop or close Posit Cloud Browser.