All R code can be found at the end of the HTML file.

Average monthly temperature in the United States (\(^oC\)) was retrieved for the time period from January 2013 through December 2021. A plot of the data is below, and shows clear seasonality along with a slightly upward trend.

The dataset containing US temperatures was separated between into training (80%) and testing (20%) data to allow for ETS models to be trained on the training data and measured for accuracy on the testing data.

Using the training dataset, three separate models were constructed, and are named as follows:

Each model was used to forecast 22 months beyond the training data and compared to the observed values of the testing data. Graphs showing these comparisons are below. Visibly, the ETS_A_A_M and ETS_A_A_A models both forecast future months reasonably well with tight prediction intervals; the ETS_M_N_A model is visibly worse than the other two.

The below table served as a way to select the best model of the three tested. As the table shows, the ETS_A_A_A model has the highest likelihood, along with the lowest values across all relevant metrics (\(AIC\), \(AIC_c\), \(BIC\), \(MSE\), \(AMSE\), \(MAE\)). In this way, the model was chosen from the group tested as the most suitable for forecasting.

## # A tibble: 3 × 9
##   .model    sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE   MAE
##   <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS_A_A_M  4.14    -244.  521.  530.  563.  3.37  3.42 1.31 
## 2 ETS_A_A_A  1.35    -195.  425.  434.  467.  1.10  1.13 0.839
## 3 ETS_M_N_A  0.256   -312.  653.  660.  690.  2.96  2.81 0.269

There are a variety of explanations that could help explain the accuracy of the ETS_A_A_A. One that comes to mind is simply how temperatures fluctuate throughout the year; it is more likely for temperatures in Winter/Summer to be a specific degree distance from each other versus a consistent proportional amount. As temperatures trend upwards, a proportional seasonality (e.g., multiplicative) would result in a widening distance between Winter low temperatures and Summer high temperatures. The data displays a trend where there tend to be consistent distances between peaks and valleys. In this way, the additive model was most appropriate to fit the data. It’s likely the improvement was subtle in this case because the trend in temperatures is moving slowly enough to not have a major impact one way or the other.

R Code

knitr::opts_chunk$set(echo = TRUE)
local({
  hook_source <- knitr::knit_hooks$get('source')
  knitr::knit_hooks$set(source = function(x, options) {
    x <- x[!grepl('# SECRET!!$', x)]
    hook_source(x, options)
  })
})

library("feasts")
library("seasonal")
library("tsibble")
library("tsibbledata")
library("dplyr")
library("ggplot2")
library("forecast")
library("fable")
library("fpp3")



#ODS_path defined here; excluded from publication


#IMPORT FILE & CREATE tsibble

UStemp_celsius <- readODS::read_ods(paste0(ODS_path, 
                                           "W2D1 - US Avg Temp (C)_R.ods"))

UStemp_celsius <- data.frame(UStemp_celsius)

UStemp_tsibble <- UStemp_celsius %>%
                    mutate(YRMTH = yearmonth(YRMTH)) %>%
                    as_tsibble(index = YRMTH)

rm(UStemp_celsius)

#PLOT TIME SERIES DATA
UStemp_tsibble %>%
  autoplot(TEMP_C) +
  labs(x = "Month",
       y = "Temperature (C)",
       title = "Avg. Monthly Temperature (C)",
       subtitle = "United States, 1/2013-12/2021")


#CREATE TESTING AND TRAINING DATA (20/80 SPLIT, RESPECTIVELY)

training_data_cutoff <- round(length(rownames(UStemp_tsibble))*0.8, 0)
testing_data_start <- training_data_cutoff+1

UStemp_training <- UStemp_tsibble[1:training_data_cutoff,]
UStemp_testing <- UStemp_tsibble[testing_data_start:length(rownames(UStemp_tsibble)),]

UStemp_training <- UStemp_tsibble[1:training_data_cutoff,]
UStemp_testing <- UStemp_tsibble[testing_data_start:length(rownames(UStemp_tsibble)),]


#BUILD 3 MODELS
UStemp_ETS_model <-
  UStemp_training %>%
  model(ETS_A_A_M = ETS(TEMP_C ~ error("A") + trend("A") + season("M")),
        ETS_A_A_A = ETS(TEMP_C ~ error("A") + trend("A") + season("A")),
        ETS_M_N_A = ETS(TEMP_C ~ error("M") + trend("N") + season("A")))

#FORECAST 22 MONTHS
UStemp_ETS_forecast <-
  UStemp_ETS_model %>%
  forecast(h = length(rownames(UStemp_testing)))

#PLOT FORECAST RESULTS
UStemp_ETS_forecast %>%
  filter(.model == "ETS_A_A_M") %>%
  autoplot(UStemp_testing) + 
  labs(x = "Month",
       y = "Temperature (C)",
       title = "Avg. Monthly Temperature (C), ETS_A_A_M")

UStemp_ETS_forecast %>%
  filter(.model == "ETS_A_A_A") %>%
  autoplot(UStemp_testing) + 
  labs(x = "Month",
       y = "Temperature (C)",
       title = "Avg. Monthly Temperature (C), ETS_A_A_A")

UStemp_ETS_forecast %>%
  filter(.model == "ETS_M_N_A") %>%
  autoplot(UStemp_testing) + 
  labs(x = "Month",
       y = "Temperature (C)",
       title = "Avg. Monthly Temperature (C), ETS_M_N_A")


#MODEL METRICS
print(glance(UStemp_ETS_model))