I decided to use vehicle sales because they have been a hot-button topic due to supply chain issues.
# Libraries/Data ----------------------------------------------------------
library('tidyverse')
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library('lubridate')
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library('forecast')
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library('fable')
## Loading required package: fabletools
##
## Attaching package: 'fabletools'
## The following objects are masked from 'package:forecast':
##
## accuracy, forecast
vehicle <- read.csv("TOTALSA.csv")
view(vehicle)
colSums(is.na(vehicle))
## DATE TOTALSA
## 0 0
myts <- ts(vehicle$TOTALSA, frequency = 12, start = c(1976, 1))  # monthly data starting January 1976
myts %>%
autoplot() +
labs(title = "US Vehicle Sales",
y = "Sales (millions of units)")
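The plot appears to show some seasonality, but the trend is quite unusual: the effects of the 2008 financial crisis and the COVID-19 pandemic stand out clearly. If TOTALSA.csv is FRED's TOTALSA series (the DATE and TOTALSA columns match FRED's export format), the values are already seasonally adjusted annual rates, so any remaining seasonal pattern should be weak.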
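Rather than judging seasonality by eye, one option is the seasonal-strength measure described in Hyndman and Athanasopoulos's *Forecasting: Principles and Practice*: decompose the series with STL and compute F_S = max(0, 1 - Var(R) / Var(S + R)), where S is the seasonal component and R is the remainder. Values near 0 indicate little seasonality and values near 1 indicate strong seasonality. The sketch below uses only base R's `stl()`; `seasonal_strength()` is a hypothetical helper and both series are simulated for illustration, so you would apply it to `myts` to check the vehicle data.

```r
# Seasonal strength from an STL decomposition (base R only).
# seasonal_strength() is a hypothetical helper, not a package function.
seasonal_strength <- function(x) {
  stopifnot(is.ts(x), frequency(x) > 1)
  # s.window = "periodic" forces the same seasonal pattern every year
  parts <- stl(x, s.window = "periodic")$time.series
  s <- parts[, "seasonal"]
  r <- parts[, "remainder"]
  # F_S = max(0, 1 - Var(R) / Var(S + R))
  max(0, 1 - var(r) / var(s + r))
}

set.seed(1)
n <- 240            # 20 years of monthly data
idx <- seq_len(n)

# Simulated series with a clear 12-month cycle plus noise
seasonal_ts <- ts(10 + 3 * sin(2 * pi * idx / 12) + rnorm(n, sd = 0.5),
                  frequency = 12, start = c(1976, 1))
# Simulated series with no seasonal pattern at all
flat_ts <- ts(10 + rnorm(n), frequency = 12, start = c(1976, 1))

seasonal_strength(seasonal_ts)  # expected to be high (near 1)
seasonal_strength(flat_ts)      # expected to be low (near 0)
# seasonal_strength(myts)       # the vehicle-sales series from above
```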
# Automated ETS Model -----------------------------------------------------
auto <- ets(myts)
auto
## ETS(A,N,N)
##
## Call:
## ets(y = myts)
##
## Smoothing parameters:
## alpha = 0.5583
##
## Initial states:
## l = 13.0378
##
## sigma: 0.9631
##
## AIC AICc BIC
## 3462.017 3462.061 3474.969
auto %>%
forecast(h = 10) %>%
autoplot() +
labs(title = "US Vehicle Sales",
y = "Sales (millions of units)")
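My first model used the ets() function on its own. With its default settings, ets() fits a range of candidate models and returns the one with the lowest AICc (a small-sample-corrected AIC), so I was interested in finding out which model it would pick. It chose ETS(A,N,N): additive errors, no trend, and no seasonality. This is simple exponential smoothing, so the 10-month forecast is a flat line at the last estimated level.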
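The comparison that ets() performs internally can be made explicit. The sketch below fits a few hand-picked additive ETS models and tabulates their information criteria, sorted by AICc, so you can see how close the runners-up are. It assumes the forecast package; `compare_ets()` is a hypothetical helper, and the built-in `ldeaths` series stands in for `myts`. Note that the full ets() search is wider than this list (it can also try damped and multiplicative variants).

```r
library(forecast)

# Fit each candidate ETS model and tabulate its information criteria.
# compare_ets() is a hypothetical helper for illustration.
compare_ets <- function(x, models = c("ANN", "AAN", "ANA", "AAA")) {
  rows <- lapply(models, function(m) {
    # damped = FALSE so each model string maps to exactly one model
    fit <- ets(x, model = m, damped = FALSE)
    data.frame(model = m, AIC = fit$aic, AICc = fit$aicc, BIC = fit$bic)
  })
  out <- do.call(rbind, rows)
  # Lowest AICc first: the model ets() would prefer among these candidates
  out[order(out$AICc), , drop = FALSE]
}

# ldeaths (monthly UK lung-disease deaths, built into R) stands in for myts
compare_ets(ldeaths)
```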
# Seasonal model ----------------------------------------------------------
auto_2 <- ets(myts, model = "AAA")
auto_2
## ETS(A,A,A)
##
## Call:
## ets(y = myts, model = "AAA")
##
## Smoothing parameters:
## alpha = 0.5672
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 13.4024
## b = 0.0012
## s = 0.1744 -0.1086 -0.0964 0.174 0.1568 -0.0806
## -0.1185 -0.0303 -0.1339 -0.0465 0.1198 -0.0103
##
## sigma: 0.9668
##
## AIC AICc BIC
## 3480.028 3481.169 3553.419
auto_2 %>%
forecast(h = 10) %>%
autoplot() +
labs(title = "US Vehicle Sales",
y = "Sales (millions of units)")
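My second model was an ETS(A,A,A) model, which adds an additive trend and additive seasonality. I wanted to see whether I could almost overfit the ETS model to the data for better accuracy, but ets() knew better than me: the AAA model scores worse on every criterion (AIC 3480.0 vs 3462.0, AICc 3481.2 vs 3462.1, BIC 3553.4 vs 3475.0) and has a slightly larger sigma (0.9668 vs 0.9631). Its estimated beta and gamma are both 1e-04, so the trend and seasonal components barely move from their initial values, another sign that they add little here.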
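Information criteria measure fit on the data used for estimation. Another check is to hold out the most recent observations and compare forecast errors. The sketch below fits ETS(A,N,N) and ETS(A,A,A) to all but the last 24 months and scores each model's forecasts against those months by RMSE. It assumes the forecast package and calls `forecast::forecast()` with its namespace, because loading fable (via fabletools) masks `forecast()` and `accuracy()`, as the start-up messages above show. `holdout_rmse()` is a hypothetical helper, and `ldeaths` again stands in for `myts`.

```r
library(forecast)

# Score candidate ETS models on a holdout set by RMSE.
# holdout_rmse() is a hypothetical helper for illustration.
holdout_rmse <- function(x, models = c("ANN", "AAA"), test_len = 24) {
  n <- length(x)
  # Training set: everything except the last test_len observations
  train <- ts(head(as.numeric(x), n - test_len),
              start = start(x), frequency = frequency(x))
  # Test set: the last test_len observations
  test <- tail(as.numeric(x), test_len)
  sapply(models, function(m) {
    fit <- ets(train, model = m, damped = FALSE)
    # Namespace-qualified: fabletools also exports a forecast() generic
    fc <- forecast::forecast(fit, h = test_len)
    sqrt(mean((test - as.numeric(fc$mean))^2))
  })
}

# Returns a named vector of holdout RMSEs, one per model
holdout_rmse(ldeaths)
```

A single holdout window can be noisy; rolling-origin evaluation with forecast's `tsCV()` gives a more stable comparison at the cost of more model fits.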